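8 Practical Python Scripts for Automating txt File Processing
This article covers office automation for txt files: reading, comparing, filtering, merging, converting formats, extracting data, counting word frequencies, and generating reports.
Preparation: the Python libraries used
- re: regular expression operations, for complex text matching
- csv: reading and writing CSV files
- json: reading and writing JSON files
- collections: used for counting word frequencies
- matplotlib and wordcloud: used to generate word cloud images
re, csv, json, and collections are part of the Python standard library and need no installation. matplotlib and wordcloud are third-party packages that can be installed with pip install matplotlib wordcloud.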
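1. Reading txt Content
1.1 Reading a txt file line by line
The first step in data processing is reading the txt file. The following example reads a txt file line by line:
def read_txt_file_by_line(filepath):
    # Iterating over the file object holds only one line in memory at a time
    with open(filepath, 'r', encoding='utf-8') as file:
        for line in file:
            print(line.strip())

# Example call
read_txt_file_by_line('example.txt')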
1.2 Reading the whole txt file
To read the entire content of a txt file into a single string, use the following code:
def read_txt_file(filepath):
    with open(filepath, 'r', encoding='utf-8') as file:
        content = file.read()
    return content

# Example call
content = read_txt_file('example.txt')
print(content)
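Both functions above assume the file is UTF-8 encoded. Files exported from older tools are sometimes in another encoding, in which case reading them raises UnicodeDecodeError. Below is a minimal sketch of a fallback. The helper name read_txt_with_fallback and the candidate list ('utf-8', 'gbk') are assumptions for illustration; adjust the list to the encodings you actually expect.

```python
def read_txt_with_fallback(filepath, encodings=('utf-8', 'gbk')):
    """Hypothetical helper: try each encoding in turn, return (text, encoding)."""
    if not encodings:
        raise ValueError('at least one encoding is required')
    last_error = None
    for encoding in encodings:
        try:
            with open(filepath, 'r', encoding=encoding) as file:
                return file.read(), encoding
        except UnicodeDecodeError as error:
            # This encoding cannot decode the file; remember why and try the next
            last_error = error
    # Every candidate failed, so re-raise the most recent decoding error
    raise last_error
```

Returning the encoding together with the text makes it easy to log which candidate succeeded. Note that a wrong encoding can sometimes decode without error and produce garbled text, so the order of the candidates matters.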
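2. Comparing Two txt Files
2.1 Basic text comparison
Sometimes we need to check whether two txt files have the same content. The following code compares them line by line and prints every pair of lines that differs:
from itertools import zip_longest

def compare_txt_files(file1, file2):
    with open(file1, 'r', encoding='utf-8') as f1, open(file2, 'r', encoding='utf-8') as f2:
        content1 = f1.readlines()
        content2 = f2.readlines()
    # zip_longest also reports lines that exist in only one file (shown as None);
    # plain zip would silently stop at the end of the shorter file
    for line1, line2 in zip_longest(content1, content2):
        if line1 != line2:
            print(f'Difference found:\nFile1: {line1}\nFile2: {line2}')

# Example call
compare_txt_files('file1.txt', 'file2.txt')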
2.2 Highlighting differences
To show the differences between txt files more clearly, we can use the difflib library to produce a unified diff, in which removed lines are marked with - and added lines with +:
import difflib

def highlight_differences(file1, file2):
    with open(file1, 'r', encoding='utf-8') as f1, open(file2, 'r', encoding='utf-8') as f2:
        content1 = f1.readlines()
        content2 = f2.readlines()
    diff = difflib.unified_diff(content1, content2, fromfile='file1', tofile='file2')
    for line in diff:
        # Diff lines already end with a newline, so do not add another
        print(line, end='')

# Example call
highlight_differences('file1.txt', 'file2.txt')
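To see what unified_diff produces without creating any files, here is a small sketch that works on in-memory strings. The helper name diff_text and the sample strings are illustrative only.

```python
import difflib

def diff_text(old_text, new_text):
    """Illustrative helper: return a unified diff of two strings as one string."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(old_lines, new_lines, fromfile='old', tofile='new')
    return ''.join(diff)

result = diff_text('apple\nbanana\n', 'apple\ncherry\n')
print(result)
```

In the printed diff, the line starting with - is the one that only exists in the old text and the line starting with + is the one that only exists in the new text. Identical inputs produce an empty string.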
3. Filtering txt File Content
3.1 Filtering out lines that contain a keyword
When processing txt files, we may need to drop the lines that contain a particular keyword. Here is an example:
def filter_lines_by_keyword(filepath, keyword):
    with open(filepath, 'r', encoding='utf-8') as file:
        lines = file.readlines()
    # Keep only the lines that do NOT contain the keyword
    filtered_lines = [line for line in lines if keyword not in line]
    return filtered_lines

# Example call
filtered = filter_lines_by_keyword('example.txt', 'filter_keyword')
for line in filtered:
    print(line.strip())
3.2 Filtering out empty lines and comment lines
Sometimes we need to drop empty lines and comment lines (for example, lines that start with #). The following code does this:
def filter_empty_and_comment_lines(filepath):
    with open(filepath, 'r', encoding='utf-8') as file:
        lines = file.readlines()
    # line.strip() is empty for blank lines; startswith('#') detects comments
    filtered_lines = [line for line in lines if line.strip() and not line.strip().startswith('#')]
    return filtered_lines

# Example call
filtered = filter_empty_and_comment_lines('example.txt')
for line in filtered:
    print(line.strip())
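Both filters above load the whole file with readlines(). For large files, a generator can apply the same conditions one line at a time. The following is a sketch; iter_filtered_lines is a hypothetical name, and it accepts any iterable of lines so it works on an open file object as well as on a list.

```python
def iter_filtered_lines(lines, keyword=None, comment_prefix='#'):
    """Yield lines that are non-empty, not comments, and do not contain keyword."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_prefix):
            continue  # skip blank lines and comment lines
        if keyword is not None and keyword in line:
            continue  # skip lines containing the unwanted keyword
        yield line

sample = ['# config\n', '\n', 'keep me\n', 'drop DEBUG line\n']
print(list(iter_filtered_lines(sample, keyword='DEBUG')))
```

Because the function is a generator, passing it an open file processes the file lazily instead of building a full list first.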
4. Merging Multiple txt Files
4.1 Simple merge
To concatenate the content of several txt files into one file, use the following code:
def merge_txt_files(file_list, output_file):
    with open(output_file, 'w', encoding='utf-8') as outfile:
        for file in file_list:
            with open(file, 'r', encoding='utf-8') as infile:
                outfile.write(infile.read())
                # Separate the files with a newline
                outfile.write('\n')

# Example call
merge_txt_files(['file1.txt', 'file2.txt', 'file3.txt'], 'merged.txt')
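4.2 Interleaved merge by line
To interleave the lines of several files (first line of each file, then second line of each file, and so on), use the following code:
from contextlib import ExitStack

def merge_files_by_line(file_list, output_file):
    # ExitStack closes every input file even if an error occurs midway
    with ExitStack() as stack, open(output_file, 'w', encoding='utf-8') as outfile:
        files = [stack.enter_context(open(file, 'r', encoding='utf-8')) for file in file_list]
        while True:
            lines = [file.readline() for file in files]
            # readline() returns '' only at end of file, so stop when all files are exhausted
            if all(line == '' for line in lines):
                break
            for line in lines:
                if line:
                    outfile.write(line.strip() + '\n')

# Example call
merge_files_by_line(['file1.txt', 'file2.txt', 'file3.txt'], 'merged_by_line.txt')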
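The interleaving order is easier to see on in-memory lists. The sketch below uses itertools.zip_longest as an alternative way to express the same round-robin idea; the helper name interleave_lines and the sample data are illustrative.

```python
from itertools import zip_longest

def interleave_lines(*line_lists):
    """Illustrative helper: take one line from each list in turn until all are exhausted."""
    merged = []
    for group in zip_longest(*line_lists):
        # zip_longest pads the shorter lists with None, which we skip
        merged.extend(line for line in group if line is not None)
    return merged

print(interleave_lines(['a1', 'a2', 'a3'], ['b1'], ['c1', 'c2']))
# → ['a1', 'b1', 'c1', 'a2', 'c2', 'a3']
```

Shorter inputs simply drop out of the rotation once they run out of lines, which matches the behavior of the file-based version.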
5. Converting txt Files to Other Formats
5.1 Converting to CSV
Sometimes we need to convert the content of a txt file to CSV for further processing or analysis. Here is an example:
import csv

def txt_to_csv(txt_file, csv_file):
    with open(txt_file, 'r', encoding='utf-8') as infile, \
         open(csv_file, 'w', newline='', encoding='utf-8') as outfile:
        writer = csv.writer(outfile)
        for line in infile:
            # split() with no arguments splits on any run of whitespace
            writer.writerow(line.strip().split())

# Example call
txt_to_csv('example.txt', 'output.csv')
This code reads the txt file line by line and splits each line on whitespace (spaces or tabs) to form the fields of one CSV row.
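One consequence of splitting on whitespace is that a value containing spaces becomes several columns. If the txt file uses a single known delimiter such as a tab, splitting on that delimiter keeps such values intact. The following sketch works on in-memory text; the function name delimited_text_to_csv and the sample data are assumptions for illustration.

```python
import csv
import io

def delimited_text_to_csv(text, delimiter='\t'):
    """Illustrative helper: convert delimiter-separated text to CSV text."""
    output = io.StringIO()
    writer = csv.writer(output, lineterminator='\n')
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow([field.strip() for field in line.split(delimiter)])
    return output.getvalue()

sample = 'name\tcity\nAlice Smith\tNew York\nBob\tParis, France\n'
print(delimited_text_to_csv(sample))
```

The csv module quotes fields that contain a comma, so "Paris, France" stays a single column in the output.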
5.2 Converting to JSON
Besides CSV, JSON is another common data storage format. The following example converts a txt file to JSON:
import json

def txt_to_json(txt_file, json_file):
    data = []
    with open(txt_file, 'r', encoding='utf-8') as infile:
        for line in infile:
            data.append(line.strip())
    with open(json_file, 'w', encoding='utf-8') as outfile:
        # ensure_ascii=False writes non-ASCII text as-is instead of \uXXXX escapes
        json.dump(data, outfile, indent=4, ensure_ascii=False)

# Example call
txt_to_json('example.txt', 'output.json')
This code stores each line of the txt file as one element of a JSON array.
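To make the shape of the result concrete, the sketch below performs the same conversion on an in-memory string and then parses the JSON back. The helper name lines_to_json is illustrative, and unlike txt_to_json it skips blank lines, which is an assumption about what is usually wanted.

```python
import json

def lines_to_json(text):
    """Illustrative helper: one JSON array element per non-empty line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # ensure_ascii=False keeps non-ASCII characters readable in the output
    return json.dumps(lines, ensure_ascii=False, indent=4)

json_text = lines_to_json('first line\n\n第二行\n')
print(json_text)
print(json.loads(json_text))
```

By default the json module escapes non-ASCII characters as \uXXXX sequences; passing ensure_ascii=False writes them unchanged, which is usually easier to read when the file is opened in a text editor.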
6. Extracting Data from txt Files
6.1 Extracting text that matches a pattern
Sometimes we need to extract text that matches a particular pattern from a txt file. Regular expressions (the re library) can do this. The following example shows how:
import re

def extract_pattern_from_txt(pattern, txt_file):
    with open(txt_file, 'r', encoding='utf-8') as file:
        content = file.read()
    # findall returns every non-overlapping match as a list of strings
    matches = re.findall(pattern, content)
    return matches

# Example call: extract all numbers
pattern = r'\d+'
matches = extract_pattern_from_txt(pattern, 'example.txt')
print("Match found:", matches)
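When a pattern has several parts, named groups make the extracted pieces easier to work with than a flat list of strings. The following sketch assumes ISO-style dates such as 2024-08-17 purely for illustration; DATE_PATTERN and extract_dates are hypothetical names.

```python
import re

# Assumed format for illustration: ISO-style dates such as 2024-08-17
DATE_PATTERN = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

def extract_dates(text):
    """Return one dict per date found, keyed by the group names in DATE_PATTERN."""
    return [match.groupdict() for match in DATE_PATTERN.finditer(text)]

sample = 'Created 2024-08-17, updated 2024-09-01.'
print(extract_dates(sample))
```

Each match becomes a dictionary with the keys year, month, and day, so later code can refer to the parts by name instead of by position.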
6.2 Extracting email addresses or URLs
A similar approach extracts email addresses or URLs:
import re

def extract_emails_and_urls(txt_file):
    email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    # Permissive URL pattern; matches may include trailing punctuation
    url_pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    with open(txt_file, 'r', encoding='utf-8') as file:
        content = file.read()
    emails = re.findall(email_pattern, content)
    urls = re.findall(url_pattern, content)
    return emails, urls

# Example call
emails, urls = extract_emails_and_urls('example.txt')
print("Emails found:", emails)
print("URLs found:", urls)
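findall returns every occurrence, so an address that appears several times is listed several times. The sketch below reuses the email pattern from above and removes duplicates while keeping the order of first appearance. The helper name unique_emails is illustrative, and lowercasing the addresses before comparing them is an assumption that suits most mailing lists but is not required by the pattern itself.

```python
import re

EMAIL_PATTERN = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def unique_emails(text):
    """Illustrative helper: each address once, lowercased, in order of first appearance."""
    found = (email.lower() for email in EMAIL_PATTERN.findall(text))
    # dict.fromkeys keeps insertion order and drops repeated keys
    return list(dict.fromkeys(found))

sample = 'Contact alice@example.com or Bob@Example.org; cc alice@example.com.'
print(unique_emails(sample))
# → ['alice@example.com', 'bob@example.org']
```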
7. Counting Word Frequencies in a txt File
7.1 Counting word occurrences
We can count how often each word appears in a txt file and sort the words by frequency. The following example shows how:
from collections import Counter

def count_word_frequency(txt_file):
    with open(txt_file, 'r', encoding='utf-8') as file:
        # split() separates on whitespace; case and punctuation are kept as-is
        words = file.read().split()
    word_freq = Counter(words)
    return word_freq

# Example call
word_freq = count_word_frequency('example.txt')
# most_common() returns (word, count) pairs sorted from most to least frequent
for word, freq in word_freq.most_common():
    print(f'{word}: {freq}')
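Because splitting on whitespace keeps case and punctuation, "The", "the", and "the." are counted as three different words. The sketch below normalizes English text before counting. The helper name count_words_normalized and the choice of what counts as a word (runs of lowercase letters, digits, and apostrophes) are assumptions for illustration.

```python
import re
from collections import Counter

def count_words_normalized(text):
    """Illustrative helper: lowercase the text, then count runs of letters, digits, and apostrophes."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)

freq = count_words_normalized('The cat saw the dog. The dog ran!')
print(freq.most_common(2))
# → [('the', 3), ('dog', 2)]
```

This tokenization only suits text in which words are separated by spaces or punctuation; text in languages written without spaces between words needs a dedicated segmentation step first.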
7.2 Generating a word cloud
For a visual view of the word frequency distribution, we can generate a word cloud:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

def generate_word_cloud(txt_file):
    with open(txt_file, 'r', encoding='utf-8') as file:
        text = file.read()
    # generate() tokenizes the text and sizes each word by its frequency
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.show()

# Example call
generate_word_cloud('example.txt')
8. Generating txt Reports Automatically
8.1 Generating a report from a template
We can generate a report from a txt template by filling dynamic data into it. In the following example, the template marks each placeholder as {{ key }}, for example {{ name }}:
def generate_report_from_template(template_file, output_file, data):
    with open(template_file, 'r', encoding='utf-8') as infile, \
         open(output_file, 'w', encoding='utf-8') as outfile:
        content = infile.read()
        for key, value in data.items():
            # The f-string below evaluates to '{{ key }}' with the key name filled in
            content = content.replace(f'{{{{ {key} }}}}', str(value))
        outfile.write(content)

# Example call
data = {
    'name': 'Alice',
    'date': '2024-08-17',
    'summary': 'This is a summary of the report.',
}
generate_report_from_template('template.txt', 'report.txt', data)
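The standard library also offers string.Template, which can replace the manual replace() loop if you are free to choose the placeholder syntax. Note that this is a different technique from the one above: it uses $name placeholders rather than {{ name }}. The helper name render_report, the template text, and the sample data below are illustrative.

```python
from string import Template

def render_report(template_text, data):
    """Fill $name-style placeholders; safe_substitute leaves unknown ones untouched."""
    return Template(template_text).safe_substitute(data)

template_text = 'Report for $name on $date\nSummary: $summary\nReviewer: $reviewer'
data = {'name': 'Alice', 'date': '2024-08-17', 'summary': 'All checks passed.'}
print(render_report(template_text, data))
```

Here $reviewer has no value in data, so safe_substitute leaves it in place. Using substitute instead would raise KeyError, which can be preferable when a missing value should stop the report from being generated.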
8.2 Generating report content dynamically
Sometimes the report content itself has to be built dynamically. The following example writes a list of sections, each with a title and a body:
def generate_dynamic_report(output_file, sections):
    with open(output_file, 'w', encoding='utf-8') as outfile:
        for section in sections:
            # Each section is written as a '# title' line followed by its content
            outfile.write(f'# {section["title"]}\n\n')
            outfile.write(f'{section["content"]}\n\n')

# Example call
sections = [
    {"title": "Introduction",
     "content": "This is the introduction section of the report."},
    {"title": "Data Analysis",
     "content": "This section contains the analysis of the data."},
]
generate_dynamic_report('dynamic_report.txt', sections)
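9. Conclusion
This article walked through several ways to automate txt file work with Python: reading, comparing, filtering, merging, converting formats, extracting data, counting word frequencies, and generating reports. These techniques save manual effort and give later data analysis work a solid starting point.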