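Counting English Word Occurrences in a Plain-Text File with Python: A Summary of Methods [Tested and Working]
This article walks through several ways to count how often each English word appears in a plain-text file using Python. The details are as follows:
Version 1: inefficient (scans the text one character at a time)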
# -*- coding:utf-8 -*-
#!python3
path = 'test.txt'
with open(path, encoding='utf-8', newline='') as f:
    word = []
    words_dict = {}
    for letter in f.read():
        if letter.isalnum():
            word.append(letter)
        elif letter.isspace():  # whitespace: space, \t, \n
            if word:
                word = ''.join(word).lower()  # convert to lowercase
                if word not in words_dict:
                    words_dict[word] = 1
                else:
                    words_dict[word] += 1
                word = []
    # handle the last word
    if word:
        word = ''.join(word).lower()  # convert to lowercase
        if word not in words_dict:
            words_dict[word] = 1
        else:
            words_dict[word] += 1
        word = []
for k, v in words_dict.items():
    print(k, v)
Output:
we 4
are 1
busy 1
all 1
day 1
like 1
swarms 1
of 6
flies 1
without 1
souls 1
noisy 1
restless 1
unable 1
to 1
hear 1
the 7
voices 1
soul 1
as 1
time 1
goes 1
by 1
childhood 1
away 2
grew 1
up 1
years 1
a 1
lot 1
memories 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence 1
regardless 1
shackles 1
mind 1
indulge 1
in 1
world 1
buckish 1
focus 1
on 1
beneficial 1
principle 1
lost 1
themselves 1
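The character scan in Version 1 repeats the same counting code for the final word. One way to avoid that duplication is sketched below; the function name `count_words_by_char` and the trailing-space sentinel are illustrative choices, not part of the original script. As in Version 1, punctuation is skipped without ending a word.

```python
def count_words_by_char(text):
    """Count lowercase alphanumeric words by scanning one character at a time."""
    counts = {}
    current = []
    # A trailing space acts as a sentinel, so the last word is flushed
    # by the same branch as every other word.
    for ch in text + " ":
        if ch.isalnum():
            current.append(ch)
        elif ch.isspace() and current:
            word = "".join(current).lower()
            counts[word] = counts.get(word, 0) + 1
            current = []
    return counts


sample = "We are busy, we ARE"
print(count_words_by_char(sample))
```

Using `dict.get` with a default of 0 replaces the `if word not in words_dict` branch.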
Version 2:
Drawback: the whole file is read into memory at once, so performance suffers on large files
# -*- coding:utf-8 -*-
#!python3
import re
path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    data = f.read()
    word_reg = re.compile(r'\w+')
    #word_reg = re.compile(r'\w+\b')
    word_list = word_reg.findall(data)
    word_list = [word.lower() for word in word_list]  # convert to lowercase
    word_set = set(word_list)  # avoid counting the same word repeatedly
    # words_dict = {}
    # for word in word_set:
    #     words_dict[word] = word_list.count(word)
    # concise version
    words_dict = {word: word_list.count(word) for word in word_set}
for k, v in words_dict.items():
    print(k, v)
Output:
on 1
also 1
souls 1
focus 1
soul 1
time 1
noisy 1
grew 1
lot 1
childish 1
like 1
voices 1
indulge 1
swarms 1
buckish 1
restless 1
we 4
hear 1
childhood 1
as 1
world 1
themselves 1
are 1
bottom 1
memories 1
the 7
of 6
flies 1
without 1
have 2
day 1
busy 1
to 1
eroded 1
regardless 1
unable 1
innocence 1
up 1
a 1
in 1
mind 1
goes 1
by 1
lost 1
principle 1
once 1
away 2
years 1
beneficial 1
all 1
shackles 1
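Version 2 calls `word_list.count(word)` once per distinct word, and each call rescans the whole list. A single-pass alternative might look like the sketch below, which assumes the same `\w+` tokenization; the names `WORD_RE` and `count_words_single_pass` are hypothetical.

```python
import re

WORD_RE = re.compile(r"\w+")


def count_words_single_pass(text):
    """Tally lowercase regex word tokens in one pass over the text."""
    counts = {}
    for match in WORD_RE.finditer(text):
        word = match.group().lower()
        counts[word] = counts.get(word, 0) + 1
    return counts


text = "The soul of the soul"
print(count_words_single_pass(text))
```

`finditer` yields matches lazily, so no intermediate word list is built.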
Version 3:
# -*- coding:utf-8 -*-
#!python3
import re
path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    word_list = []
    word_reg = re.compile(r'\w+')
    for line in f:
        #line_words = word_reg.findall(line)
        # simpler than the regex above, but keeps punctuation and case
        line_words = line.split()
        word_list.extend(line_words)
    word_set = set(word_list)  # avoid counting the same word repeatedly
    words_dict = {word: word_list.count(word) for word in word_set}
for k, v in words_dict.items():
    print(k, v)
Output:
childhood 1
innocence, 1
are 1
of 6
also 1
lost 1
We 1
regardless 1
noisy, 1
by, 1
on 1
themselves. 1
grew 1
lot 1
bottom 1
buckish, 1
time 1
childish 1
voices 1
once 1
restless, 1
shackles 1
world 1
eroded 1
As 1
all 1
day, 1
swarms 1
we 3
soul. 1
memories, 1
in 1
without 1
like 1
beneficial 1
up, 1
unable 1
away 1
flies 1
goes 1
a 1
have 2
away, 1
mind, 1
focus 1
principle, 1
hear 1
to 1
the 7
years 1
busy 1
souls, 1
indulge 1
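The output above shows that `split()` leaves punctuation attached ("day,", "soul.") and treats "We" and "we" as different words. If the split-based approach is preferred, the tokens can be normalized before counting. The sketch below is one possible way, not the original author's; `normalize` and `count_split_words` are hypothetical helpers, and only ASCII punctuation is stripped.

```python
import string


def normalize(token):
    """Strip leading/trailing ASCII punctuation and lowercase the token."""
    return token.strip(string.punctuation).lower()


def count_split_words(lines):
    """Count whitespace-separated tokens after normalizing each one."""
    counts = {}
    for line in lines:
        for token in line.split():
            word = normalize(token)
            if word:  # skip tokens that consisted of punctuation only
                counts[word] = counts.get(word, 0) + 1
    return counts


lines = ["We are busy all day, like swarms", "of flies; we -- are"]
print(count_split_words(lines))
```

Because `str.strip` only removes characters at the ends, an apostrophe inside a word such as "don't" is kept.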
Version 4: counting with Counter
# -*- coding:utf-8 -*-
#!python3
import collections
import re
path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    word_list = []
    word_reg = re.compile(r'\w+')
    for line in f:
        line_words = line.split()
        word_list.extend(line_words)
    words_dict = dict(collections.Counter(word_list))  # count with Counter
for k, v in words_dict.items():
    print(k, v)
Output:
We 1
are 1
busy 1
all 1
day, 1
like 1
swarms 1
of 6
flies 1
without 1
souls, 1
noisy, 1
restless, 1
unable 1
to 1
hear 1
the 7
voices 1
soul. 1
As 1
time 1
goes 1
by, 1
childhood 1
away, 1
we 3
grew 1
up, 1
years 1
away 1
a 1
lot 1
memories, 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence, 1
regardless 1
shackles 1
mind, 1
indulge 1
in 1
world 1
buckish, 1
focus 1
on 1
beneficial 1
principle, 1
lost 1
themselves. 1
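Version 4 still collects every word into a list before counting, and because it uses `split()` the punctuation and capitalization remain in the keys. A possible variation, sketched here under the assumption that the `\w+` pattern from Version 2 is the desired tokenization, feeds each line straight into the `Counter`. The function name `count_words` is hypothetical, and `io.StringIO` stands in for the open file.

```python
import collections
import io
import re

WORD_RE = re.compile(r"\w+")


def count_words(lines):
    """Update a Counter line by line with lowercase regex word tokens."""
    counter = collections.Counter()
    for line in lines:
        counter.update(w.lower() for w in WORD_RE.findall(line))
    return counter


# io.StringIO stands in for an open text file here
fake_file = io.StringIO("We are busy all day,\nwe have lost themselves.\n")
counts = count_words(fake_file)
for word, n in counts.most_common(3):
    print(word, n)
```

`Counter.most_common(n)` returns the n most frequent entries, which is often more useful than printing every word.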
Note: the test text test.txt used here is as follows:
We are busy all day, like swarms of flies without souls, noisy, restless, unable to hear the voices of the soul. As time goes by, childhood away, we grew up, years away a lot of memories, once have also eroded the bottom of the childish innocence, we regardless of the shackles of mind, indulge in the world buckish, focus on the beneficial principle, we have lost themselves.
I hope this article is helpful for your Python programming.