Implementing Keyword Search Across Files in Python: A Worked Example

Updated: 2026-02-12 09:52:17  Author: 豆本-豆豆奶

This article walks through a Python implementation of keyword search across files, with example code for each supported file type.

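Approach

The approach is to open a folder, collect the files in it, and search each file in turn for the keyword. The flowchart is shown below: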

[Figure: flowchart]

The idea is simple, so the implementation is not hard either.

This article supports only a handful of file types; readers will need to add others themselves:

txt
docx
csv
xlsx
pptx
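
One way to keep the list of supported types easy to extend is to map each file extension to its reader function and look the reader up by suffix. The sketch below is a minimal illustration, not the article's code: `READERS`, `search_file`, and `contains_keyword_txt` are hypothetical names, and the stand-in text reader assumes UTF-8 instead of detecting the encoding.

```python
from pathlib import Path


def contains_keyword_txt(file_path, keyword):
    # Hypothetical stand-in reader: assumes UTF-8 and ignores undecodable bytes.
    with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
        return any(keyword in line for line in f)


# Hypothetical dispatch table: lowercase extension -> reader function.
# Supporting a new type means adding one entry here.
READERS = {
    '.txt': contains_keyword_txt,
}


def search_file(file_path, keyword, readers=READERS):
    # Path.suffix is the final extension; lowercasing also matches '.TXT'.
    reader = readers.get(Path(file_path).suffix.lower())
    if reader is None:
        return False  # unsupported type: report no match instead of failing
    return reader(file_path, keyword)
```

With this shape, an unsupported file simply yields `False`, and each new format only needs a function with the same `(file_path, keyword)` signature.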

Reading txt

Install the library

pip install chardet

Code

import chardet


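def detect_encoding(file_path, sample_size=64 * 1024):
    # Sample the head of the file rather than only its first line, which is
    # often pure ASCII and says little about the real encoding.
    with open(file_path, 'rb') as f:
        raw_data = f.read(sample_size)
    result = chardet.detect(raw_data)
    # chardet reports None for empty or undecidable input; fall back to UTF-8.
    return result['encoding'] or 'utf-8'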


def read_txt(file_path, keywords=''):
    encoding = detect_encoding(file_path)
    # errors='ignore' keeps a wrong encoding guess from aborting the search.
    with open(file_path, 'r', encoding=encoding, errors='ignore') as f:
        # Stop at the first line that contains the keyword.
        return any(keywords in line for line in f)

We use the chardet library to detect the encoding of each txt file, so that files saved in different encodings can be read correctly.
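
Encoding detection is a statistical guess, so it can come back empty or wrong for short or unusual files. A common complement is to try a short list of candidate encodings in order and keep the first one that decodes cleanly. The sketch below uses only the standard library; `read_text_with_fallback` and its default candidate list are assumptions for illustration, not part of chardet or of the article's code.

```python
def read_text_with_fallback(file_path, encodings=('utf-8', 'gb18030')):
    # Hypothetical helper: try each candidate encoding in order and return the
    # text from the first one that decodes the whole file without error.
    with open(file_path, 'rb') as f:
        raw_data = f.read()
    for encoding in encodings:
        try:
            return raw_data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte to a character, so it never fails,
    # although non-Latin text will come out garbled.
    return raw_data.decode('latin-1')
```

The order of the candidates matters: strict encodings such as UTF-8 should come first, because permissive ones accept many byte sequences that were never written in them.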

Reading docx

Install the library

pip install python-docx

Code

from docx import Document


def read_docx(file_path, keywords=''):
    doc = Document(file_path)
    is_in = False

    for para in doc.paragraphs:
        if para.text.find(keywords) != -1:
            is_in = True
            break

    return is_in

Reading csv

Code

import csv


def read_csv(file_path, keywords=''):
    is_in = False

    encoding = detect_encoding(file_path)
    # newline='' lets the csv module handle line endings inside quoted fields.
    with open(file_path, mode='r', encoding=encoding, newline='') as f:
        reader = csv.reader(f)

        for row in reader:
            row_text = ''.join([str(v) for v in row])
            if row_text.find(keywords) != -1:
                is_in = True
                break

    return is_in
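
Joining the cells with an empty separator means a keyword can match across a cell boundary: the row `['ab', 'cd']` becomes `'abcd'`, which contains `'bc'` even though no single cell does. If that is not wanted, each cell can be checked on its own. The sketch below shows the difference on an in-memory CSV; `row_contains` is a hypothetical helper, not part of the code above.

```python
import csv
import io


def row_contains(row, keyword):
    # Hypothetical helper: match inside individual cells only.
    return any(keyword in str(cell) for cell in row)


sample = io.StringIO('ab,cd\nxy,bc\n')
rows = list(csv.reader(sample))

joined_hits = [''.join(row).find('bc') != -1 for row in rows]
cell_hits = [row_contains(row, 'bc') for row in rows]

print(joined_hits)  # [True, True]: the first row matches across the cell boundary
print(cell_hits)    # [False, True]: only the second row has a cell containing 'bc'
```

Which behavior is right depends on the use case; the joined form is more forgiving, the per-cell form is stricter.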

Reading xlsx

Install the library

pip install openpyxl

Code

from openpyxl import load_workbook


def read_xlsx(file_path, keywords=''):
    wb = load_workbook(file_path)
    try:
        for sheet_name in wb.sheetnames:
            sheet = wb[sheet_name]
            for row in sheet.iter_rows(values_only=True):
                # Skip empty cells so None is not searched as the text 'None'.
                row_text = ''.join(str(v) for v in row if v is not None)
                if keywords in row_text:
                    # Return at once; a bare break would only leave the row loop.
                    return True
        return False
    finally:
        wb.close()

Reading pptx

Install the library

pip install python-pptx

Code

from pptx import Presentation


def read_ppt(ppt_file, keywords=''):
    prs = Presentation(ppt_file)
    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                # Join the runs first: formatting can split a keyword across runs.
                text = ''.join(run.text for run in paragraph.runs)
                if keywords in text:
                    # Return at once; a bare break would only leave the innermost loop.
                    return True
    return False

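Recursing into folders

Because folders can be nested, we also walk the directory tree recursively so that files in subfolders are included.

Code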

from pathlib import Path


def list_files_recursive(directory):
    file_paths = []

    for path in Path(directory).rglob('*'):
        if path.is_file():
            file_paths.append(str(path))

    return file_paths
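
Since only a few extensions are supported, the traversal can also skip everything else up front. A minimal sketch, assuming a hypothetical `SUPPORTED_SUFFIXES` set and `iter_supported_files` generator that are not part of the original code:

```python
from pathlib import Path

# Hypothetical set of the extensions this article handles.
SUPPORTED_SUFFIXES = {'.txt', '.docx', '.csv', '.xlsx', '.pptx'}


def iter_supported_files(directory, suffixes=SUPPORTED_SUFFIXES):
    # Yield paths lazily so a large tree is not held in memory as a list.
    for path in Path(directory).rglob('*'):
        # Compare the lowercased suffix so 'REPORT.XLSX' is included too.
        if path.is_file() and path.suffix.lower() in suffixes:
            yield str(path)
```

A generator suits this case because the caller only ever looks at one path at a time.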

Complete code

# -*- coding: utf-8 -*-
from pptx import Presentation
import chardet
from docx import Document
import csv
from openpyxl import load_workbook
from pathlib import Path


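def detect_encoding(file_path, sample_size=64 * 1024):
    # Sample the head of the file rather than only its first line, which is
    # often pure ASCII and says little about the real encoding.
    with open(file_path, 'rb') as f:
        raw_data = f.read(sample_size)
    result = chardet.detect(raw_data)
    # chardet reports None for empty or undecidable input; fall back to UTF-8.
    return result['encoding'] or 'utf-8'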


def read_txt(file_path, keywords=''):
    encoding = detect_encoding(file_path)
    # errors='ignore' keeps a wrong encoding guess from aborting the search.
    with open(file_path, 'r', encoding=encoding, errors='ignore') as f:
        # Stop at the first line that contains the keyword.
        return any(keywords in line for line in f)


def read_docx(file_path, keywords=''):
    doc = Document(file_path)
    is_in = False

    for para in doc.paragraphs:
        if para.text.find(keywords) != -1:
            is_in = True
            break

    return is_in


def read_csv(file_path, keywords=''):
    is_in = False

    encoding = detect_encoding(file_path)
    # newline='' lets the csv module handle line endings inside quoted fields.
    with open(file_path, mode='r', encoding=encoding, newline='') as f:
        reader = csv.reader(f)

        for row in reader:
            row_text = ''.join([str(v) for v in row])
            if row_text.find(keywords) != -1:
                is_in = True
                break

    return is_in


def read_xlsx(file_path, keywords=''):
    wb = load_workbook(file_path)
    try:
        for sheet_name in wb.sheetnames:
            sheet = wb[sheet_name]
            for row in sheet.iter_rows(values_only=True):
                # Skip empty cells so None is not searched as the text 'None'.
                row_text = ''.join(str(v) for v in row if v is not None)
                if keywords in row_text:
                    # Return at once; a bare break would only leave the row loop.
                    return True
        return False
    finally:
        wb.close()


def read_ppt(ppt_file, keywords=''):
    prs = Presentation(ppt_file)
    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                # Join the runs first: formatting can split a keyword across runs.
                text = ''.join(run.text for run in paragraph.runs)
                if keywords in text:
                    # Return at once; a bare break would only leave the innermost loop.
                    return True
    return False


def list_files_recursive(directory):
    file_paths = []

    for path in Path(directory).rglob('*'):
        if path.is_file():
            file_paths.append(str(path))

    return file_paths


if __name__ == '__main__':
    keywords = 'test keyword'  # placeholder: the keyword to search for
    file_paths = list_files_recursive(r'test_folder')  # placeholder: the folder to search
    for file_path in file_paths:
        # Reset for every file so that an unsupported type cannot reuse the
        # result left over from the previous file.
        is_in = False
        if file_path.endswith('.txt'):
            is_in = read_txt(file_path, keywords)
        elif file_path.endswith('.docx'):
            is_in = read_docx(file_path, keywords)
        elif file_path.endswith('.csv'):
            is_in = read_csv(file_path, keywords)
        elif file_path.endswith('.xlsx'):
            is_in = read_xlsx(file_path, keywords)
        elif file_path.endswith('.pptx'):
            is_in = read_ppt(file_path, keywords)

        if is_in:
            print(file_path)

Conclusion

You can now use this code to check whether a keyword appears in files of several different types.
