A Python Tool for Calculating Folder Size with Visible Progress

Updated: December 26, 2025, 09:18:53  Author: ChenAI_TGF

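This article explains in detail how to build a Python tool that calculates file sizes while showing its progress. The example code is explained step by step, and interested readers can follow along.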

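Preface

If you often handle data backups, database maintenance, or large project files, you have probably run into this: you want the total size of a directory containing tens of thousands of subfolders, you open its Properties dialog, and ten minutes or more later the screen still shows only a spinning "calculating" icon. You don't know how much longer it will take, or whether the program has hung.

I hit this problem recently while working with a database: I needed the size of a directory that held a large number of subfolders, each of which also contained a lot of data. The system's built-in Properties dialog had produced no result after 10 minutes. Simple Python scripts either stalled while collecting file paths up front, or showed a progress bar without the running size, so there was no way to judge progress. Reworking the progress-bar logic and adding multithreading finally gave "feedback within seconds and calculation at multiplied speed". The process was as follows: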

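I. Initial Approach

1. Walk the folder with os.walk, first collecting the paths of all files inside it;

2. Wrap the file loop in tqdm to show a progress bar that counts processed files;

3. Add up the size of each file, then convert the total to a readable unit (GB/TB) for output.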

import os
from tqdm import tqdm

def get_folder_size(folder_path):
    """
    Compute the total size of the given folder and show a progress bar.

    Args:
        folder_path: path of the folder

    Returns:
        total size of the folder in bytes
    """
    # Check that the path exists
    if not os.path.exists(folder_path):
        print(f"Error: path '{folder_path}' does not exist")
        return 0

    # Check that it is a folder
    if not os.path.isdir(folder_path):
        print(f"Error: '{folder_path}' is not a folder")
        return 0

    # First collect every file path, so the progress bar knows its total
    file_paths = []
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            file_path = os.path.join(root, file)
            file_paths.append(file_path)

    total_size = 0
    # Create the progress bar with tqdm
    for file_path in tqdm(file_paths, desc="Computing", unit="file"):
        try:
            # Get the file size and add it to the total
            file_size = os.path.getsize(file_path)
            total_size += file_size
        except OSError as e:
            # tqdm.write prints without breaking the progress bar
            tqdm.write(f"Cannot access file '{file_path}': {e}")

    return total_size

def format_size(size_bytes):
    """
    Convert a size in bytes to a readable string.

    Args:
        size_bytes: number of bytes

    Returns:
        formatted size string
    """
    # Units, each 1024 times the previous one
    units = ['B', 'KB', 'MB', 'GB', 'TB']
    size = size_bytes
    unit_index = 0

    # Step up through the units until the value is below 1024 (or TB is reached)
    while size >= 1024 and unit_index < len(units) - 1:
        size /= 1024
        unit_index += 1

    return f"{size:.2f} {units[unit_index]}"

if __name__ == "__main__":
    # Set the folder to measure directly here
    # For example: folder_path = "C:/Users/YourName/Documents"
    # Or on Linux/macOS: folder_path = "/home/yourname/documents"
    folder_path = "replace with your folder path"

    print(f"Will compute the size of folder: {folder_path}")

    # Compute the folder size
    total_size = get_folder_size(folder_path)

    # Show the result
    if total_size > 0:
        formatted_size = format_size(total_size)
        print(f"\nTotal size of folder '{folder_path}': {formatted_size}")

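This looks fine at first glance, but the folder held so many files (the final file count is not given) that step 1 alone took a very long time. That step runs outside the tqdm wrapper, so the script sits there for a long while with no output, as if it had hung.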
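To make the stall concrete, here is a minimal sketch of one way around it, separate from the optimization described in the next section: feed tqdm a generator instead of a pre-built list, so the bar starts counting at once. The names iter_files and get_folder_size_streaming are made up for this illustration, and the trade-off is that without a known total tqdm can only show a running count and rate, not a percentage.

```python
import os

try:
    from tqdm import tqdm
except ImportError:
    # Assumption for this sketch: fall back to plain iteration if tqdm is missing
    def tqdm(iterable, **kwargs):
        return iterable


def iter_files(folder_path):
    """Yield file paths one at a time instead of building a full list first."""
    for root, _dirs, files in os.walk(folder_path):
        for name in files:
            yield os.path.join(root, name)


def get_folder_size_streaming(folder_path):
    """Sum file sizes while walking; the bar shows a count, not a percentage."""
    total_size = 0
    # A generator has no length, so tqdm displays only the count and rate
    for file_path in tqdm(iter_files(folder_path), desc="Scanning", unit="file"):
        try:
            total_size += os.path.getsize(file_path)
        except OSError:
            continue  # skip files that vanish or cannot be read
    return total_size
```

Calling get_folder_size_streaming("some/folder") starts updating immediately, but it cannot say how far along it is.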

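II. Optimizing the Approach

1. Optimizing the traversal logic so traversal progress is visible

Once the cause was clear (too many files inside, so the traversal itself is slow), a simple change fixes it:

1. On the first pass, list only the first-level folders and record their paths, instead of walking every nested folder and every file inside it as the original logic did. There are comparatively few first-level folders (just over forty thousand), so this pass is fast enough to ignore.

2. When wrapping the loop in tqdm, iterate over the first-level folder paths rather than over every file path, and walk the contents inside each first-level folder. (A second-level progress bar could be added here to show progress within a single first-level folder. That makes the output more intuitive and keeps it updating when a large first-level folder comes along.)

The implementation is as follows: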

import os
from tqdm import tqdm

def format_size(size_bytes):
    """Convert a size in bytes to a readable string."""
    units = ['B', 'KB', 'MB', 'GB', 'TB']
    size = size_bytes
    unit_index = 0
    
    while size >= 1024 and unit_index < len(units) - 1:
        size /= 1024
        unit_index += 1
    
    return f"{size:.2f} {units[unit_index]}"

def calculate_directory_size(directory, total_size_ref):
    """Compute the size of one directory, updating the shared total as it goes."""
    dir_size = 0
    for root, dirs, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(root, file)
            try:
                file_size = os.path.getsize(file_path)
                dir_size += file_size
                total_size_ref[0] += file_size  # update the total through the list reference
            except OSError:
                continue
    return dir_size

def get_folder_size(folder_path):
    """Compute the total folder size, showing progress and the running total."""
    if not os.path.exists(folder_path):
        print(f"Error: path '{folder_path}' does not exist")
        return 0

    if not os.path.isdir(folder_path):
        print(f"Error: '{folder_path}' is not a folder")
        return 0

    # List the first-level subfolders and the files in the root directory
    items = os.listdir(folder_path)
    first_level_dirs = []
    total_size = [0]  # stored in a list so the helper function can modify it

    # Add up the files that sit directly in the root directory
    for item in items:
        item_path = os.path.join(folder_path, item)
        if os.path.isfile(item_path):
            try:
                total_size[0] += os.path.getsize(item_path)
            except OSError:
                continue
        elif os.path.isdir(item_path):
            first_level_dirs.append(item_path)

    # No subfolders at all
    if not first_level_dirs:
        return total_size[0]

    # Show the starting state
    print(f"Found {len(first_level_dirs)} subfolders, computing sizes...")
    print(f"Size of root-level files: {format_size(total_size[0])}")

    # Process every first-level subfolder
    progress_bar = tqdm(first_level_dirs, desc="Processing", unit="dir")
    for i, dir_path in enumerate(progress_bar):
        # Compute the subfolder size; this also updates the running total
        calculate_directory_size(dir_path, total_size)

        # Update the progress bar with the current total
        current_size = format_size(total_size[0])
        progress_bar.set_description(f"Processing - total: {current_size}")

        # Every 100 subfolders, also write a line so the user is sure to see it
        # (tqdm.write prints without breaking the progress bar)
        if (i + 1) % 100 == 0:
            tqdm.write(f"Processed {i + 1}/{len(first_level_dirs)} subfolders, current total: {current_size}")

    return total_size[0]

if __name__ == "__main__":
    # Set the path of the folder to measure
    folder_path = "F:/LIP-REAINDG-DATABASES-ACLP/mouth"  # path used in this example; replace with your own

    print(f"Computing the size of folder: {folder_path}")

    # Compute and show the result
    total_size = get_folder_size(folder_path)

    if total_size > 0:
        formatted_size = format_size(total_size)
        print(f"\nTotal size of folder '{folder_path}': {formatted_size}")

The result: the progress bar appears as soon as the program starts, and the traversal of each subfolder and its files proceeds fairly quickly.
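The optional second-level progress bar mentioned in step 2 is not part of the code above. A minimal sketch of how it might look follows, using tqdm's position and leave parameters to stack two bars. The names iter_file_sizes and get_folder_size_nested are invented for this illustration. With tens of thousands of small folders, the inner bar will flicker by quickly, so it mainly helps when some first-level folders are very large.

```python
import os

try:
    from tqdm import tqdm
except ImportError:
    # Assumption for this sketch: fall back to plain iteration if tqdm is missing
    def tqdm(iterable, **kwargs):
        return iterable


def iter_file_sizes(directory):
    """Yield the size of every readable file under directory."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                yield os.path.getsize(os.path.join(root, name))
            except OSError:
                continue


def get_folder_size_nested(folder_path):
    """Outer bar: first-level folders. Inner bar: files in the current folder."""
    total_size = 0
    first_level_dirs = []
    for name in os.listdir(folder_path):
        path = os.path.join(folder_path, name)
        if os.path.isdir(path):
            first_level_dirs.append(path)
        elif os.path.isfile(path):
            try:
                total_size += os.path.getsize(path)
            except OSError:
                continue

    # position=0 pins the outer bar to the first line of the display
    for dir_path in tqdm(first_level_dirs, desc="Folders", unit="dir", position=0):
        # position=1 draws the inner bar one line below;
        # leave=False clears it when the folder is finished
        inner = tqdm(iter_file_sizes(dir_path), desc=os.path.basename(dir_path),
                     unit="file", position=1, leave=False)
        for size in inner:
            total_size += size
    return total_size
```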

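2. Using multithreading and faster I/O calls for a further speedup

Because totaling file sizes is an I/O-bound task, the subfolders are processed in parallel with multiple threads, and the traversal switches to a more efficient I/O function.

Multithreaded parallelism: create a thread pool with ThreadPoolExecutor. I/O-bound tasks suit multithreading because other threads keep working while one waits on I/O. Note that the default thread count was "CPU cores × 5" only in Python 3.5 to 3.7; since Python 3.8 it is min(32, os.cpu_count() + 4).

Efficient I/O traversal: replace the os.listdir + os.path combination with os.scandir. The entries that os.scandir returns already carry the file type (file or folder) from the directory read itself, and on Windows the size as well, so far fewer extra system calls are needed. This can make it about 2 to 3 times faster than the traditional approach.

Less lock contention: each thread first computes a subfolder's size on its own, then updates the grand total under a lock (this avoids locking for every file and cuts the overhead).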
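To show what the os.scandir point means in practice, here is a small side-by-side sketch for a single directory (no recursion). The function names size_with_listdir and size_with_scandir are made up for this comparison. For a directory of regular files both return the same number; they differ only in how many extra system calls they need, and in that the scandir version below skips symbolic links.

```python
import os


def size_with_listdir(directory):
    """os.listdir returns bare names, so each file costs extra stat calls."""
    total = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            if os.path.isfile(path):            # one stat call
                total += os.path.getsize(path)  # a second stat call
        except OSError:
            continue
    return total


def size_with_scandir(directory):
    """os.scandir yields DirEntry objects that cache what the OS already returned."""
    total = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                # is_file() can usually be answered from the directory read itself
                if entry.is_file(follow_symlinks=False):
                    # stat() uses cached data on Windows; on POSIX it makes one
                    # system call, and the result is then cached on the entry
                    total += entry.stat(follow_symlinks=False).st_size
            except OSError:
                continue
    return total
```

The actual speed difference depends on the operating system and file system, so it is worth timing both on your own data.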

The full implementation follows (with the thread count set to 64):

import os
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

def format_size(size_bytes):
    """Convert a size in bytes to a readable string."""
    units = ['B', 'KB', 'MB', 'GB', 'TB']
    size = size_bytes
    unit_index = 0
    
    while size >= 1024 and unit_index < len(units) - 1:
        size /= 1024
        unit_index += 1
    
    return f"{size:.2f} {units[unit_index]}"

def calculate_directory_size(directory):
    """Compute the size of one directory (pure calculation; no progress updates)."""
    total_size = 0
    file_count = 0
    try:
        with os.scandir(directory) as entries:
            for entry in entries:
                try:
                    if entry.is_file(follow_symlinks=False):
                        total_size += entry.stat().st_size
                        file_count += 1
                    elif entry.is_dir(follow_symlinks=False):
                        # Recurse into the subdirectory
                        sub_size, sub_count = calculate_directory_size(entry.path)
                        total_size += sub_size
                        file_count += sub_count
                except OSError:
                    continue
    except OSError:
        # The directory itself could not be read; count what was seen so far
        pass
    return total_size, file_count

def get_folder_size(folder_path, max_workers=None):
    """
    Compute the total size of a folder (multithreaded version).

    Args:
        folder_path: path of the folder
        max_workers: maximum number of threads; None uses the
            ThreadPoolExecutor default

    Returns:
        (total size in bytes, total number of files)
    """
    # Return a pair on errors too, so callers can always unpack two values
    if not os.path.exists(folder_path):
        print(f"Error: path '{folder_path}' does not exist")
        return 0, 0

    if not os.path.isdir(folder_path):
        print(f"Error: '{folder_path}' is not a folder")
        return 0, 0

    # Collect the first-level subfolders and the files in the root directory
    first_level_dirs = []
    root_files_size = 0
    root_file_count = 0

    with os.scandir(folder_path) as entries:
        for entry in entries:
            try:
                if entry.is_file(follow_symlinks=False):
                    root_files_size += entry.stat().st_size
                    root_file_count += 1
                elif entry.is_dir(follow_symlinks=False):
                    first_level_dirs.append(entry.path)
            except OSError:
                continue

    total_size = root_files_size
    total_file_count = root_file_count
    dir_count = len(first_level_dirs)

    if dir_count == 0:
        return total_size, total_file_count

    print(f"Found {dir_count} subfolders, computing sizes with multiple threads...")
    print(f"Size of root-level files: {format_size(total_size)}, root-level file count: {total_file_count}")

    lock = threading.Lock()
    progress = tqdm(total=dir_count, desc="Processing", unit="dir")

    def process_dir(dir_path):
        nonlocal total_size, total_file_count
        # The slow part runs without the lock
        dir_size, file_count = calculate_directory_size(dir_path)
        # Lock once per folder (not once per file) to update the shared totals
        with lock:
            total_size += dir_size
            total_file_count += file_count
            progress.set_description(f"Processing - total: {format_size(total_size)}")
            progress.update(1)
        return dir_size, file_count

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(process_dir, dir_path) for dir_path in first_level_dirs]
        for future in as_completed(futures):
            future.result()  # re-raise any unexpected error from a worker thread

    progress.close()
    return total_size, total_file_count

if __name__ == "__main__":
    # Set the path of the folder to measure
    folder_path = r""

    print(f"Computing the size of folder: {folder_path}")

    # Compute and show the result (64 threads here; adjust as needed)
    total_size, total_file_count = get_folder_size(folder_path, max_workers=64)

    if total_size > 0:
        formatted_size = format_size(total_size)
        print(f"\nTotal size of folder '{folder_path}': {formatted_size}")
        print(f"Total number of files: {total_file_count}")
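The code above keeps shared totals and guards them with a lock. As an alternative sketch (not the method used in this article), the worker threads can simply return their results and the main thread can do all the adding while it drains as_completed, which removes the lock entirely. The names scan_dir and sum_with_futures, and the default of 8 workers, are assumptions made for this illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_dir(directory):
    """Return (total_bytes, file_count) for directory, recursing with os.scandir."""
    total, count = 0, 0
    try:
        with os.scandir(directory) as entries:
            for entry in entries:
                try:
                    if entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
                        count += 1
                    elif entry.is_dir(follow_symlinks=False):
                        sub_total, sub_count = scan_dir(entry.path)
                        total += sub_total
                        count += sub_count
                except OSError:
                    continue
    except OSError:
        pass  # unreadable directory: keep whatever was counted so far
    return total, count


def sum_with_futures(first_level_dirs, max_workers=8):
    """Add up per-folder results in the main thread, so no lock is needed."""
    total_size, total_files = 0, 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(scan_dir, d) for d in first_level_dirs]
        # Only this loop touches the totals, and it runs in a single thread
        for future in as_completed(futures):
            size, count = future.result()
            total_size += size
            total_files += count
    return total_size, total_files
```

A progress bar fits naturally into the as_completed loop: call its update method once per finished future, again from the main thread only.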
