Building an audio/video-to-subtitle tool in Python with moviepy and tkinter

Updated: 2025-12-26 09:24:47  Author: ChenAI_TGF

Adding subtitles to audio and video is an important way to improve accessibility and reach. This article builds an audio/video-to-subtitle tool in Python.

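Foreword

In an era of exploding multimedia content, subtitles make audio and video more accessible and easier to share. Whether you are an independent content creator, an educator, or an ordinary user, producing subtitles by hand is slow and tedious. With that in mind, I built an "audio/video-to-subtitle tool" that uses OpenAI's Whisper speech recognition model to convert audio and video files into standard SRT subtitles automatically.

The tool combines moviepy for audio/video processing, whisper for speech recognition, and tkinter for the graphical interface, so users can generate subtitles quickly without specialist knowledge. The sections below describe how it works and how to use it.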

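I. Tool overview

The tool provides the following core features:

Multi-format support: handles video formats such as MP4, AVI, and MOV, and audio formats such as MP3 and WAV
Configurable parameters: choose the Whisper model (tiny/base/small/medium/large) and adjust the thread count and temperature
Progress feedback: two progress bars show audio extraction and subtitle recognition progress separately
Time estimates: shows elapsed processing time and estimated remaining time
Traditional-to-Simplified conversion: recognized text is automatically converted to Simplified Chinese
Standard subtitle output: writes an SRT file that can be used directly in video editors

The graphical interface lowers the barrier to entry, while the adjustable parameters keep the tool flexible enough for more advanced users.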

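II. Core code explained

1. Audio extraction module

The audio track is first extracted from the audio/video file. This step is implemented by the video_to_audio method:

def video_to_audio(self, video_path: str, audio_path: str) -> bool:
    try:
        with VideoFileClip(video_path) as video:
            audio = video.audio
            if audio is None:
                # moviepy sets .audio to None when the file has no audio track
                raise ValueError("the file has no audio track")

            # Record when audio extraction starts
            self.audio_start_time = datetime.now()

            # Write the audio track (logger=None suppresses moviepy's console output)
            audio.write_audiofile(audio_path, logger=None)
            # Force the progress bar to 100%
            self.update_audio_progress(100.0)
        return True
    except Exception as e:
        messagebox.showerror("Error", f"Audio extraction failed: {e}")
        return False

Core logic: moviepy's VideoFileClip reads the video file, and the extracted audio track is written to a WAV file. Files that are already audio (such as MP3) skip the extraction step entirely.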
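The skip rule for files that are already audio can be sketched on its own. This is a minimal illustration rather than the article's code: the helper names is_audio_file and plan_audio_source and the AUDIO_EXTENSIONS set are assumptions, with the extensions taken from the file-dialog filter in the full listing.

```python
from pathlib import Path

# Assumed set of audio extensions, mirroring the file-dialog filter in the full listing.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def is_audio_file(path: str) -> bool:
    """Return True when the file extension marks the input as audio (hypothetical helper)."""
    return Path(path).suffix.lower() in AUDIO_EXTENSIONS


def plan_audio_source(path: str, temp_audio: str = "temp_audio.wav") -> str:
    """Return the path the recognizer should read.

    Audio inputs are used as-is; anything else is assumed to be video whose
    audio track will be extracted into temp_audio first.
    """
    return path if is_audio_file(path) else temp_audio


print(plan_audio_source("talk.MP3"))
print(plan_audio_source("clip.mp4"))
```

Comparing the whole suffix, dot included, means a name such as "notes_mp3" is not mistaken for an audio file, which a bare endswith("mp3") check would do.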

2. Loading the Whisper model and recognizing speech

Speech-to-text is the tool's core function and is built on OpenAI's Whisper model:

def load_whisper_model(self) -> Optional[whisper.Whisper]:
    try:
        model_name = self.model_var.get()
        self.update_transcribe_progress(10)
        # Load the selected model on the CPU (change to "cuda" to use a GPU)
        model = whisper.load_model(model_name, device="cpu")
        self.update_transcribe_progress(20)
        return model
    except Exception as e:
        messagebox.showerror("Error", f"Model loading failed: {e}")
        return None

def transcribe_audio(self, audio_path: str) -> Optional[dict]:
    global is_running
    self.model = self.load_whisper_model()
    if not self.model or not is_running:
        return None

    try:
        # Record when subtitle recognition starts
        self.transcribe_start_time = datetime.now()

        # transcribe() reports no progress, so fixed checkpoints simulate it
        self.update_transcribe_progress(30)
        result = self.model.transcribe(
            audio_path,
            language="zh",  # Decode as Chinese
            temperature=self.temp_var.get(),  # Controls sampling randomness
        )
        self.update_transcribe_progress(80)

        if not is_running:
            return None

        self.update_transcribe_progress(100)
        return result
    except Exception as e:
        messagebox.showerror("Error", f"Subtitle recognition failed: {e}")
        return None

Core logic:

Load the Whisper model the user selected (smaller models are faster but less accurate)
Process the audio with the transcribe method, passing language="zh" so that Whisper skips language detection and decodes as Chinese
The temperature parameter controls sampling randomness (0 gives deterministic output, which suits subtitle generation)
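Outside the GUI, the same load-then-transcribe sequence can be sketched as a pair of plain functions. The names build_transcribe_options and transcribe_file are hypothetical, and the 0.0 to 1.0 temperature range is an assumption borrowed from the Spinbox limits in the full listing. The whisper import is deferred so that the option helper can be used even where the package is not installed.

```python
from typing import Any, Dict


def build_transcribe_options(temperature: float = 0.0, language: str = "zh") -> Dict[str, Any]:
    """Collect keyword arguments for Whisper's transcribe() (hypothetical helper)."""
    # Assumed valid range, matching the UI's Spinbox limits.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature is expected to be between 0.0 and 1.0")
    return {"language": language, "temperature": temperature}


def transcribe_file(audio_path: str, model_name: str = "base") -> dict:
    """Load a model on the CPU and transcribe one file with the default options."""
    import whisper  # deferred import: only needed when a transcription is requested

    model = whisper.load_model(model_name, device="cpu")
    return model.transcribe(audio_path, **build_transcribe_options())


print(build_transcribe_options())
```

A call such as transcribe_file("temp_audio.wav") would return a dictionary whose "segments" list feeds the SRT formatting step.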

3. SRT subtitle formatting

The recognition result must be converted to standard SRT, which consists of a sequence number, a time range, and the text:

def format_srt(self, result: dict) -> str:
    srt_content = ""
    for i, segment in enumerate(result["segments"], 1):
        start = self.format_time(segment["start"])
        end = self.format_time(segment["end"])
        # Convert any Traditional Chinese in the output to Simplified Chinese
        text = self.cc.convert(segment["text"].strip())
        srt_content += f"{i}\n{start} --> {end}\n{text}\n\n"
    return srt_content

@staticmethod
def format_time(seconds: float) -> str:
    """Format a number of seconds as an SRT timestamp (hh:mm:ss,fff)"""
    hours = math.floor(seconds / 3600)
    minutes = math.floor((seconds % 3600) / 60)
    secs = math.floor(seconds % 60)
    millis = math.floor((seconds % 1) * 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

Core logic:

Parse the segments Whisper returns (each has a start time, an end time, and text)
Convert the timestamps to the hh:mm:ss,fff format that SRT requires
Use OpenCC for Traditional-to-Simplified conversion so the output is consistently Simplified Chinese
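The timestamp conversion can also be done in integer milliseconds. Taking math.floor((seconds % 1) * 1000) can come out one millisecond low because of binary floating point; 1.001, for example, yields ,000. The sketch below is an alternative under that observation, not the article's method. The names format_srt_time and segments_to_srt and the sample segments are made up for illustration, and the OpenCC step is left out to keep the example dependency-free.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Format seconds as hh:mm:ss,fff by rounding to whole milliseconds first."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Build SRT text from dictionaries holding 'start', 'end', and 'text'."""
    blocks = []
    for index, seg in enumerate(segments, 1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


# Made-up segments in the shape read from result["segments"].
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello"},
    {"start": 2.5, "end": 3661.001, "text": " World"},
]
print(segments_to_srt(sample))
```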

4. Threading and progress management

To keep the UI responsive, the core processing runs in a separate thread:

def start_convert(self):
    """Start the conversion thread"""
    thread = threading.Thread(target=self.convert_thread, daemon=True)
    thread.start()

def convert_thread(self):
    """Conversion worker (keeps the UI from freezing)"""
    global is_running
    is_running = True
    self.start_btn.config(state=tk.DISABLED)
    self.stop_btn.config(state=tk.NORMAL)

    # Reset progress and timing
    self.update_audio_progress(0.0)
    self.update_transcribe_progress(0.0)
    self.result_text.delete(1.0, tk.END)
    self.total_start_time = datetime.now()

    # Core processing flow
    input_path = self.file_path_var.get()
    # Audio extraction -> speech recognition -> subtitle formatting -> save file
    # ... (steps omitted)

Core logic:

Run the slow audio processing and recognition work in a worker thread
Use the global variable is_running to signal between the main thread and the worker thread
Update the progress bars and time display as the work proceeds
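The signalling between the two threads can also be expressed with threading.Event and queue.Queue instead of module-level globals. This is a sketch of that alternative, not the article's implementation: worker and drain are hypothetical names, and the loop body stands in for the real extraction and recognition steps.

```python
import queue
import threading


def worker(stop_event: threading.Event, updates: queue.Queue, steps: int = 5) -> None:
    """Simulated job: report progress after each step and stop early if asked."""
    for step in range(1, steps + 1):
        if stop_event.is_set():
            updates.put(("stopped", (step - 1) * 100 // steps))
            return
        # A real job would do one unit of work here.
        updates.put(("progress", step * 100 // steps))
    updates.put(("done", 100))


def drain(updates: queue.Queue) -> list:
    """Collect every pending message without blocking."""
    messages = []
    while True:
        try:
            messages.append(updates.get_nowait())
        except queue.Empty:
            return messages


stop_event = threading.Event()
updates = queue.Queue()
thread = threading.Thread(target=worker, args=(stop_event, updates), daemon=True)
thread.start()
thread.join()
print(drain(updates))
```

In a Tkinter application the UI thread could call drain from a root.after callback and apply the messages to the widgets, so the worker thread never touches a widget directly. A stop button would simply call stop_event.set().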

III. Complete code

import tkinter as tk
from tkinter import ttk, filedialog, messagebox
import os
from moviepy import VideoFileClip
import whisper
import threading
import time
from datetime import datetime, timedelta
from typing import Optional
import math
from opencc import OpenCC

# Module-level state shared between the UI thread and the worker thread
audio_convert_progress = 0.0
transcribe_progress = 0.0
is_running = False

class AudioVideoToSubtitle:
    def __init__(self, root):
        self.root = root
        self.root.title("Audio/Video to Subtitle Tool")
        self.root.geometry("900x650")  # Enlarged window to fit the time display

        # Traditional-to-Simplified Chinese converter
        self.cc = OpenCC('t2s')

        # Time-tracking variables
        self.total_start_time = None  # When the whole job started
        self.audio_start_time = None  # When audio extraction started
        self.transcribe_start_time = None  # When subtitle recognition started
        self.time_update_id = None  # ID of the scheduled time-refresh callback

        # Model state
        self.model = None
        self.model_path = None
        
        self.init_ui()

    def init_ui(self):
        # 1. File selection area
        file_frame = ttk.LabelFrame(self.root, text="File")
        file_frame.pack(fill=tk.X, padx=10, pady=5)

        self.file_path_var = tk.StringVar()
        ttk.Entry(file_frame, textvariable=self.file_path_var, width=70).pack(side=tk.LEFT, padx=5, pady=5)
        ttk.Button(file_frame, text="Select File", command=self.select_file).pack(side=tk.LEFT, padx=5, pady=5)

        # 2. Speed-related parameters
        param_frame = ttk.LabelFrame(self.root, text="Speed Parameters")
        param_frame.pack(fill=tk.X, padx=10, pady=5)

        ttk.Label(param_frame, text="Model:").pack(side=tk.LEFT, padx=5, pady=5)
        self.model_var = tk.StringVar(value="base")
        model_options = ["tiny", "base", "small", "medium", "large"]
        ttk.Combobox(param_frame, textvariable=self.model_var, values=model_options, width=10).pack(side=tk.LEFT, padx=5, pady=5)

        ttk.Label(param_frame, text="Threads:").pack(side=tk.LEFT, padx=5, pady=5)
        self.thread_var = tk.IntVar(value=4)
        ttk.Spinbox(param_frame, from_=1, to=16, textvariable=self.thread_var, width=5).pack(side=tk.LEFT, padx=5, pady=5)

        ttk.Label(param_frame, text="Temperature:").pack(side=tk.LEFT, padx=5, pady=5)
        self.temp_var = tk.DoubleVar(value=0.0)
        ttk.Spinbox(param_frame, from_=0.0, to=1.0, increment=0.1, textvariable=self.temp_var, width=5).pack(side=tk.LEFT, padx=5, pady=5)

        # 3. Progress bars and time display
        progress_frame = ttk.LabelFrame(self.root, text="Progress and Time")
        progress_frame.pack(fill=tk.X, padx=10, pady=5)

        # Audio extraction progress bar
        ttk.Label(progress_frame, text="Audio extraction:").pack(side=tk.LEFT, padx=5)
        self.audio_progress = ttk.Progressbar(progress_frame, orient=tk.HORIZONTAL, length=300, mode='determinate')
        self.audio_progress.pack(side=tk.LEFT, padx=5, pady=5)
        self.audio_progress_label = ttk.Label(progress_frame, text="0%")
        self.audio_progress_label.pack(side=tk.LEFT, padx=5)

        # Subtitle recognition progress bar
        ttk.Label(progress_frame, text="Recognition:").pack(side=tk.LEFT, padx=5)
        self.transcribe_progress = ttk.Progressbar(progress_frame, orient=tk.HORIZONTAL, length=300, mode='determinate')
        self.transcribe_progress.pack(side=tk.LEFT, padx=5, pady=5)
        self.transcribe_progress_label = ttk.Label(progress_frame, text="0%")
        self.transcribe_progress_label.pack(side=tk.LEFT, padx=5)

        # Time display area
        time_frame = ttk.LabelFrame(self.root, text="Time")
        time_frame.pack(fill=tk.X, padx=10, pady=5)

        self.elapsed_time_var = tk.StringVar(value="Elapsed: 00:00:00")
        ttk.Label(time_frame, textvariable=self.elapsed_time_var).pack(side=tk.LEFT, padx=20, pady=5)

        self.estimated_time_var = tk.StringVar(value="Estimated remaining: --:--:--")
        ttk.Label(time_frame, textvariable=self.estimated_time_var).pack(side=tk.LEFT, padx=20, pady=5)

        # 4. Control buttons
        btn_frame = ttk.Frame(self.root)
        btn_frame.pack(pady=10)

        self.start_btn = ttk.Button(btn_frame, text="Start Conversion", command=self.start_convert)
        self.start_btn.pack(side=tk.LEFT, padx=10)
        self.stop_btn = ttk.Button(btn_frame, text="Stop Conversion", command=self.stop_convert, state=tk.DISABLED)
        self.stop_btn.pack(side=tk.LEFT, padx=10)

        # 5. Result display area
        result_frame = ttk.LabelFrame(self.root, text="Result Preview")
        result_frame.pack(fill=tk.BOTH, expand=True, padx=10, pady=5)
        
        self.result_text = tk.Text(result_frame, height=15)
        scrollbar = ttk.Scrollbar(result_frame, command=self.result_text.yview)
        self.result_text.configure(yscrollcommand=scrollbar.set)
        self.result_text.pack(side=tk.LEFT, fill=tk.BOTH, expand=True, padx=5, pady=5)
        scrollbar.pack(side=tk.RIGHT, fill=tk.Y, padx=5, pady=5)

    def select_file(self):
        """Choose an audio or video file"""
        file_types = [
            ("Audio/video files", "*.mp4 *.avi *.mov *.mkv *.flv *.mp3 *.wav *.m4a"),
            ("All files", "*.*")
        ]
        file_path = filedialog.askopenfilename(filetypes=file_types)
        if file_path:
            self.file_path_var.set(file_path)

    def update_audio_progress(self, value: float):
        """Update the audio extraction progress bar"""
        global audio_convert_progress
        audio_convert_progress = min(value, 100.0)
        self.audio_progress["value"] = audio_convert_progress
        self.audio_progress_label.config(text=f"{int(audio_convert_progress)}%")
        self.root.update_idletasks()

    def update_transcribe_progress(self, value: float):
        """Update the subtitle recognition progress bar"""
        global transcribe_progress
        transcribe_progress = min(value, 100.0)
        self.transcribe_progress["value"] = transcribe_progress
        self.transcribe_progress_label.config(text=f"{int(transcribe_progress)}%")
        self.root.update_idletasks()

    def format_time_display(self, seconds: float) -> str:
        """Format a number of seconds as hh:mm:ss for display"""
        # Clamp at zero so a negative estimate never shows a nonsensical time
        total = max(int(seconds), 0)
        hours, remainder = divmod(total, 3600)
        minutes, secs = divmod(remainder, 60)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"

    def update_time_display(self):
        """Refresh the elapsed and estimated-remaining time labels"""
        if not is_running or not self.total_start_time:
            return

        # Elapsed time
        elapsed_seconds = (datetime.now() - self.total_start_time).total_seconds()
        self.elapsed_time_var.set(f"Elapsed: {self.format_time_display(elapsed_seconds)}")

        # Estimated remaining time
        try:
            if audio_convert_progress < 100:
                # Audio extraction phase
                if self.audio_start_time and audio_convert_progress > 0:
                    audio_elapsed = (datetime.now() - self.audio_start_time).total_seconds()
                    total_audio_estimated = audio_elapsed / (audio_convert_progress / 100)

                    # Rough estimate: assume transcription takes about as long as extraction
                    total_estimated = total_audio_estimated * 2
                    remaining = total_estimated - elapsed_seconds
                    self.estimated_time_var.set(f"Estimated remaining: {self.format_time_display(remaining)}")
            else:
                # Subtitle recognition phase
                if self.transcribe_start_time and 0 < transcribe_progress < 100:
                    transcribe_elapsed = (datetime.now() - self.transcribe_start_time).total_seconds()
                    total_transcribe_estimated = transcribe_elapsed / (transcribe_progress / 100)
                    transcribe_remaining = total_transcribe_estimated - transcribe_elapsed
                    self.estimated_time_var.set(f"Estimated remaining: {self.format_time_display(transcribe_remaining)}")
                elif transcribe_progress >= 100:
                    self.estimated_time_var.set("Estimated remaining: 00:00:00")
        except Exception:
            self.estimated_time_var.set("Estimated remaining: calculating...")

        # Schedule the next refresh in one second
        self.time_update_id = self.root.after(1000, self.update_time_display)

    def video_to_audio(self, video_path: str, audio_path: str) -> bool:
        """Extract the audio track from a video file and update the progress bar"""
        try:
            with VideoFileClip(video_path) as video:
                audio = video.audio
                if audio is None:
                    # moviepy sets .audio to None when the file has no audio track
                    raise ValueError("the file has no audio track")

                # Record when audio extraction starts
                self.audio_start_time = datetime.now()

                # Write the audio track
                audio.write_audiofile(audio_path, logger=None)
                # Force the progress bar to 100%
                self.update_audio_progress(100.0)
            return True
        except Exception as e:
            messagebox.showerror("Error", f"Audio extraction failed: {e}")
            return False

    def load_whisper_model(self) -> Optional[whisper.Whisper]:
        """Load the selected Whisper model"""
        try:
            model_name = self.model_var.get()
            self.update_transcribe_progress(10)
            model = whisper.load_model(model_name, device="cpu")
            self.update_transcribe_progress(20)
            return model
        except Exception as e:
            messagebox.showerror("Error", f"Model loading failed: {e}")
            return None

    def transcribe_audio(self, audio_path: str) -> Optional[dict]:
        """Transcribe the audio file and update the progress bar"""
        global is_running
        self.model = self.load_whisper_model()
        if not self.model or not is_running:
            return None

        try:
            # Record when subtitle recognition starts
            self.transcribe_start_time = datetime.now()

            # transcribe() reports no progress, so fixed checkpoints simulate it
            self.update_transcribe_progress(30)
            result = self.model.transcribe(
                audio_path,
                language="zh",
                temperature=self.temp_var.get(),
            )
            self.update_transcribe_progress(80)

            if not is_running:
                return None

            self.update_transcribe_progress(100)
            return result
        except Exception as e:
            messagebox.showerror("Error", f"Subtitle recognition failed: {e}")
            return None

    def format_srt(self, result: dict) -> str:
        """Format the recognition result as SRT subtitles"""
        srt_content = ""
        for i, segment in enumerate(result["segments"], 1):
            start = self.format_time(segment["start"])
            end = self.format_time(segment["end"])
            text = self.cc.convert(segment["text"].strip())
            srt_content += f"{i}\n{start} --> {end}\n{text}\n\n"
        return srt_content

    @staticmethod
    def format_time(seconds: float) -> str:
        """Format a number of seconds as an SRT timestamp (hh:mm:ss,fff)"""
        hours = math.floor(seconds / 3600)
        minutes = math.floor((seconds % 3600) / 60)
        secs = math.floor(seconds % 60)
        millis = math.floor((seconds % 1) * 1000)
        return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

    def save_subtitle(self, srt_content: str, input_path: str):
        """Save the subtitle file next to the input file"""
        save_path = os.path.splitext(input_path)[0] + ".srt"
        try:
            with open(save_path, "w", encoding="utf-8") as f:
                f.write(srt_content)
            messagebox.showinfo("Success", f"Subtitles saved to:\n{save_path}")
            return save_path
        except Exception as e:
            messagebox.showerror("Error", f"Failed to save subtitles: {e}")
            return None

    def convert_thread(self):
        """Conversion worker (keeps the UI from freezing)"""
        global is_running
        is_running = True
        self.start_btn.config(state=tk.DISABLED)
        self.stop_btn.config(state=tk.NORMAL)

        # Reset progress and timing
        self.update_audio_progress(0.0)
        self.update_transcribe_progress(0.0)
        self.result_text.delete(1.0, tk.END)
        self.total_start_time = datetime.now()
        self.audio_start_time = None
        self.transcribe_start_time = None

        # Start the periodic time refresh
        self.root.after(0, self.update_time_display)

        input_path = self.file_path_var.get()
        if not os.path.exists(input_path):
            messagebox.showwarning("Warning", "Please select a valid audio or video file!")
            self.reset_ui()
            return

        # Temporary audio file path
        temp_audio = "temp_audio.wav"
        # Extensions treated as audio (the leading dot avoids matching names like "notes_mp3")
        audio_exts = (".mp3", ".wav", ".m4a")
        is_audio_input = input_path.lower().endswith(audio_exts)
        try:
            # Step 1: extract audio from the input
            if not is_running:
                return
            if is_audio_input:
                # Already an audio file, so skip extraction
                self.update_audio_progress(100.0)
                audio_path = input_path
                self.audio_start_time = datetime.now()  # Mark the audio step as finished
            else:
                if not self.video_to_audio(input_path, temp_audio):
                    return
                audio_path = temp_audio

            # Step 2: transcribe audio to subtitles
            if not is_running:
                return
            result = self.transcribe_audio(audio_path)
            if not result or not is_running:
                return

            # Step 3: format and display the result
            srt_content = self.format_srt(result)
            self.result_text.insert(1.0, srt_content)

            # Step 4: save the subtitles
            self.save_subtitle(srt_content, input_path)

        finally:
            # Remove the temporary file
            if os.path.exists(temp_audio) and not is_audio_input:
                os.remove(temp_audio)
            self.reset_ui()

    def start_convert(self):
        """Start the conversion thread"""
        thread = threading.Thread(target=self.convert_thread, daemon=True)
        thread.start()

    def stop_convert(self):
        """Request that the conversion stop"""
        global is_running
        is_running = False
        self.stop_btn.config(state=tk.DISABLED)
        self.result_text.insert(tk.END, "\n\nConversion stopped!")

    def reset_ui(self):
        """Reset the UI state"""
        global is_running
        is_running = False
        self.start_btn.config(state=tk.NORMAL)
        self.stop_btn.config(state=tk.DISABLED)

        # Cancel the periodic time refresh
        if self.time_update_id:
            self.root.after_cancel(self.time_update_id)
            self.time_update_id = None

        # Reset the time display
        self.elapsed_time_var.set("Elapsed: 00:00:00")
        self.estimated_time_var.set("Estimated remaining: --:--:--")

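if __name__ == "__main__":
    # Prompt the user to install dependencies if any are missing
    try:
        import torch
        import opencc
    except ImportError as e:
        # e.name holds the module that failed to import
        missing = e.name or "unknown"
        messagebox.showwarning(
            "Notice",
            f"Missing dependency: {missing}\nPlease install the required libraries first:\n"
            "pip install torch moviepy openai-whisper ffmpeg-python opencc-python-reimplemented",
        )
        raise SystemExit(1)

    root = tk.Tk()
    app = AudioVideoToSubtitle(root)
    root.mainloop()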

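IV. Walkthrough

Launch: running the program opens the main window, which contains the file selection, parameter, progress, and result preview areas.

File selection: click "Select File" and choose the audio or video file to convert (for example, an MP4 video).

Parameters:

Model: choose according to your needs (tiny is the fastest, large is the most accurate)
Threads: adjust to your CPU core count (4 to 8 suggested); note that the code as listed stores this value in thread_var but never passes it to Whisper or PyTorch, so it currently has no effect
Temperature: defaults to 0.0 (suitable for subtitle generation)

Start conversion:

Click "Start Conversion", and the audio extraction progress bar begins
When extraction finishes, the subtitle recognition progress bar starts
"Elapsed" and "Estimated remaining" times are shown while the job runs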
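The remaining-time figure is a linear extrapolation from elapsed time and the current progress percentage. A minimal sketch of that arithmetic follows, with the hypothetical names estimate_remaining and hms; the full listing does the same calculation inline in update_time_display.

```python
from typing import Optional


def estimate_remaining(elapsed_seconds: float, progress_percent: float) -> Optional[float]:
    """Extrapolate linearly: total = elapsed / fraction done, remaining = total - elapsed."""
    if progress_percent <= 0:
        return None  # nothing measured yet, so no estimate is possible
    if progress_percent >= 100:
        return 0.0
    total = elapsed_seconds / (progress_percent / 100)
    return max(total - elapsed_seconds, 0.0)


def hms(seconds: float) -> str:
    """Render a number of seconds as hh:mm:ss."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# 30 seconds spent to reach 25% implies 120 seconds in total, so 90 remain.
print(hms(estimate_remaining(30, 25)))
```

Because the recognition bar only moves at fixed checkpoints (10, 20, 30, 80, 100), an estimate computed this way is coarse and jumps whenever a checkpoint is reached.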

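Viewing the result:

When recognition finishes, the preview area shows the subtitles in SRT format
An SRT file with the same base name as the input is saved automatically (for example, "video.mp4" produces "video.srt")
A dialog reports the save path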
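The output name is derived by swapping the input's extension for .srt, as save_subtitle does with os.path.splitext. A small sketch with the hypothetical helper name subtitle_path_for:

```python
import os


def subtitle_path_for(input_path: str) -> str:
    """Swap the input file's extension for .srt, keeping directory and base name."""
    base, _ext = os.path.splitext(input_path)
    return base + ".srt"


print(subtitle_path_for("video.mp4"))
print(subtitle_path_for("talks/intro.final.MOV"))
```

Only the last extension is replaced, so a name such as "intro.final.MOV" becomes "intro.final.srt".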

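Stopping midway: click "Stop Conversion" to interrupt. The stop flag is only checked between steps, so a Whisper transcribe call that is already running finishes before the worker exits; the tool then removes the temporary audio file and resets the UI.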

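V. Installing third-party libraries

1. Required libraries

The tool depends on the following third-party libraries. Their roles and installation commands are listed below.

(1) torch (PyTorch)

Role: the framework Whisper runs on; it loads and executes the speech recognition model (Whisper is implemented in PyTorch).

Installation: get the command that matches your system and GPU setup from the PyTorch website. The basic CPU build can be installed directly:

pip install torch

(2) moviepy

Role: audio/video processing library, used to extract the audio track from video (one of the core functions). The code imports it with "from moviepy import VideoFileClip", which is the moviepy 2.x import path; moviepy 1.x used "from moviepy.editor import VideoFileClip".

Installation:

pip install moviepy

(3) openai-whisper

Role: OpenAI's official speech recognition library, which provides the Whisper model (the core of the speech-to-text step).

Installation:

pip install openai-whisper

(4) ffmpeg-python

Role: a Python wrapper around the ffmpeg command-line tool. The code in this article never imports it; what the pipeline actually needs is the ffmpeg executable itself, which Whisper invokes to decode audio files.

Note: in addition to the Python package, install the ffmpeg program on your system. Without it on PATH, Whisper cannot load audio (moviepy typically obtains its own ffmpeg binary through imageio-ffmpeg).

Python package:

pip install ffmpeg-python

System-level ffmpeg:

Windows: download a build from the ffmpeg website, unpack it, and add its bin directory to the PATH environment variable.
Ubuntu/Debian: sudo apt-get install ffmpeg
macOS: brew install ffmpeg (requires Homebrew)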
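Whether the system-level ffmpeg is reachable can be checked from Python before starting the tool. A minimal sketch using the standard library's shutil.which; the function name find_ffmpeg is an assumption for illustration.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable on PATH, or None if absent."""
    return shutil.which("ffmpeg")


path = find_ffmpeg()
if path is None:
    print("ffmpeg not found on PATH; install it before running the tool")
else:
    print(f"ffmpeg found at {path}")
```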

(5) opencc-python-reimplemented

Role: Traditional/Simplified Chinese conversion library, used to convert Traditional Chinese in the recognition output to Simplified Chinese (done in the code with OpenCC('t2s')).

Installation:

pip install opencc-python-reimplemented

2. Installing all dependencies with one command

The commands above can be combined into one (a mirror such as -i https://pypi.tuna.tsinghua.edu.cn/simple can speed up downloads in mainland China):

pip install torch moviepy openai-whisper ffmpeg-python opencc-python-reimplemented -i https://pypi.tuna.tsinghua.edu.cn/simple
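After installing, a quick check can confirm that the modules the code imports are discoverable. The sketch below uses importlib.util.find_spec, which looks a module up without importing it. The helper name missing_modules is hypothetical, and REQUIRED_MODULES lists import names taken from the full listing, which differ from the pip package names.

```python
import importlib.util
from typing import Iterable, List

# Import names (not pip package names) used by the article's code.
REQUIRED_MODULES = ["torch", "moviepy", "whisper", "opencc"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(REQUIRED_MODULES))
```

An empty list means every module was found; any names printed point to packages that still need installing.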

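3. Notes

On first run, Whisper automatically downloads the selected model, so make sure the network connection is working. The base checkpoint is roughly 140 MB; the figure of about 1 GB listed for base in the Whisper README is the approximate memory needed to run it, not the download size.
If the machine has an NVIDIA GPU with CUDA installed, change device="cpu" to device="cuda" in the code to speed up recognition significantly (this requires a PyTorch build that matches the CUDA version).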
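Rather than editing the device string by hand, the choice can be made at run time. This is a sketch under the assumption that falling back to the CPU is acceptable whenever PyTorch is missing or reports no usable GPU; pick_device is a hypothetical helper, not part of the article's code.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # deferred so the function still works without PyTorch installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The result could then be passed as the device argument of whisper.load_model in place of the hard-coded "cpu".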

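Summary

This tool combines moviepy and whisper to automate speech recognition and subtitle generation. Its main strengths are:

Ease of use: the graphical interface lowers the barrier to entry, with no command line needed
Flexibility: model selection lets you trade speed against accuracy
Practicality: it outputs standard SRT, which mainstream video editors accept directly

Whether you are a content creator producing subtitles quickly or a learner adding subtitles to teaching videos, the tool can noticeably cut the effort of subtitle production.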
