Building a Smart Web Page Screenshot Tool with Python and wxPython

Updated: 2025-10-31 14:42:34 Author: winfredzhang

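Preface

When testing a website, reviewing its UI, or writing documentation, we often need a screenshot of every page. Visiting and capturing each page by hand is slow, and pages are easily missed. This article shows how to build an automation tool in Python that, in one click, visits every in-site link found on a start page and produces an Excel report with the screenshots embedded.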

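Project requirements

The goal is a desktop application with the following features:

Automatic in-site link extraction: parse every same-domain link on the page
Smart click handling: on the first visit, automatically click a specific button (for example "進入關懷版", "Enter Care Mode")
Full-page screenshots: capture the whole page, including the area reached only by scrolling
Detailed report: collect the link information and screenshots into an Excel sheet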

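Technology choices

Core stack

wxPython: builds the graphical user interface
Selenium: drives the browser
BeautifulSoup: parses HTML to extract links
OpenPyXL: generates the Excel report

Why Selenium

Compared with traditional web page screenshot approaches, Selenium offers these advantages:

Can capture full-page screenshots (including the scrolled area), here through Chrome's DevTools Protocol
Can run JavaScript and perform interactions
Supports multiple browsers (Chrome, Firefox, and others)
Handles dynamically loaded content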

Core implementation

1. GUI design

A simple, intuitive interface built with wxPython:

import wx

class WebLinkClickerFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Web Link Auto-Click Screenshot Tool', size=(900, 700))
        # All controls live on a panel (sizer layout and buttons omitted here)
        panel = wx.Panel(self)

        # URL input
        self.url_input = wx.TextCtrl(panel, value="https://example.com")

        # Browser selection
        self.browser_choice = wx.Choice(panel, choices=['Chrome', 'Firefox'])
        self.browser_choice.SetSelection(0)  # default to Chrome

        # Options
        self.headless_cb = wx.CheckBox(panel, label="Headless mode (run in background)")
        self.full_page_cb = wx.CheckBox(panel, label="Full-page screenshot")
        self.delay_spin = wx.SpinCtrl(panel, value="2", min=1, max=10)  # page-load delay in seconds

        # Progress display
        self.progress_text = wx.TextCtrl(panel, style=wx.TE_MULTILINE | wx.TE_READONLY)
        self.progress_gauge = wx.Gauge(panel, range=100)

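The interface contains:

URL input box and browser selector
Options such as headless mode and full-page screenshot
Live log display and progress bar
Start/Stop buttons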

2. Selenium browser initialization

Both Chrome and Firefox are supported, with an optional headless mode:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.firefox.options import Options as FirefoxOptions

def init_driver(self):
    browser_type = self.browser_choice.GetStringSelection()
    headless = self.headless_cb.GetValue()

    if browser_type == 'Chrome':
        options = Options()
        if headless:
            options.add_argument('--headless')
        options.add_argument('--no-sandbox')
        options.add_argument('--window-size=1920,1080')

        self.driver = webdriver.Chrome(options=options)
    else:
        # Any selection other than Chrome falls back to Firefox
        options = FirefoxOptions()
        if headless:
            options.add_argument('--headless')

        self.driver = webdriver.Firefox(options=options)

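Key settings:

--headless: runs the browser in the background without showing a window
--no-sandbox: disables Chrome's sandbox, which works around permission errors in some environments (for example when running as root in a container) at the cost of reduced isolation
--window-size: sets the browser window size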
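As a rough illustration of how these three flags combine, the sketch below builds the argument list from the GUI options. The helper name build_chrome_args and its parameters are hypothetical and not part of the original tool.

```python
def build_chrome_args(headless, width=1920, height=1080, no_sandbox=True):
    """Return the Chrome command-line flags for the chosen options (hypothetical helper)."""
    args = []
    if headless:
        args.append("--headless")
    if no_sandbox:
        args.append("--no-sandbox")
    args.append(f"--window-size={width},{height}")
    return args


headless_args = build_chrome_args(headless=True)
visible_args = build_chrome_args(headless=False, width=1280, height=720)
print(headless_args)
print(visible_args)
```

Each returned string would then be passed to options.add_argument, as init_driver does with its literal flags.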

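3. Smart button click

On the first visit the tool automatically clicks the "進入關懷版" ("Enter Care Mode") button, trying several locator strategies to improve the hit rate: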

import time
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

def click_care_button(self):
    care_button = None

    # Strategies 1-3: XPath text match, exact link text, partial link text.
    # The label strings must match the text shown on the target site.
    locators = [
        (By.XPATH, "//*[contains(text(), '進入關懷版')]"),
        (By.LINK_TEXT, "進入關懷版"),
        (By.PARTIAL_LINK_TEXT, "關懷版"),
    ]
    for by, value in locators:
        try:
            care_button = self.driver.find_element(by, value)
            break
        except NoSuchElementException:
            continue

    # Strategies 4-5: scan every <button>, then every <a>
    if care_button is None:
        for tag in ("button", "a"):
            for el in self.driver.find_elements(By.TAG_NAME, tag):
                if "關懷版" in el.text:
                    care_button = el
                    break
            if care_button is not None:
                break

    # Click the button if one was found; otherwise skip this step
    if care_button is not None:
        care_button.click()
        time.sleep(2)  # wait for the page to navigate

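Locator strategies:

XPath text-content match
Exact link text
Partial link text
Scan of button tags
Scan of a tags

Trying several strategies in sequence lets the tool cope with differences in HTML structure between sites.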
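The fallback idea itself is independent of Selenium. The following is a minimal sketch with a made-up dictionary standing in for the page, so it runs without a browser; first_match, by_exact, and by_partial are illustrative names, not functions from the tool.

```python
def first_match(strategies):
    """Run each zero-argument strategy in order; return the first non-None result.

    A strategy signals "not found" by returning None or raising LookupError.
    """
    for strategy in strategies:
        try:
            result = strategy()
        except LookupError:
            continue
        if result is not None:
            return result
    return None


# Fake "page": element markup keyed by locator, standing in for a real DOM.
page = {"partial:Care": "<a>Care Mode</a>"}


def by_exact():
    return page["exact:Enter Care Mode"]  # raises KeyError, a LookupError subclass


def by_partial():
    return page.get("partial:Care")


found = first_match([by_exact, by_partial])
missing = first_match([by_exact])
print(found, missing)
```

Ordering the strategies from most specific to least specific keeps the cheap, precise lookups first and leaves the full scan as a last resort.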

4. In-site link extraction

BeautifulSoup parses the HTML, and only in-site links are kept:

import time
from urllib.parse import urldefrag, urljoin, urlparse
from bs4 import BeautifulSoup

def extract_links(self, url):
    self.driver.get(url)
    time.sleep(self.delay_spin.GetValue())

    # Click the specific button (if present)
    self.click_care_button()

    # Parse the page
    html = self.driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    base_domain = urlparse(url).netloc

    # Collect in-site links, skipping URLs already seen
    unique_links = []
    seen_urls = set()
    for a_tag in soup.find_all('a', href=True):
        # Resolve relative paths and drop any #fragment so that
        # anchors within one page are not treated as separate pages
        full_url, _ = urldefrag(urljoin(url, a_tag['href']))

        # Keep only links on the same domain
        if urlparse(full_url).netloc != base_domain:
            continue
        if full_url in seen_urls:
            continue
        seen_urls.add(full_url)
        link_text = a_tag.get_text(strip=True) or 'Untitled'
        unique_links.append({'text': link_text, 'url': full_url})

    return unique_links

Key points:

urljoin resolves relative paths
Comparing domains filters out external links
Deduplication avoids capturing the same page twice
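To make the three points concrete without a browser or third-party packages, here is a minimal sketch that swaps BeautifulSoup for the standard library's html.parser. The class and function names are illustrative, and dropping #fragments with urldefrag is an assumption about what should count as a duplicate.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag (link text is ignored in this sketch)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(html, page_url):
    """Return unique same-domain absolute URLs, in document order."""
    collector = AnchorCollector()
    collector.feed(html)
    base_domain = urlparse(page_url).netloc
    seen, result = set(), []
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == base_domain and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


sample = """
<a href="/about">About</a>
<a href="news/today.html">News</a>
<a href="https://other.example.org/">External</a>
<a href="/about#team">Team</a>
<a href="mailto:hi@example.com">Mail</a>
"""
links = same_site_links(sample, "https://example.com/home/")
print(links)
```

The external link and the mailto: link are dropped because their network location differs from the start page's, and /about#team collapses into /about.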

5. Full-page screenshot

This is the core feature of the project. Chrome's CDP (Chrome DevTools Protocol) is used to capture a true full-page screenshot:

import base64
from selenium import webdriver

def take_full_page_screenshot(self, filepath):
    if isinstance(self.driver, webdriver.Chrome):
        # Get the full size of the page content
        metrics = self.driver.execute_cdp_cmd('Page.getLayoutMetrics', {})
        width = metrics['contentSize']['width']
        height = metrics['contentSize']['height']

        # Capture through CDP
        screenshot = self.driver.execute_cdp_cmd('Page.captureScreenshot', {
            'clip': {
                'x': 0,
                'y': 0,
                'width': width,
                'height': height,
                'scale': 1
            },
            'captureBeyondViewport': True  # key parameter
        })

        # The image comes back base64-encoded; decode it and save
        with open(filepath, 'wb') as f:
            f.write(base64.b64decode(screenshot['data']))
    else:
        # Firefox: standard method, which captures only the visible viewport
        self.driver.save_screenshot(filepath)

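Highlights:

Page.getLayoutMetrics: returns the real size of the page content
captureBeyondViewport: True: allows content outside the viewport to be captured
The base64-encoded image data is decoded before it is written to disk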
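The two data-handling steps, building the capture parameters and decoding the returned payload, can be tried in isolation. The sketch below uses made-up dictionaries shaped like the CDP responses, so no browser is needed; the helper names are hypothetical.

```python
import base64
import os
import tempfile


def build_capture_params(metrics):
    """Build Page.captureScreenshot parameters from Page.getLayoutMetrics output."""
    size = metrics["contentSize"]
    return {
        "clip": {"x": 0, "y": 0, "width": size["width"],
                 "height": size["height"], "scale": 1},
        "captureBeyondViewport": True,
    }


def save_base64_image(data_b64, filepath):
    """Decode a base64 string, write the raw bytes to filepath, return the byte count."""
    raw = base64.b64decode(data_b64)
    with open(filepath, "wb") as f:
        f.write(raw)
    return len(raw)


# Made-up values standing in for real CDP responses.
fake_metrics = {"contentSize": {"x": 0, "y": 0, "width": 1920, "height": 5400}}
params = build_capture_params(fake_metrics)

png_signature = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature that starts every PNG file
fake_response = {"data": base64.b64encode(png_signature).decode("ascii")}

out_path = os.path.join(tempfile.mkdtemp(), "page.png")
written = save_base64_image(fake_response["data"], out_path)
print(params["clip"]["height"], written)
```

Keeping these steps as plain functions makes them easy to check without launching Chrome.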

6. Generating the Excel report

OpenPyXL combines the data and screenshots into one Excel file:

import os
import openpyxl
from openpyxl.drawing.image import Image as XLImage

def generate_excel(self):
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.title = "Link Screenshot Report"

    # Header row
    headers = ['No.', 'Button Name', 'Link URL', 'Screenshot']
    ws.append(headers)

    # Column widths
    ws.column_dimensions['B'].width = 30
    ws.column_dimensions['C'].width = 50
    ws.column_dimensions['D'].width = 60

    # Add data rows and screenshots
    for data in self.links_data:
        row = ws.max_row + 1
        ws.cell(row, 1, data['index'])
        ws.cell(row, 2, data['text'])
        ws.cell(row, 3, data['url'])

        # Embed the screenshot as a fixed-size thumbnail
        # (the aspect ratio of the original capture is not preserved)
        if data['screenshot'] and os.path.exists(data['screenshot']):
            img = XLImage(data['screenshot'])
            img.width = 400   # pixels
            img.height = 300  # pixels
            ws.add_image(img, f'D{row}')
            ws.row_dimensions[row].height = 225  # points, i.e. 300 px at 96 DPI

    # Save the file
    excel_path = os.path.join(self.screenshot_dir, "鏈接報告.xlsx")
    wb.save(excel_path)

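Report features:

Column widths and row heights are set to fit the embedded thumbnails
Screenshots are embedded, so they can be viewed directly in the sheet
Each row carries the index, link text, and URL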
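The row height of 225 paired with a 300-pixel image follows from a unit conversion: row heights are measured in points, and at the common 96 DPI one pixel is 0.75 point. The sketch below shows that arithmetic, plus an optional aspect-ratio-preserving resize that the original code does not perform; both helper names are hypothetical.

```python
def pixels_to_points(pixels, dpi=96):
    """Convert pixels to points (1 point = 1/72 inch)."""
    return pixels * 72 / dpi


def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit the box, preserving aspect ratio."""
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)


row_height = pixels_to_points(300)
thumb = fit_within(1920, 5400, 400, 300)
print(row_height, thumb)
```

A tall full-page capture squeezed into 400 x 300 looks distorted, so a helper like fit_within could be used to choose img.width and img.height instead of fixed values.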

7. Multithreading

To keep the UI responsive, the time-consuming work runs in a separate thread:

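import threading

def on_start(self, event):
    url = self.url_input.GetValue().strip()
    self.is_running = True
    thread = threading.Thread(target=self.run_analysis, args=(url,), daemon=True)
    thread.start()

def run_analysis(self, url):
    self.driver = None
    try:
        # Start the browser
        self.init_driver()

        # Extract links
        links = self.extract_links(url)

        # Process each link
        for index, link in enumerate(links):
            if not self.is_running:  # allow stopping midway
                break

            self.process_link(link, index)
            # index is zero-based, so add 1 to reach 100% on the last link
            self.update_progress(int(((index + 1) / len(links)) * 100))

        # Generate the report
        self.generate_excel()
    finally:
        # init_driver may have failed before a driver existed
        if self.driver is not None:
            self.driver.quit()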

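Thread safety:

UI updates go through wx.CallAfter, which schedules the call on the main GUI thread
daemon=True keeps the worker thread from holding the process open, so the program can exit normally
A stop flag lets the user interrupt the run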
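The stop-flag pattern can be shown without wxPython or a browser. This minimal sketch uses threading.Event in place of the boolean is_running attribute (a substitution, not the original code), and a plain callback where the real tool would wrap the UI update in wx.CallAfter.

```python
import threading


def run_worker(items, stop_event, on_progress):
    """Process items until finished or until stop_event is set; return what was done."""
    done = []
    for index, item in enumerate(items):
        if stop_event.is_set():
            break
        done.append(item)  # stand-in for visiting the link and taking a screenshot
        on_progress(int((index + 1) / len(items) * 100))
    return done


stop_event = threading.Event()
progress_log = []
results = []


def report(percent):
    progress_log.append(percent)
    if percent >= 50:
        stop_event.set()  # simulate the user pressing Stop halfway through


worker = threading.Thread(
    target=lambda: results.extend(run_worker(list("abcd"), stop_event, report)),
    daemon=True,
)
worker.start()
worker.join(timeout=5)
print(results, progress_log)
```

The flag is checked once per item, so a stop request takes effect after the item in progress finishes, and the results collected so far are kept.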

Installation and deployment

1. Install the Python dependencies

pip install wxPython selenium beautifulsoup4 openpyxl pillow

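2. Install a browser driver

Chrome driver (recommended):

# Manage the driver automatically
pip install webdriver-manager

Or download it manually from the ChromeDriver download page. Selenium 4.6 and later also ship with Selenium Manager, which can locate or download a matching driver on its own.

Firefox driver:

Download it from the GeckoDriver Releases page.

3. Run the program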

python web_link_clicker.py

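Usage guide

Basic steps

1. Enter the URL: type the address of the site's home page into the input box

2. Choose a browser: Chrome or Firefox (Chrome recommended)

3. Set the options:

Check "Headless mode" to run in the background
Check "Full-page screenshot" to capture the whole page
Adjust the page-load delay (1-10 seconds)

4. Start: click the "Start Analysis" button

5. View the results: a prompt appears when the run finishes; open the generated folder to find the Excel report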

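Workflow

Enter URL → Click Start → Start browser → Open home page
↓
Click "進入關懷版" (if present) → Extract all in-site links
↓
Visit each link → Full-page screenshot → Update progress
↓
Generate Excel report → Close browser → Done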

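Output

The program creates a timestamp-named folder in the working directory, containing:

All screenshot PNG files: named as index_link-name
鏈接報告.xlsx (the link report): the full report, with index, button name, URL, and embedded screenshot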
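The article does not show how the folder and file names are built, so the following is only one plausible sketch. The "screenshots_" prefix, the zero-padded index, and the set of replaced characters are all assumptions.

```python
import re
from datetime import datetime


def output_dir_name(now):
    """Folder name such as screenshots_20251031_144234 (the prefix is made up)."""
    return "screenshots_" + now.strftime("%Y%m%d_%H%M%S")


def screenshot_filename(index, link_text, max_len=50):
    """Build '<index>_<link text>.png', replacing characters unsafe in file names."""
    safe = re.sub(r'[\\/:*?"<>|\s]+', "_", link_text).strip("_")
    return f"{index:03d}_{safe[:max_len] or 'untitled'}.png"


folder = output_dir_name(datetime(2025, 10, 31, 14, 42, 34))
name = screenshot_filename(7, "About us / Team?")
print(folder, name)
```

Link text comes straight from the page, so sanitizing it before using it as a file name avoids failures on characters such as / or ? that file systems reject.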

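Features

1. Fault tolerance

If the specific button cannot be found, the step is skipped without affecting the rest of the run
If a screenshot fails, the error is logged and processing continues with the remaining links
The run can be stopped midway, and the data collected so far is kept

2. User experience

Live log shows the current operation
Progress bar shows how much of the task is complete
A prompt with the output path appears on completion

3. Flexible configuration

Headless mode saves system resources
Adjustable delay accommodates different network speeds
Both Chrome and Firefox are supported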

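Practical use cases

1. Website testing

QA teams can use the tool to quickly build a screenshot archive of a site's pages, which helps with:

Version comparison
UI consistency checks
Locating and recording issues

2. Documentation

Technical writers can:

Generate product interface screenshots automatically
Produce illustrations for user manuals
Update help documentation

3. Competitor analysis

Marketing staff can:

Quickly capture screenshots of competitor sites
Analyze page layout and features
Produce comparison reports

4. Website archiving

Operations teams can:

Save the state of the site on a regular schedule
Compare the site before and after a release
Archive historical versions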

Further improvements

1. Add webdriver-manager

Manage browser driver versions automatically:

from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service

service = Service(ChromeDriverManager().install())
self.driver = webdriver.Chrome(service=service, options=options)

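2. Add a wait strategy

Use explicit waits instead of fixed sleeps to improve stability:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the element to appear in the DOM
wait = WebDriverWait(self.driver, 10)
element = wait.until(EC.presence_of_element_located((By.ID, "content")))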
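The idea behind an explicit wait is to poll a condition until it holds or a deadline passes, rather than sleeping for a fixed time. The sketch below is a simplified stand-in for that idea, not Selenium's implementation; wait_until and page_ready are made-up names.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


attempts = {"count": 0}


def page_ready():
    """Pretend the page becomes ready on the third check."""
    attempts["count"] += 1
    return "content" if attempts["count"] >= 3 else None


result = wait_until(page_ready, timeout=2.0, poll_interval=0.01)
print(result, attempts["count"])
```

Compared with time.sleep(delay), polling returns as soon as the page is ready and still fails clearly when it never becomes ready.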

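3. Add exception handling

Classify errors more finely:

from selenium.common.exceptions import TimeoutException, WebDriverException

try:
    self.driver.get(url)
except TimeoutException:
    # Must come first: TimeoutException is a subclass of WebDriverException
    self.log("Page load timed out")
except WebDriverException as e:
    self.log(f"Browser error: {e}")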

4. Support more formats

Besides Excel, the tool could also generate:

PDF reports
HTML pages
Markdown documents
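As one possible direction for the Markdown option, the sketch below renders records shaped like the entries of links_data (keys index, text, url, screenshot) as a Markdown table. The function name and the table layout are assumptions, not part of the original tool.

```python
def markdown_report(links_data):
    """Render link records as a Markdown table with image links to the screenshots."""
    lines = ["| No. | Button Name | Link URL | Screenshot |",
             "| --- | --- | --- | --- |"]
    for item in links_data:
        text = item["text"].replace("|", "\\|")  # keep pipes from breaking the table
        shot = f"![{item['index']}]({item['screenshot']})" if item["screenshot"] else ""
        lines.append(f"| {item['index']} | {text} | {item['url']} | {shot} |")
    return "\n".join(lines) + "\n"


sample = [
    {"index": 1, "text": "About", "url": "https://example.com/about",
     "screenshot": "001_About.png"},
    {"index": 2, "text": "News | Today", "url": "https://example.com/news",
     "screenshot": None},
]
report = markdown_report(sample)
print(report)
```

Because the screenshots are referenced by path rather than embedded, the Markdown file would need to be kept in the same folder as the PNG files.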

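FAQ

Q1: Chrome driver version mismatch

Solutions:

Use webdriver-manager to manage the driver automatically
Or manually download the driver that matches your Chrome version

Q2: Screenshots are incomplete

Make sure "Full-page screenshot" is checked and that Chrome is selected. In this tool the Firefox path calls save_screenshot, which captures only the visible viewport. Selenium 4's Firefox driver does offer save_full_page_screenshot, but the code shown here does not use it.

Q3: Pages load too slowly

Increase the "page-load delay", or check the network connection

Q4: The "進入關懷版" button cannot be found

This is expected on sites that have no such button; the program skips the step and continues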
