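Building a Real-Time Document Scanning and Rectification System with Python and OpenCV

Updated: 2025-05-19 08:29:29 Author: 知舟不敘

Digitizing paper documents is a routine task at work and in study, but hand-held photos of a page are often tilted or distorted by perspective, which makes them harder to use later. This article shows how to build a real-time document scanning and rectification system with Python and OpenCV: it detects the document's edges in the camera feed and straightens the page with a perspective transform.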

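I. System Overview

The system provides the following features:

Real-time image capture from a camera
Edge detection and contour finding
Document contour recognition
Perspective transform to rectify the document
Binarization to improve readability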

II. Core Code Walkthrough

1. Importing the Required Libraries

import numpy as np
import cv2

We use NumPy for numerical computation and OpenCV for image processing.

2. Helper Function

First, a small image display function that is handy for debugging:

def cv_show(name,img):
    cv2.imshow(name,img)
    cv2.waitKey(10)

3. Corner Ordering Function

The order_points function arranges the four detected document corners in a fixed order (top-left, top-right, bottom-right, bottom-left):

def order_points(pts):
    rect = np.zeros((4,2),dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]  # top-left (smallest x+y)
    rect[2] = pts[np.argmax(s)]  # bottom-right (largest x+y)
    diff = np.diff(pts,axis=1)
    rect[1] = pts[np.argmin(diff)]  # top-right (smallest y-x)
    rect[3] = pts[np.argmax(diff)]  # bottom-left (largest y-x)
    return rect

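The function sorts four given 2D points into the order top-left, top-right, bottom-right, bottom-left. This matters in document scanning and image rectification because a perspective transform is only correct when each source corner is matched to its exact counterpart.

Detailed walkthrough

(1) Ordering logic

Top-left (rect[0]): the point with the smallest x+y
- At the top-left corner both x and y are small, so their sum is the smallest
Bottom-right (rect[2]): the point with the largest x+y
- At the bottom-right corner both x and y are large, so their sum is the largest
Top-right (rect[1]): the point with the smallest y-x
- At the top-right corner y is relatively small and x relatively large, so y-x is the smallest
Bottom-left (rect[3]): the point with the largest y-x
- At the bottom-left corner y is relatively large and x relatively small, so y-x is the largest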

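(2) Example

Suppose there are four points:

	A(10, 20)  # assumed top-left
	B(50, 20)  # top-right
	C(50, 60)  # bottom-right
	D(10, 60)  # bottom-left

Calculation:

x+y values: [30, 70, 110, 70]
- Minimum 30 → A (top-left)
- Maximum 110 → C (bottom-right)
y-x values: [10, -30, 10, 50]
- Minimum -30 → B (top-right)
- Maximum 50 → D (bottom-left)

Final order: [A, B, C, D], i.e. [top-left, top-right, bottom-right, bottom-left]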
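The worked example can be reproduced in a few lines. The sketch below repeats the sum/difference logic of order_points and feeds it the same four points in a deliberately scrambled order (the variable names shuffled and ordered are only illustrative):

```python
import numpy as np

def order_points(pts):
    # Same sum/difference logic as the function in this section.
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)             # x + y for each point
    rect[0] = pts[np.argmin(s)]     # top-left
    rect[2] = pts[np.argmax(s)]     # bottom-right
    diff = np.diff(pts, axis=1)     # y - x for each point
    rect[1] = pts[np.argmin(diff)]  # top-right
    rect[3] = pts[np.argmax(diff)]  # bottom-left
    return rect

# Points C, A, D, B from the example, deliberately out of order.
shuffled = np.array([[50, 60], [10, 20], [10, 60], [50, 20]])
ordered = order_points(shuffled)
print(ordered.tolist())
# [[10.0, 20.0], [50.0, 20.0], [50.0, 60.0], [10.0, 60.0]]
```

The result is A, B, C, D regardless of the input order, which is what the perspective transform relies on.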

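(3) Why this works

The method relies on the geometry of image coordinates, where the origin is at the top-left and y grows downward:

The top-left corner has small x and small y
The bottom-right corner has large x and large y
The top-right corner has large x and small y
The bottom-left corner has small x and large y

Simple sums and differences are therefore enough to tell the corners apart without heavier geometric computation, provided the document is roughly upright in the frame. When the quadrilateral is rotated by about 45°, sums or differences can tie and the ordering becomes ambiguous.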
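One caveat is worth checking by hand: the sum/difference trick assumes the page is roughly upright. The sketch below uses a made-up square rotated by 45° (a diamond), where two points share the same x+y and two share the same y-x. With these particular points, one corner is picked twice and another is dropped:

```python
import numpy as np

def order_points(pts):
    # Same sum/difference logic as the function in this section.
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

# Hypothetical diamond: top, right, bottom, left vertices.
diamond = np.array([[50, 0], [100, 50], [50, 100], [0, 50]])
ordered = order_points(diamond)

# Count how many distinct corners survived the ordering.
unique_corners = np.unique(ordered, axis=0)
print(len(unique_corners))  # 3: the vertex (0, 50) is missing
```

In practice this means the camera should view the page at a moderate angle; handling strongly rotated pages would need a different ordering rule, which this article does not cover.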

4. Perspective Transform Function

The four_point_transform function implements the core of document rectification:

def four_point_transform(image,pts):
    rect = order_points(pts)
    (tl,tr,br,bl) = rect

    # Compute the width and height of the output image
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA),int(widthB))

    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA),int(heightB))

    # Define the corner coordinates in the output image
    dst = np.array([[0,0],[maxWidth - 1,0],
                    [maxWidth - 1,maxHeight - 1],[0,maxHeight - 1]],dtype="float32")

    # Compute the perspective transform matrix and apply it
    M = cv2.getPerspectiveTransform(rect,dst)
    warped = cv2.warpPerspective(image,M,(maxWidth,maxHeight))

    return warped

This function performs a perspective transformation: it maps an arbitrary quadrilateral region of the image onto a rectangle (a "de-perspective" effect).

Detailed walkthrough

Input parameters

def four_point_transform(image, pts):

image: the original image
pts: an array of 4 points describing the quadrilateral region to transform

Ordering the corner points

rect = order_points(pts)
(tl, tr, br, bl) = rect  # unpack into top-left, top-right, bottom-right, bottom-left

The four points are put in order with the order_points function introduced earlier.

Computing the output image width

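widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))  # length of the bottom edge
widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))  # length of the top edge
maxWidth = max(int(widthA), int(widthB))  # the larger value becomes the output width

The bottom and top edges of the quadrilateral are measured, and the longer one is used as the output width.

Computing the output image height

heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))  # length of the right edge
heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))  # length of the left edge
maxHeight = max(int(heightA), int(heightB))  # the larger value becomes the output height

The right and left edges of the quadrilateral are measured, and the longer one is used as the output height.

Defining the target rectangle coordinates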

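dst = np.array([
    [0, 0],  # top-left
    [maxWidth - 1, 0],  # top-right
    [maxWidth - 1, maxHeight - 1],  # bottom-right
    [0, maxHeight - 1]  # bottom-left
], dtype="float32")

These are the corner coordinates of the rectified rectangle (an upright rectangle starting at (0,0)).

Computing and applying the perspective transform matrix

M = cv2.getPerspectiveTransform(rect, dst)  # compute the transform matrix
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))  # apply the transform

getPerspectiveTransform: computes the 3x3 matrix that maps the source quadrilateral onto the target rectangle
warpPerspective: applies that matrix to the original image

Return value

return warped

Returns the rectified rectangular image.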

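Perspective transform illustration

Quadrilateral in the original image      Rectangle after the transform
   tl--------tr                     (0,0)-----------(maxWidth-1,0)
    \        /                        |                  |
     \      /                         |                  |
      bl----br              (0,maxHeight-1)---(maxWidth-1,maxHeight-1)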

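Why are the width and height computed this way?

Why take the maximum:

The source quadrilateral may be distorted by perspective, so two opposite edges can differ in length
Using the larger value means the longer edge is not shrunk, so the output keeps the detail captured in the source

Why subtract 1:

Image coordinates start at 0, so an image of width maxWidth has a maximum x coordinate of maxWidth-1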

5. Main Program Flow

The main program implements the full real-time detection and rectification pipeline:

Initializing the camera

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Cannot open camera")
    exit()

Real-time processing loop

while True:
    flag = 0
    ret,image = cap.read()
    if not ret:  # check ret first: image is None when the read fails
        print("Cannot read from camera")
        break
    orig = image.copy()

Image preprocessing

gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray,(5,5),0)  # Gaussian blur to reduce noise
edged = cv2.Canny(gray,75,200)  # Canny edge detection

Contour detection and filtering

# [-2] selects the contour list under both the OpenCV 3.x and 4.x return layouts
cnts = cv2.findContours(edged,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[-2]
cnts = sorted(cnts,key=cv2.contourArea,reverse=True)[:3]  # keep the 3 largest contours by area

for c in cnts:
    peri = cv2.arcLength(c,True)  # contour perimeter
    approx = cv2.approxPolyDP(c,0.05 * peri,True)  # polygon approximation
    area = cv2.contourArea(approx)

    # Keep a contour that is a quadrilateral and large enough
    if area > 20000 and len(approx) == 4:
        screenCnt = approx
        flag = 1
        break

Document rectification and display

if flag == 1:
    # Draw the contour
    image_contours = cv2.drawContours(image,[screenCnt],0,(0,255,0),2)

    # Perspective transform
    warped = four_point_transform(orig,screenCnt.reshape(4,2))

    # Binarization
    warped = cv2.cvtColor(warped,cv2.COLOR_BGR2GRAY)
    ref = cv2.threshold(warped,0,255,cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

III. Complete Code

# Import packages
import numpy as np
import cv2

def cv_show(name,img):
    cv2.imshow(name,img)
    cv2.waitKey(10)

def order_points(pts):
    # There are 4 points in total
    rect = np.zeros((4,2),dtype="float32") # holds the points after ordering
    # Indices 0, 1, 2, 3 are top-left, top-right, bottom-right, bottom-left
    s = pts.sum(axis=1) # sum each row of pts (x+y)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts,axis=1) # difference within each row of pts (y-x)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

def four_point_transform(image,pts):
    # Order the input points
    rect = order_points(pts)
    (tl,tr,br,bl) = rect
    # Compute the width and height of the output
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA),int(widthB))
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA),int(heightB))
    # Corner positions after the transform
    dst = np.array([[0,0],[maxWidth - 1,0],
                    [maxWidth - 1,maxHeight - 1],[0,maxHeight - 1]],dtype="float32")

    M = cv2.getPerspectiveTransform(rect,dst)
    warped = cv2.warpPerspective(image,M,(maxWidth,maxHeight))
    # Return the transformed result
    return warped


# Read input
cap = cv2.VideoCapture(0)  # make sure the camera is available
if not cap.isOpened():   # failed to open
    print("Cannot open camera")
    exit()

while True:
    flag = 0 # marks whether a document was detected in the current frame
    ret,image = cap.read()  # ret is True when a frame was read correctly
    if not ret: # read failed: leave the loop (image is None here, so copy only after this check)
        print("Cannot read from camera")
        break
    orig = image.copy()
    cv_show("image",image)

    gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
    # Preprocessing
    gray = cv2.GaussianBlur(gray,(5,5),0) # Gaussian blur
    edged = cv2.Canny(gray,75,200)
    cv_show('1',edged)

    # Contour detection
    cnts = cv2.findContours(edged,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[-2]

    cnts = sorted(cnts,key=cv2.contourArea,reverse=True)[:3]
    image_contours = cv2.drawContours(image,cnts,-1,(0,255,0),2)
    cv_show("image_contours",image_contours)
    # Iterate over the contours
    for c in cnts:
        # Approximate the contour
        peri = cv2.arcLength(c,True) # contour perimeter
        # c is the input point set
        # epsilon (0.05 * peri) is the maximum distance from the original contour to its approximation; it is an accuracy parameter
        # True means the contour is closed
        approx = cv2.approxPolyDP(c,0.05 * peri,True) # contour approximation
        area = cv2.contourArea(approx)
        # Take the contour when it has 4 points and is large enough
        if area > 20000 and len(approx) == 4:
            screenCnt = approx
            flag = 1
            print(peri,area)
            print("Document detected")
            break
    if flag == 1:
        # Show the result
        # print("STEP 2: get contour")
        image_contours = cv2.drawContours(image,[screenCnt],0,(0,255,0),2)
        cv_show("image",image_contours)
        # Perspective transform
        warped = four_point_transform(orig,screenCnt.reshape(4,2))
        cv_show("warped",warped)
        # Binarization
        warped = cv2.cvtColor(warped,cv2.COLOR_BGR2GRAY)
        # ref = cv2.threshold(warped,220,255,cv2.THRESH_BINARY)[1]
        ref = cv2.threshold(warped,0,255,cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        cv_show("ref",ref)
cap.release() # release the capture device
cv2.destroyAllWindows() # close the image windows

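IV. Conclusion

This article presented a real-time document scanning and rectification system built on OpenCV that detects and straightens documents automatically using edge detection, contour analysis, and a perspective transform. It can be applied directly to everyday document digitization.

The complete code is given above and can be modified and extended as needed. OpenCV's image processing functions, combined with Python's concise syntax, keep a practical system like this short and easy to build.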

