Detecting Objects with YOLO and OpenCV: A Detailed Guide

Updated: 2022-01-10 08:30:11  Author: AI浩

This article shows how to use the YOLOv3 object detector with OpenCV and Python to detect objects in images and video streams. The example code is explained in detail for interested readers.

Preface

This article shows you how to use the YOLOv3 object detector, OpenCV, and Python to detect objects in images and video streams. It uses three files, yolov3.weights, yolov3.cfg, and coco.names, which can be downloaded from the links below:

GitHub - pjreddie/darknet: Convolutional Neural Networks

https://pjreddie.com/media/files/yolov3.weights

https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg

https://github.com/pjreddie/darknet/blob/master/data/coco.names

What is the YOLO object detector?

YOLO is a single-stage detector.

It was first proposed by Redmon et al. in 2015. Their paper, You Only Look Once: Unified, Real-Time Object Detection, describes an object detector capable of better-than-real-time detection, reaching 45 FPS on a GPU.

YOLO has gone through several iterations, including YOLO9000: Better, Faster, Stronger (i.e., YOLOv2), which can detect over 9,000 object categories.

Redmon and Farhadi achieved such a large number of detectable categories by training object detection and classification jointly: they trained YOLO9000 on the ImageNet classification dataset and the COCO detection dataset at the same time.

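On the 156 ImageNet detection classes that do not appear in COCO, YOLO9000 achieved a mean Average Precision (mAP) of only 16%. So although YOLO9000 can detect 9,000 separate classes, its accuracy is not high.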

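Redmon and Farhadi later published another YOLO paper, YOLOv3: An Incremental Improvement (2018). YOLOv3 is considerably larger than the previous models, but in my opinion it is the best of the YOLO family of object detectors. (It is also Redmon's last YOLO paper; he later left computer vision research in protest against its use for military purposes.)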

In this post we will use YOLOv3 weights trained on the COCO dataset.

The COCO dataset consists of 80 labels: person, bicycle, car, motorbike, aeroplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, sofa, pottedplant, bed, diningtable, toilet, tvmonitor, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush.

Project structure

Run the tree command in a terminal to view the project structure:

Today's project consists of four directories and two Python scripts.

In order of importance, they are:

yolo_coco folder: the YOLOv3 object detector model files.

images folder: images used for evaluation.

videos folder: videos used for evaluation.

output folder: the evaluation results.

yolo_objectdetection.py: detects objects in images.

yolo_video.py: detects objects in video.

Detecting objects in images

Create a new file named yolo_objectdetection.py:

# import the necessary packages
import numpy as np
import time
import cv2
import os

image_path = '11.jpg'
yolo = 'yolo_coco'
confidence_t = 0.5
threshold = 0.3
# load the COCO class labels the YOLO model was trained on
labelsPath = os.path.sep.join([yolo, "coco.names"])
with open(labelsPath) as f:
    LABELS = f.read().strip().split("\n")
# initialize a list of colors to represent each class label
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
                           dtype="uint8")
# load the YOLO object detector from the config and weights files
print("[INFO] loading YOLO from disk...")
config_path = os.path.sep.join([yolo, "yolov3.cfg"])
weights_path = os.path.sep.join([yolo, "yolov3.weights"])
net = cv2.dnn.readNetFromDarknet(config_path, weights_path)

Import the packages.

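Define the global parameters:

image_path: path to the input image.
yolo: path to the directory that holds the model files.
confidence_t: minimum probability used to filter out weak detections.
threshold: IoU threshold for non-maximum suppression.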

Next, load all the class LABELS, then assign a random color to each label.
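To make the label-to-color mapping concrete, here is a minimal sketch. The helper make_colors is hypothetical (it is not part of the original script). It reproduces the seeded np.random.randint call above with a local RandomState, so the same seed gives every class the same color on every run. Note that randint's upper bound is exclusive, so channel values fall in the range 0 to 254.

```python
import numpy as np

def make_colors(num_classes, seed=42):
    """Hypothetical helper: one color per class, reproducible via the seed."""
    rng = np.random.RandomState(seed)  # same stream as np.random.seed(seed)
    return rng.randint(0, 255, size=(num_classes, 3), dtype="uint8")

labels = ["person", "bicycle", "car"]  # first three entries of coco.names
colors = make_colors(len(labels))
# cv2 drawing functions expect plain Python ints, hence the conversion
car_color = [int(c) for c in colors[labels.index("car")]]
print(car_color)
```

Because the colors are indexed by class ID, every detection of the same class is drawn in the same color.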

Finally, load the network from the config and weight files with cv2.dnn.readNetFromDarknet.

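# load the input image and grab its spatial dimensions
image = cv2.imread(image_path)
if image is None:
    raise FileNotFoundError(image_path)
(H, W) = image.shape[:2]
# construct a blob from the input image, then perform a forward pass
# of the YOLO object detector, giving us bounding boxes and probabilities
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
# get the names of all the network's output (unconnected) layers
outInfo = net.getUnconnectedOutLayersNames()
start = time.time()
# run the forward pass; returns one 2-D array of detections per output layer
layerOutputs = net.forward(outInfo)
end = time.time()
# show timing information on YOLO
print("[INFO] YOLO took {:.6f} seconds".format(end - start))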

Load the input image and extract its dimensions.

Get the output layer names from the YOLO model.

Build a blob from the image.

cv2.dnn.blobFromImage(image[, scalefactor[, size[, mean[, swapRB[, crop[, ddepth]]]]]])

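Purpose:

Preprocesses an image (resizing, mean subtraction, scaling, cropping, channel swapping, etc.) and returns a 4-dimensional blob in NCHW order. A blob can be thought of simply as an N-dimensional array used as a neural network's input.

Parameters:

image: the input image (1, 3, or 4 channels)

Optional parameters: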

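scalefactor: multiplier applied to every channel value (after mean subtraction)
size: spatial size of the output image, given as (width, height); e.g. size=(200,300) means width w=200 and height h=300
mean: values subtracted from each channel to reduce the effect of lighting (e.g. for a 3-channel BGR image, mean=[104.0, 117.0, 123.0] subtracts 104 from B, 117 from G, and 123 from R)
swapRB: whether to swap the R and B channels; defaults to False (cv2.imread reads color images in BGR order)
crop: whether to crop the image; defaults to False. When True, the image is first resized with its aspect ratio preserved and then center-cropped to size
ddepth: depth of the output blob, either CV_32F or CV_8U.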
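To see what the blob looks like, the sketch below reproduces the swapRB, mean, scalefactor and layout steps of blobFromImage in plain NumPy. It is an illustration, not OpenCV's implementation: the hypothetical helper to_blob assumes the image is already at the target size, so the resize and crop steps are skipped.

```python
import numpy as np

def to_blob(image_bgr, scalefactor=1 / 255.0, mean=(0.0, 0.0, 0.0), swap_rb=True):
    """Hypothetical helper: mimic cv2.dnn.blobFromImage for an image that is
    already at the network input size (no resize or crop is performed)."""
    img = image_bgr.astype(np.float32)
    if swap_rb:
        img = img[:, :, ::-1]  # BGR -> RGB
    # subtract the per-channel mean, then scale
    img = (img - np.asarray(mean, dtype=np.float32)) * scalefactor
    # HWC -> CHW, then prepend the batch dimension to get NCHW
    return np.transpose(img, (2, 0, 1))[np.newaxis, ...]

# a pure-blue 416x416 BGR test image
image = np.zeros((416, 416, 3), dtype=np.uint8)
image[:, :, 0] = 255
blob = to_blob(image)
print(blob.shape)  # (1, 3, 416, 416)
```

With swap_rb=True the blue values end up in the last channel of the blob, scaled from 255 to 1.0, which matches what YOLOv3 receives when the script calls blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True).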

Run a forward pass through the YOLO network.

Display YOLO's inference time.

Next, we draw the detections on the image:

# initialize the lists of detected bounding boxes, confidences, and class IDs
boxes = []
confidences = []
classIDs = []
# loop over each of the layer outputs
for output in layerOutputs:
    # loop over each of the detections
    for detection in output:
        # extract the class ID and confidence (i.e., probability)
        scores = detection[5:]
        classID = np.argmax(scores)
        confidence = scores[classID]
        # filter out weak predictions
        if confidence > confidence_t:
            # scale the box back relative to the image size; YOLO returns
            # the box center (x, y) followed by its width and height
            box = detection[0:4] * np.array([W, H, W, H])
            (centerX, centerY, width, height) = box.astype("int")
            # use the center (x, y) to derive the top-left corner of the box
            x = int(centerX - (width / 2))
            y = int(centerY - (height / 2))
            # update the lists of boxes, confidences, and class IDs
            boxes.append([x, y, int(width), int(height)])
            confidences.append(float(confidence))
            classIDs.append(classID)
# apply non-maximum suppression to suppress weak, overlapping bounding boxes
idxs = cv2.dnn.NMSBoxes(boxes, confidences, confidence_t,
                        threshold)
# ensure at least one detection exists
if len(idxs) > 0:
    # loop over the indexes we are keeping
    for i in np.array(idxs).flatten():
        # extract the bounding box coordinates
        (x, y) = (boxes[i][0], boxes[i][1])
        (w, h) = (boxes[i][2], boxes[i][3])
        # draw a bounding box rectangle and label on the image
        color = [int(c) for c in COLORS[classIDs[i]]]
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        text = "{}: {:.4f}".format(LABELS[classIDs[i]], confidences[i])
        cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, color, 2)
# show the output image
cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

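Initialize the lists:

boxes: the bounding boxes around the objects.
confidences: the confidence value YOLO assigns to each object. A lower confidence means the object may not be what the network thinks it is. Recall that confidence_t is set to 0.5 above, so objects that do not reach this threshold are filtered out.
classIDs: the class labels of the detected objects.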

Loop over each of the layerOutputs.

Loop over each detection in the output.

Extract the classID and confidence.

Filter out weak detections.
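The per-detection logic can be isolated into a small function. This sketch assumes the COCO YOLOv3 layout in which each row of a layer output holds 85 numbers: 4 box values, 1 objectness score, and 80 class scores, which is why the script slices detection[5:]. The helper parse_detection is hypothetical and uses plain Python lists instead of NumPy arrays.

```python
def parse_detection(detection, confidence_t=0.5):
    """Hypothetical helper: return (classID, confidence) for one output row,
    or None if the best class score does not exceed confidence_t."""
    scores = detection[5:]  # skip cx, cy, w, h and the objectness score
    # index of the highest score; ties go to the first index, like np.argmax
    classID = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[classID]
    if confidence > confidence_t:
        return classID, confidence
    return None

row = [0.5, 0.5, 0.2, 0.3, 0.9] + [0.0] * 80
row[5 + 2] = 0.8  # class 2 ("car" in coco.names) scores 0.8
print(parse_detection(row))  # (2, 0.8)
```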

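At this point we have the high-confidence detections. Next:

Scale the bounding box coordinates so they can be displayed correctly on the original image.

Extract the coordinates and size of the bounding box. YOLO returns bounding box coordinates in the form (centerX, centerY, width, height), normalized relative to the input size.

Use this information to compute the top-left (x, y) coordinates of the bounding box.

Update the boxes, confidences, and classIDs lists.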
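As a worked example of the scaling step, the hypothetical helper below converts one normalized YOLO box (centerX, centerY, width, height) into the pixel-space [x, y, w, h] format (top-left corner plus size) that the script stores in boxes and passes to cv2.dnn.NMSBoxes. It follows the same arithmetic as the script but uses plain Python.

```python
def yolo_box_to_xywh(box, W, H):
    """Hypothetical helper: normalized (cx, cy, w, h) -> pixel [x, y, w, h]."""
    # scale to the original image size, truncating to int like astype("int")
    centerX, centerY = int(box[0] * W), int(box[1] * H)
    width, height = int(box[2] * W), int(box[3] * H)
    # shift from the box center to its top-left corner
    x = int(centerX - width / 2)
    y = int(centerY - height / 2)
    return [x, y, width, height]

# a box centered in a 640x480 image, a quarter as wide and half as tall
print(yolo_box_to_xywh([0.5, 0.5, 0.25, 0.5], 640, 480))  # [240, 120, 160, 240]
```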

Then use non-maximum suppression (NMS) to remove redundant, overlapping bounding boxes.
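To show what non-maximum suppression does, here is a simplified greedy version in pure Python. It is a sketch of the general technique, not OpenCV's exact implementation of cv2.dnn.NMSBoxes. Boxes scoring at or below score_threshold are dropped, the rest are visited from highest to lowest score, and a box is kept only if its intersection over union (IoU) with every already-kept box is at most nms_threshold (the script's threshold=0.3). The names iou and nms are illustrative.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, score_threshold, nms_threshold):
    """Greedy NMS sketch: returns the indexes of the boxes to keep."""
    order = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep the box only if it does not overlap a stronger kept box too much
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.5, 0.3))  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.92, so it is suppressed, while box 2 does not overlap anything and survives.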

Finally, draw the results on the image.

Result:

Detecting objects in video

Now that we have learned how to apply the YOLO object detector to a single image, let's try detecting objects in a video or a webcam stream.

Create a new file named yolo_video.py and insert the following code:

import numpy as np
import imutils
import time
import cv2
import os

yolo = 'yolo_coco'
confidence_t = 0.5
threshold = 0.3
output = 'output.avi'

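Import the required packages.

Define the global parameters:

yolo: path to the directory that holds the model files.

confidence_t: minimum probability used to filter out weak detections.

threshold: IoU threshold for non-maximum suppression.

output: path of the output video file.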

# load the COCO class labels the YOLO model was trained on
labelsPath = os.path.sep.join([yolo, "coco.names"])
with open(labelsPath) as f:
    LABELS = f.read().strip().split("\n")
# initialize a list of colors, one per class label
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3), dtype="uint8")
# build the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join([yolo, "yolov3.weights"])
configPath = os.path.sep.join([yolo, "yolov3.cfg"])
# load the YOLO object detector trained on COCO (80 classes)
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
# get the names of all the network's output (unconnected) layers
outInfo = net.getUnconnectedOutLayersNames()
# initialize the video stream (0 = default webcam; pass a file path to read
# a video file instead), the video writer, and the frame dimensions
vs = cv2.VideoCapture(0)
writer = None
(W, H) = (None, None)
# try to determine the total number of frames in the video
try:
    prop = cv2.cv.CV_CAP_PROP_FRAME_COUNT if imutils.is_cv2() \
        else cv2.CAP_PROP_FRAME_COUNT
    total = int(vs.get(prop))
    print("[INFO] {} total frames in video".format(total))
# the frame count could not be determined
except Exception:
    print("[INFO] could not determine # of frames in video")
    print("[INFO] no approx. completion time can be provided")
    total = -1

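The steps in this code:

Read the class labels.

Initialize a color for each class.

Set the paths to the YOLO weight and config files.

Load the YOLO network.

Get the output layer information.

Initialize the VideoCapture object.

Initialize the video writer and the frame dimensions.

Get the total number of frames so we can estimate how long processing the whole video will take.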
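The completion-time estimate printed later in the loop is simple arithmetic: the time for one forward pass multiplied by the total frame count. The hypothetical helper below makes the edge case explicit: when the frame count is unknown (total = -1, or 0 as some webcams report), no estimate is possible.

```python
def estimate_total_seconds(per_frame_seconds, total_frames):
    """Hypothetical helper: rough time to process a whole video, or None."""
    if total_frames <= 0:  # unknown length, e.g. a webcam stream
        return None
    return per_frame_seconds * total_frames

print(estimate_total_seconds(0.125, 480))  # 60.0
print(estimate_total_seconds(0.125, -1))   # None
```

This is only a rough figure, since it is based on the timing of a single frame.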

# loop over frames from the video stream
while True:
    # read the next frame
    (grabbed, frame) = vs.read()
    # if the frame was not grabbed, we have reached the end of the stream
    if not grabbed:
        break
    # if the frame dimensions are empty, grab them
    if W is None or H is None:
        (H, W) = frame.shape[:2]
    # construct a blob from the input frame, then perform a forward pass of the
    # YOLO object detector, giving us bounding boxes and associated probabilities
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    start = time.time()
    layerOutputs = net.forward(outInfo)
    end = time.time()
    # initialize the lists of detected bounding boxes, confidences, and class IDs
    boxes = []
    confidences = []
    classIDs = []
    # loop over each of the layer outputs (named so as not to shadow `output`)
    for layerOutput in layerOutputs:
        # loop over each of the detections
        for detection in layerOutput:
            # extract the class ID and confidence (i.e., probability)
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
            # filter out weak predictions
            if confidence > confidence_t:
                # scale the box back relative to the frame size; YOLO returns
                # the box center (x, y) followed by its width and height
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")
                # use the center (x, y) to derive the top-left corner of the box
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))
                # update the lists of boxes, confidences, and class IDs
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)
    # apply non-maximum suppression to suppress weak, overlapping boxes
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, confidence_t,
                            threshold)
    # ensure at least one detection exists
    if len(idxs) > 0:
        # loop over the indexes we are keeping
        for i in np.array(idxs).flatten():
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])
            # draw a bounding box rectangle and label on the frame
            color = [int(c) for c in COLORS[classIDs[i]]]
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            text = "{}: {:.4f}".format(LABELS[classIDs[i]],
                                       confidences[i])
            cv2.putText(frame, text, (x, y - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    # check if the video writer is None
    if writer is None:
        # initialize our video writer, using the output path defined above
        fourcc = cv2.VideoWriter_fourcc(*'XVID')
        writer = cv2.VideoWriter(output, fourcc, 30,
                                 (frame.shape[1], frame.shape[0]))
        # some information on processing a single frame
        if total > 0:
            elap = (end - start)
            print("[INFO] single frame took {:.4f} seconds".format(elap))
            print("[INFO] estimated total time to finish: {:.4f}".format(
                elap * total))
    # write the output frame to disk
    writer.write(frame)
    # press "q" to stop early (a webcam stream never ends on its own)
    if key == ord("q"):
        break
# release the file pointers
print("[INFO] cleaning up...")
if writer is not None:
    writer.release()
vs.release()
cv2.destroyAllWindows()

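Define a while loop, then grab the next frame.

Check whether we have reached the end of the video. If so, break out of the while loop.

If the frame dimensions are None, assign them.

Build a blob, pass it through the network, and obtain the predictions.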