Non-Maximum Suppression (NMS) and Soft-NMS for Object Detection in Python

Updated: 2022-05-09 14:40:32  Author: Bubbliiiing

This article walks through how non-maximum suppression (NMS) and Soft-NMS are implemented for object detection in Python. Hopefully it serves as a useful reference.

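Notes

For most datasets Soft-NMS has little effect, and the improvement is barely noticeable. Where it helps is in scenes with many densely overlapping objects of the same class. With densely overlapping objects of different classes it does not help much either, because suppression is applied per class. Soft-NMS is a good way to understand what non-maximum suppression means, but there is little real need to implement it, and it is not worth fixating on Soft-NMS when trying to improve network performance.

The code in this post has been rewritten. The code implemented in the video was written with numpy against fairly old library versions; here it has been ported to PyTorch and adapted to current libraries.

Non-maximum suppression is a very important part of object detection, so it is worth understanding the principle and writing the code by hand at least once.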

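What is non-maximum suppression (NMS)

The idea behind non-maximum suppression is clear from two images:

Figure: detections after non-maximum suppression.

Figure: detections without non-maximum suppression.

The image without non-maximum suppression clearly contains many duplicate boxes, all pointing at the same object!

In one sentence, what non-maximum suppression does is:

Within a given region, keep only the highest-scoring box among the boxes of the same class.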

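1. How NMS is implemented

This post implements multi-class non-maximum suppression, as used in my pytorch-yolov3 example. The input has shape [batch_size, all_anchors, 5+num_classes]:

The first dimension is the number of images.

The second dimension is all of the predicted boxes.

The third dimension is the prediction for each box: four box parameters, one objectness confidence, and num_classes class confidences.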
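As a rough illustration of that last dimension, the sketch below splits one prediction row into its parts in plain Python (no PyTorch, to keep it self-contained). The helper name `decode_row` and the sample numbers are made up for this example; the (cx, cy, w, h) layout of the first four values is taken from the conversion at the top of `non_max_suppression` further down.

```python
def decode_row(row, num_classes):
    """Split one prediction row of length 5 + num_classes into its parts."""
    cx, cy, w, h, obj_conf = row[:5]
    class_scores = row[5:5 + num_classes]
    # Centre/size -> top-left and bottom-right corners.
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    # Most likely class and its confidence.
    class_pred = max(range(num_classes), key=lambda k: class_scores[k])
    class_conf = class_scores[class_pred]
    return box, obj_conf, class_conf, class_pred


# One hypothetical row with 3 classes.
row = [50.0, 40.0, 20.0, 10.0, 0.9, 0.1, 0.8, 0.1]
box, obj_conf, class_conf, class_pred = decode_row(row, num_classes=3)
print(box, class_pred, obj_conf * class_conf)
```

The score that is later compared against `conf_thres` is the product `obj_conf * class_conf`, which is why both values are kept.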

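Non-maximum suppression proceeds as follows:

1. Loop over all images.

2. Find the boxes in the image whose score exceeds the threshold. Filtering by score before filtering overlapping boxes greatly reduces the number of boxes.

3. Determine the class and the class confidence of each box kept in step 2, then take the box positions from the predictions and concatenate them with these values. The last dimension changes from 5+num_classes to 4+1+2: four values for the box position, one for whether the box contains an object, and two for the class confidence and the class.

4. Loop over the classes. NMS keeps the highest-scoring box of a class within a region, so looping over the classes lets us run NMS on each class separately.

5. Sort the boxes of that class by score, from highest to lowest.

6. Repeatedly take the highest-scoring box, compute its overlap with all the other predicted boxes, and remove those that overlap too much.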
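Steps 2, 5 and 6 for a single class can be sketched in plain Python as below. This is a simplified illustration rather than the PyTorch code from this post: the function names `iou` and `greedy_nms` and the sample boxes are assumptions for the example, and each box carries one combined score.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / max(union, 1e-6)


def greedy_nms(boxes, scores, conf_thres=0.5, nms_thres=0.4):
    """Return the indices of the boxes kept for ONE class, best score first."""
    # Step 2: score threshold. Step 5: sort by score, highest first.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # Step 6: keep the best remaining box ...
        best = order.pop(0)
        keep.append(best)
        # ... and drop every box that overlaps it too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < nms_thres]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
print(greedy_nms(boxes, scores))  # [0, 2]
```

Box 1 is removed because its IoU with box 0 is 81/119, well above `nms_thres`; box 3 never enters the loop because its score is below `conf_thres`.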

The implementation is as follows:

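import torch
# Only needed for the commented-out fast path further down:
# from torchvision.ops import nms

# Both functions are written as methods of a class (hence `self`).
def bbox_iou(self, box1, box2, x1y1x2y2=True):
    """
        Compute the IoU of box1 with every box in box2.
        With x1y1x2y2=False the boxes are given as (cx, cy, w, h).
    """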
    if not x1y1x2y2:
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1, min=0) * \
                torch.clamp(inter_rect_y2 - inter_rect_y1, min=0)
    b1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1)
    b2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1)
    iou = inter_area / torch.clamp(b1_area + b2_area - inter_area, min = 1e-6)
    return iou

def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4):
    #----------------------------------------------------------#
    #   Convert the predicted boxes from (cx, cy, w, h) to
    #   top-left / bottom-right corner format.
    #   prediction  [batch_size, num_anchors, 85]
    #----------------------------------------------------------#
    box_corner          = prediction.new(prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]
    output = [None for _ in range(len(prediction))]
    for i, image_pred in enumerate(prediction):
        #----------------------------------------------------------#
        #   Take the max over the class predictions.
        #   class_conf  [num_anchors, 1]    class confidence
        #   class_pred  [num_anchors, 1]    class index
        #----------------------------------------------------------#
        class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)
        #----------------------------------------------------------#
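        #   First round of filtering, by objectness * class confidence.
        #   The mask is already 1-D, so no squeeze is needed (a squeeze
        #   would collapse it to 0-d when only one box is present).
        #----------------------------------------------------------#
        conf_mask = image_pred[:, 4] * class_conf[:, 0] >= conf_thres
        #----------------------------------------------------------#
        #   Keep only the predictions that passed the confidence filter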
        #----------------------------------------------------------#
        image_pred = image_pred[conf_mask]
        class_conf = class_conf[conf_mask]
        class_pred = class_pred[conf_mask]
        if not image_pred.size(0):
            continue
        #-------------------------------------------------------------------------#
        #   detections  [num_anchors, 7]
        #   The 7 columns are: x1, y1, x2, y2, obj_conf, class_conf, class_pred
        #-------------------------------------------------------------------------#
        detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)
        #------------------------------------------#
        #   Get all classes present in the predictions
        #------------------------------------------#
        unique_labels = detections[:, -1].cpu().unique()
        if prediction.is_cuda:
            unique_labels = unique_labels.cuda()
            detections = detections.cuda()
        for c in unique_labels:
            #------------------------------------------#
            #   Get all predictions of this class that passed the score filter
            #------------------------------------------#
            detections_class = detections[detections[:, -1] == c]
            # #------------------------------------------#
            # #   The built-in nms from torchvision.ops is faster!
            # #------------------------------------------#
            # keep = nms(
            #     detections_class[:, :4],
            #     detections_class[:, 4] * detections_class[:, 5],
            #     nms_thres
            # )
            # max_detections = detections_class[keep]
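            # Sort by objectness confidence * class confidence, descending
            _, conf_sort_index = torch.sort(detections_class[:, 4]*detections_class[:, 5], descending=True)
            detections_class = detections_class[conf_sort_index]
            # Run non-maximum suppression
            max_detections = []
            while detections_class.size(0):
                # Take the highest-scoring box of this class, then check the remaining boxes:
                # any box whose IoU with it reaches nms_thres is removed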
                max_detections.append(detections_class[0].unsqueeze(0))
                if len(detections_class) == 1:
                    break
                ious = self.bbox_iou(max_detections[-1], detections_class[1:])
                detections_class = detections_class[1:][ious < nms_thres]
            # Stack the kept boxes
            max_detections = torch.cat(max_detections).data
            # Add max detections to outputs
            output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections))
        if output[i] is not None:
            # Convert corners back to centre/size and hand them to yolo_correct_boxes together
            # with the input and image shapes (that method is not shown in this post)
            output[i]           = output[i].cpu().numpy()
            box_xy, box_wh      = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2]
            output[i][:, :4]    = self.yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
    return output
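
The two clamps in `bbox_iou` are easy to overlook, so here is a small plain-Python check of what they do. The function name `iou_xyxy` and the sample boxes are made up for this illustration; the arithmetic mirrors the corner-format branch of the code above.

```python
def iou_xyxy(a, b):
    """IoU of two (x1, y1, x2, y2) boxes, with the same two clamps as bbox_iou."""
    # Clamp at 0: boxes that do not overlap get zero intersection
    # instead of a negative width or height.
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # Clamp at 1e-6: two zero-area boxes would otherwise divide by zero.
    return inter / max(area_a + area_b - inter, 1e-6)


print(iou_xyxy((0, 0, 10, 10), (5, 5, 15, 15)))    # partial overlap: 25 / 175
print(iou_xyxy((0, 0, 10, 10), (20, 20, 30, 30)))  # no overlap: 0.0
print(iou_xyxy((3, 3, 3, 3), (3, 3, 3, 3)))        # degenerate boxes: 0.0
```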

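2. How Soft-NMS is implemented

Soft-NMS differs very little from ordinary NMS: only a few lines of code change.

Soft-NMS takes the view that boxes should not be filtered on overlap alone. In the example image there are clearly two horses, but their boxes overlap heavily, so with ordinary NMS the lower-scoring horse at the back would be removed outright.

Soft-NMS holds that both the score and the overlap should be taken into account during suppression.

Let us look directly at how the NMS and Soft-NMS code differ: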

NMS:

while detections_class.size(0):
    # Take the highest-scoring box of this class, then check the remaining boxes:
    # any box whose IoU with it reaches nms_thres is removed
    max_detections.append(detections_class[0].unsqueeze(0))
    if len(detections_class) == 1:
        break
    ious = self.bbox_iou(max_detections[-1], detections_class[1:])
    detections_class = detections_class[1:][ious < nms_thres]

Soft-NMS:

while detections_class.size(0):
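    # Take the highest-scoring box of this class, then decay the objectness confidence (column 4)
    # of the remaining boxes by a Gaussian of their IoU with it, drop those below conf_thres, and re-sort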
    max_detections.append(detections_class[0].unsqueeze(0))
    if len(detections_class) == 1:
        break
    ious                    = self.bbox_iou(max_detections[-1], detections_class[1:])
    detections_class[1:, 4] = torch.exp(-(ious * ious) / sigma) * detections_class[1:, 4]
    detections_class        = detections_class[1:]
    detections_class        = detections_class[detections_class[:, 4] >= conf_thres]
    arg_sort                = torch.argsort(detections_class[:, 4], descending = True)
    detections_class        = detections_class[arg_sort]

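As we can see, NMS directly removes the other predictions that overlap heavily with the highest-scoring box. Soft-NMS instead applies a weight: the IoU is passed through a Gaussian, exp(-IoU² / sigma), and the result is multiplied by the original score. Boxes whose decayed score falls below conf_thres are dropped, the rest are re-sorted, and the loop continues.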
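
To get a feel for the weight, the short sketch below evaluates the Gaussian factor for a few IoU values in plain Python. The function name `gaussian_decay` is made up for this illustration; `sigma=0.5` matches the default in the Soft-NMS code further down.

```python
import math


def gaussian_decay(iou, sigma=0.5):
    """Weight applied to a neighbouring box's score: exp(-iou^2 / sigma)."""
    return math.exp(-(iou * iou) / sigma)


for iou in (0.0, 0.3, 0.6, 0.9):
    print(f"IoU={iou:.1f} -> weight={gaussian_decay(iou):.3f}")
```

A box that does not overlap the selected one keeps its full score (weight 1), and the weight shrinks smoothly as the overlap grows, so a heavily overlapping box is only discarded once its decayed score drops below `conf_thres`.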

The implementation is as follows:

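import torch
# Only needed for the commented-out fast path further down:
# from torchvision.ops import nms

# Both functions are written as methods of a class (hence `self`).
def bbox_iou(self, box1, box2, x1y1x2y2=True):
    """
        Compute the IoU of box1 with every box in box2.
        With x1y1x2y2=False the boxes are given as (cx, cy, w, h).
    """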
    if not x1y1x2y2:
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1, min=0) * \
                torch.clamp(inter_rect_y2 - inter_rect_y1, min=0)
    b1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1)
    b2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1)
    iou = inter_area / torch.clamp(b1_area + b2_area - inter_area, min = 1e-6)
    return iou

def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4, sigma=0.5):
    #----------------------------------------------------------#
    #   Convert the predicted boxes from (cx, cy, w, h) to
    #   top-left / bottom-right corner format.
    #   prediction  [batch_size, num_anchors, 85]
    #----------------------------------------------------------#
    box_corner          = prediction.new(prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]
    output = [None for _ in range(len(prediction))]
    for i, image_pred in enumerate(prediction):
        #----------------------------------------------------------#
        #   Take the max over the class predictions.
        #   class_conf  [num_anchors, 1]    class confidence
        #   class_pred  [num_anchors, 1]    class index
        #----------------------------------------------------------#
        class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)
        #----------------------------------------------------------#
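        #   First round of filtering, by objectness * class confidence.
        #   The mask is already 1-D, so no squeeze is needed (a squeeze
        #   would collapse it to 0-d when only one box is present).
        #----------------------------------------------------------#
        conf_mask = image_pred[:, 4] * class_conf[:, 0] >= conf_thres
        #----------------------------------------------------------#
        #   Keep only the predictions that passed the confidence filter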
        #----------------------------------------------------------#
        image_pred = image_pred[conf_mask]
        class_conf = class_conf[conf_mask]
        class_pred = class_pred[conf_mask]
        if not image_pred.size(0):
            continue
        #-------------------------------------------------------------------------#
        #   detections  [num_anchors, 7]
        #   The 7 columns are: x1, y1, x2, y2, obj_conf, class_conf, class_pred
        #-------------------------------------------------------------------------#
        detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)
        #------------------------------------------#
        #   Get all classes present in the predictions
        #------------------------------------------#
        unique_labels = detections[:, -1].cpu().unique()
        if prediction.is_cuda:
            unique_labels = unique_labels.cuda()
            detections = detections.cuda()
        for c in unique_labels:
            #------------------------------------------#
            #   Get all predictions of this class that passed the score filter
            #------------------------------------------#
            detections_class = detections[detections[:, -1] == c]
            # #------------------------------------------#
            # #   The built-in nms from torchvision.ops is faster!
            # #------------------------------------------#
            # keep = nms(
            #     detections_class[:, :4],
            #     detections_class[:, 4] * detections_class[:, 5],
            #     nms_thres
            # )
            # max_detections = detections_class[keep]
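            # Sort by objectness confidence * class confidence, descending
            _, conf_sort_index = torch.sort(detections_class[:, 4]*detections_class[:, 5], descending=True)
            detections_class = detections_class[conf_sort_index]
            # Run Soft-NMS
            max_detections = []
            while detections_class.size(0):
                # Take the highest-scoring box of this class, then decay the objectness confidence (column 4)
                # of the remaining boxes by a Gaussian of their IoU with it, drop those below conf_thres, and re-sort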
                max_detections.append(detections_class[0].unsqueeze(0))
                if len(detections_class) == 1:
                    break
                ious                    = self.bbox_iou(max_detections[-1], detections_class[1:])
                detections_class[1:, 4] = torch.exp(-(ious * ious) / sigma) * detections_class[1:, 4]
                detections_class        = detections_class[1:]
                detections_class        = detections_class[detections_class[:, 4] >= conf_thres]
                arg_sort                = torch.argsort(detections_class[:, 4], descending = True)
                detections_class        = detections_class[arg_sort]
            # Stack the kept boxes
            max_detections = torch.cat(max_detections).data
            # Add max detections to outputs
            output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections))
        if output[i] is not None:
            # Convert corners back to centre/size and hand them to yolo_correct_boxes together
            # with the input and image shapes (that method is not shown in this post)
            output[i]           = output[i].cpu().numpy()
            box_xy, box_wh      = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2]
            output[i][:, :4]    = self.yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
    return output
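
For a quick feel of the behaviour without PyTorch, here is a simplified single-class Soft-NMS in plain Python run on a made-up "two horses" case. It is a sketch under assumptions, not the code above: `iou`, `soft_nms` and the sample boxes are hypothetical, and each box carries one combined score, whereas the PyTorch version decays the objectness column.

```python
import math


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / max(union, 1e-6)


def soft_nms(boxes, scores, conf_thres=0.5, sigma=0.5):
    """Gaussian Soft-NMS for ONE class; returns [(box_index, final_score), ...]."""
    remaining = sorted(((s, i) for i, s in enumerate(scores) if s >= conf_thres),
                       reverse=True)
    keep = []
    while remaining:
        best_score, best = remaining.pop(0)
        keep.append((best, best_score))
        decayed = []
        for s, i in remaining:
            ov = iou(boxes[best], boxes[i])
            s = s * math.exp(-(ov * ov) / sigma)  # decay the score instead of deleting the box
            if s >= conf_thres:                   # drop only when the score sinks too low
                decayed.append((s, i))
        remaining = sorted(decayed, reverse=True)  # re-sort after the decay
    return keep


# Boxes 0 and 1 are two different, overlapping objects; box 2 nearly duplicates box 0.
boxes = [(0, 0, 10, 10), (4, 0, 14, 10), (0.5, 0, 10.5, 10)]
scores = [0.95, 0.90, 0.80]
print(iou(boxes[0], boxes[1]))   # about 0.43, above the usual nms_thres of 0.4
print(soft_nms(boxes, scores))   # boxes 0 and 1 survive, box 2 is dropped
```

With hard NMS at `nms_thres=0.4`, box 1 would be removed because its IoU with box 0 is about 0.43. Here it survives with a reduced score, while the near-duplicate box 2 is decayed far below `conf_thres` and dropped.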
