41 Object Detection and Datasets [Dive into Deep Learning v2]

2023-08-03 16:51 Author: 月蕪SA

In computer vision, object detection is applied far more widely than image classification.

How the two differ:

Besides classifying each object, object detection must also mark where the object is located.

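Bounding box: a bounding box can be described by four numbers. Bounding boxes are usually annotated by hand, so object detection datasets are typically much smaller than image classification datasets because annotation is expensive.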

Common structure of an object detection dataset:

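A text file in which each line describes one object; a line usually consists of the image file name, the object class, and the bounding box. (When an image contains several objects, their lines all share the same image file name.)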
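
As a rough illustration of this layout, the sketch below parses a few made-up annotation lines and groups the objects by image file name. The column order (file name, class, x1, y1, x2, y2), the comma delimiter, the function name group_by_image, and the sample values are all assumptions for illustration; real datasets such as COCO use their own formats.

```python
import csv
import io
from collections import defaultdict

# Hypothetical annotation text: one object per line, in the assumed order
# image file name, class name, x1, y1, x2, y2 (corner coordinates in pixels).
SAMPLE = """\
catdog.jpg,dog,60,45,378,516
catdog.jpg,cat,400,112,655,493
banana.jpg,banana,104,20,143,58
"""

def group_by_image(text):
    """Return {image name: [(class name, [x1, y1, x2, y2]), ...]}."""
    objects = defaultdict(list)
    for name, cls, *coords in csv.reader(io.StringIO(text)):
        objects[name].append((cls, [float(c) for c in coords]))
    return dict(objects)

annotations = group_by_image(SAMPLE)
for name, objs in annotations.items():
    # Lines that share a file name end up in the same list.
    print(name, len(objs), objs)
```

Grouping by file name is what turns the flat one-object-per-line file back into a per-image list of labeled boxes.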

A commonly used object detection dataset: COCO

Code implementation

(The image is in the img folder and can be referenced directly. The slides repository has no img folder; you can copy it from d2l.)

%matplotlib inline
import torch
from d2l import torch as d2l
d2l.set_figsize()
img = d2l.plt.imread('../img/catdog.jpg')
d2l.plt.imshow(img);

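A bounding box can be written in two common ways: the two-corner representation (upper-left x and y, lower-right x and y) and the center-width-height representation (center x, center y, width, height). Here we define functions that convert between these two representations: box_corner_to_center converts from the two-corner to the center-width-height representation, and box_center_to_corner does the reverse. The input argument boxes should be a two-dimensional tensor of shape (n, 4), where n is the number of bounding boxes; because the functions index with boxes[:, 0], a one-dimensional tensor of length 4 has to be reshaped to (1, 4) first.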

#@save
def box_corner_to_center(boxes):
    """Convert from (upper-left, lower-right) to (center, width, height)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    w = x2 - x1
    h = y2 - y1
    boxes = torch.stack((cx, cy, w, h), axis=-1)
    return boxes

#@save
def box_center_to_corner(boxes):
    """Convert from (center, width, height) to (upper-left, lower-right)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = cx - 0.5 * w
    y1 = cy - 0.5 * h
    x2 = cx + 0.5 * w
    y2 = cy + 0.5 * h
    boxes = torch.stack((x1, y1, x2, y2), axis=-1)
    return boxes

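We define the bounding boxes of the dog and the cat in the image from their coordinates. The origin of the image coordinates is the upper-left corner of the image; the positive x direction points right and the positive y direction points down.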

# bbox is short for bounding box
dog_bbox, cat_bbox = [60.0, 45.0, 378.0, 516.0], [400.0, 112.0, 655.0, 493.0]
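
As a quick sanity check of the two representations, the sketch below redoes the conversion arithmetic for these two boxes in plain Python (no PyTorch, and one box at a time rather than an (n, 4) tensor). The helper names corner_to_center and center_to_corner are made up for this illustration and are not part of d2l.

```python
def corner_to_center(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h) for a single box."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]

def center_to_corner(box):
    """(cx, cy, w, h) -> (x1, y1, x2, y2) for a single box."""
    cx, cy, w, h = box
    return [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h]

dog_bbox, cat_bbox = [60.0, 45.0, 378.0, 516.0], [400.0, 112.0, 655.0, 493.0]

dog_center = corner_to_center(dog_bbox)
cat_center = corner_to_center(cat_bbox)
print(dog_center)  # [219.0, 280.5, 318.0, 471.0]
print(cat_center)  # [527.5, 302.5, 255.0, 381.0]

# Converting back should reproduce the original corner coordinates.
print(center_to_corner(dog_center) == dog_bbox)  # True
print(center_to_corner(cat_center) == cat_bbox)  # True
```

The round trip returning the original boxes is the same check one would run on the tensor versions with a stacked (2, 4) input.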

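We can draw the bounding boxes on the image to check that they are accurate. Before drawing, we define a helper function bbox_to_rect, which converts a bounding box to matplotlib's bounding box format.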

#@save
def bbox_to_rect(bbox, color):
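    # Convert the bounding box (upper-left x, upper-left y, lower-right x,
    # lower-right y) format to the matplotlib format:
    # ((upper-left x, upper-left y), width, height)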
    return d2l.plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2]-bbox[0], height=bbox[3]-bbox[1],
        fill=False, edgecolor=color, linewidth=2)

After adding the bounding boxes to the image, we can see that the main outlines of the two objects lie essentially inside the two boxes.

fig = d2l.plt.imshow(img)
fig.axes.add_patch(bbox_to_rect(dog_bbox, 'blue'))
fig.axes.add_patch(bbox_to_rect(cat_bbox, 'red'));

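Object detection has no really small dataset comparable to MNIST or Fashion-MNIST, so training is relatively slow. The small banana detection dataset used below keeps the demonstration fast.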

Downloading the dataset

The banana detection dataset, including all the images and the CSV label files, can be downloaded directly from the Internet.

%matplotlib inline
import os
import pandas as pd
import torch
import torchvision
from d2l import torch as d2l

#@save
d2l.DATA_HUB['banana-detection'] = (
    d2l.DATA_URL + 'banana-detection.zip',
    '5de26c8fce5ccdea9f91267273464dc968d20d72')

Reading the dataset

We read the banana detection dataset with the read_data_bananas function. The dataset includes a CSV file that contains the object class labels and the ground-truth bounding box coordinates of the upper-left and lower-right corners.

#@save
def read_data_bananas(is_train=True):
    """Read the banana detection dataset images and labels."""
    data_dir = d2l.download_extract('banana-detection')
    csv_fname = os.path.join(data_dir, 'bananas_train' if is_train
                             else 'bananas_val', 'label.csv')
    csv_data = pd.read_csv(csv_fname)
    csv_data = csv_data.set_index('img_name')
    images, targets = [], []
    for img_name, target in csv_data.iterrows():
        images.append(torchvision.io.read_image(
            os.path.join(data_dir, 'bananas_train' if is_train else
                         'bananas_val', 'images', f'{img_name}')))
        # Here target contains (class, upper-left x, upper-left y,
        # lower-right x, lower-right y), where all the images have the same
        # banana class (index 0)
        targets.append(list(target))
    return images, torch.tensor(targets).unsqueeze(1) / 256
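
The division by 256 at the end of read_data_bananas rescales the pixel coordinates to the range 0 to 1, since the images are 256 pixels on each side (the edge_size used later in these notes). Below is a minimal plain-Python sketch of that arithmetic with a made-up label row; the values are illustrative, not taken from the dataset. Note that the class index is divided as well, which is harmless here only because the single banana class has index 0.

```python
EDGE_SIZE = 256  # image height and width in pixels for the banana dataset

# Made-up label row in the order (class, x1, y1, x2, y2); illustrative only.
raw_label = [0, 104, 20, 143, 58]

# Same arithmetic as `torch.tensor(targets) / 256`: every entry is divided,
# including the class index (0 / 256 is still 0).
normalized = [v / EDGE_SIZE for v in raw_label]

# Multiplying the four coordinates by the edge size recovers pixel values,
# which is what is needed before drawing the box on the image.
pixels = [v * EDGE_SIZE for v in normalized[1:]]

print(normalized)  # [0.0, 0.40625, 0.078125, 0.55859375, 0.2265625]
print(pixels)      # [104.0, 20.0, 143.0, 58.0]
```

With more than one class, the class column would have to be kept out of the division, because a nonzero class index would otherwise be scaled too.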

The BananasDataset class below uses read_data_bananas to read the images and labels, and lets us create a custom Dataset instance for loading the banana detection dataset.

#@save
class BananasDataset(torch.utils.data.Dataset):
    """A custom dataset for loading the banana detection dataset."""
    def __init__(self, is_train):
        self.features, self.labels = read_data_bananas(is_train)
        print('read ' + str(len(self.features)) + (f' training examples' if
              is_train else f' validation examples'))

    def __getitem__(self, idx):
        return (self.features[idx].float(), self.labels[idx])

    def __len__(self):
        return len(self.features)

Finally, we define the load_data_bananas function to return two data loader instances, one for the training set and one for the validation set. The validation set does not need to be read in random order.

#@save
def load_data_bananas(batch_size):
    """Load the banana detection dataset."""
    train_iter = torch.utils.data.DataLoader(BananasDataset(is_train=True),
                                             batch_size, shuffle=True)
    val_iter = torch.utils.data.DataLoader(BananasDataset(is_train=False),
                                           batch_size)
    return train_iter, val_iter

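Let us read a minibatch and print the shapes of its images and labels. The image minibatch has shape (batch size, number of channels, height, width), which looks familiar: it is the same as in our earlier image classification tasks. The label minibatch has shape (batch size, m, 5), where m is the largest number of bounding boxes that any image in the dataset has.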

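Minibatch computation is efficient, but it requires every image to contain the same number of bounding boxes so that they fit in one batch. In general, images have different numbers of bounding boxes, so images with fewer than m boxes are padded with invalid bounding boxes until m is reached. Each bounding box label is then an array of length 5. The first element is the class of the object in the box, where -1 marks an invalid box used for padding. The remaining four elements are the (x, y) coordinates of the upper-left and lower-right corners of the box (with values between 0 and 1). For the banana dataset, m = 1 because each image has only one bounding box.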
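
The padding scheme can be sketched in plain Python as follows. The function name pad_labels and the sample boxes are hypothetical. The filler row uses class -1 as described above; filling its coordinates with zeros is an assumption of this sketch, and m is taken over the given list rather than over a whole dataset.

```python
PAD_ROW = [-1.0, 0.0, 0.0, 0.0, 0.0]  # class -1 marks a padding box

def pad_labels(labels_per_image):
    """Pad each image's list of [class, x1, y1, x2, y2] rows to length m."""
    m = max(len(labels) for labels in labels_per_image)
    return [labels + [list(PAD_ROW) for _ in range(m - len(labels))]
            for labels in labels_per_image]

batch_labels = [
    [[0.0, 0.1, 0.2, 0.5, 0.6], [1.0, 0.3, 0.3, 0.9, 0.8]],  # two objects
    [[0.0, 0.4, 0.1, 0.7, 0.5]],                              # one object
]

padded = pad_labels(batch_labels)
# Every image now has m = 2 rows, so the labels form a (2, 2, 5) array.
print([len(labels) for labels in padded])

# Downstream code can drop the filler rows again by checking the class entry.
valid = [[row for row in labels if row[0] != -1] for labels in padded]
print(valid == batch_labels)
```

Because the banana dataset has exactly one box per image, no padding rows are ever needed there.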

batch_size, edge_size = 32, 256
train_iter, _ = load_data_bananas(batch_size)
batch = next(iter(train_iter))
batch[0].shape, batch[1].shape

Downloading ../data/banana-detection.zip from http://d2l-data.s3-accelerate.amazonaws.com/banana-detection.zip...
read 1000 training examples
read 100 validation examples
(torch.Size([32, 3, 256, 256]), torch.Size([32, 1, 5]))

Let us show 10 images with their ground-truth bounding boxes. The rotation, size, and position of the banana differ across all these images. Of course, this is only a simple synthetic dataset; real-world datasets are usually much more complex.

imgs = (batch[0][0:10].permute(0, 2, 3, 1)) / 255
axes = d2l.show_images(imgs, 2, 5, scale=2)
for ax, label in zip(axes, batch[1][0:10]):
    d2l.show_bboxes(ax, [label[0][1:5] * edge_size], colors=['w'])

Supplementary notes:
