26 Network in Network (NiN) [Dive into Deep Learning v2]

Network in Network (NiN) is an architecture that sees relatively little use nowadays.
In commonly used networks, the fully connected layers tend to make the number of trainable parameters explode.
NiN's core idea: do away with fully connected layers entirely.
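To make the scale concrete, here is a rough back-of-the-envelope sketch (the shapes are AlexNet-style dimensions, assumed purely for illustration):

# Rough sketch: parameter count of one fully connected layer vs. a 1x1
# convolution, assuming AlexNet-style shapes (illustration only)
fc_weights = 256 * 5 * 5 * 4096   # flattened 256x5x5 feature map -> 4096 units
conv1x1_weights = 256 * 256       # 1x1 convolution, 256 channels in and out
print(f'{fc_weights:,}')          # 26,214,400
print(f'{conv1x1_weights:,}')     # 65,536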

In NiN, the most important structure is the NiN block (most classic network architectures are likewise built mainly from repeated blocks).
A NiN block consists of one convolutional layer followed by two 1×1 convolutional layers, which act as per-pixel fully connected layers; the block thus behaves like a fully connected layer with far fewer parameters.

NiN architecture: alternate NiN blocks with stride-2 max-pooling layers.



A NiN block begins with an ordinary convolutional layer, followed by two 1×1 convolutional layers. These two 1×1 convolutional layers act as per-pixel fully connected layers with ReLU activations (the sketch below demonstrates this equivalence). The window shape of the first convolutional layer is typically set by the user; the subsequent convolution windows are fixed at 1×1.
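A minimal sketch of that equivalence (toy shapes assumed for illustration): a 1×1 convolution produces the same output as applying one shared fully connected layer at every pixel position.

import torch
from torch import nn

# A 1x1 convolution and a linear layer sharing the same weight matrix
conv = nn.Conv2d(3, 8, kernel_size=1, bias=False)
fc = nn.Linear(3, 8, bias=False)
fc.weight.data = conv.weight.data.view(8, 3)

X = torch.rand(1, 3, 4, 4)
# Apply the linear layer per pixel: move channels last, then back
out_fc = fc(X.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
print(torch.allclose(conv(X), out_fc, atol=1e-6))  # True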
Code implementation:
Building the NiN block:
import torch
from torch import nn
from d2l import torch as d2l

def nin_block(in_channels, out_channels, kernel_size, strides, padding):
    # One user-configurable convolution followed by two 1x1 convolutions,
    # each with a ReLU activation
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1),
        nn.ReLU())
As the code above shows, a NiN block contains three ReLUs; NiN's nonlinear fitting power comes mainly from these ReLU activations. Without them, the stacked 1×1 convolutions would collapse into a single linear map, as the sketch below illustrates.
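A quick sanity check of that claim (a sketch with toy shapes, not from the original notes): two stacked 1×1 convolutions without ReLU compose into a single linear map, so the extra layer would add parameters but no expressive power.

import torch
from torch import nn

a = nn.Conv2d(4, 4, kernel_size=1, bias=False)
b = nn.Conv2d(4, 4, kernel_size=1, bias=False)

# b(a(x)) collapses to one 1x1 conv whose weight is the matrix product
merged = nn.Conv2d(4, 4, kernel_size=1, bias=False)
merged.weight.data = (b.weight.data.view(4, 4) @ a.weight.data.view(4, 4)).view(4, 4, 1, 1)

X = torch.rand(1, 4, 6, 6)
print(torch.allclose(b(a(X)), merged(X), atol=1e-6))  # True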
Building the network:
net = nn.Sequential(
    nin_block(1, 96, kernel_size=11, strides=4, padding=0),
    nn.MaxPool2d(3, stride=2),
    nin_block(96, 256, kernel_size=5, strides=1, padding=2),
    nn.MaxPool2d(3, stride=2),
    nin_block(256, 384, kernel_size=3, strides=1, padding=1),
    nn.MaxPool2d(3, stride=2),
    nn.Dropout(0.5),
    # There are 10 label classes
    nin_block(384, 10, kernel_size=3, strides=1, padding=1),
    # AdaptiveAvgPool2d((1, 1)) is global average pooling
    nn.AdaptiveAvgPool2d((1, 1)),
    # Flatten the four-dimensional output into two dimensions with shape
    # (batch size, 10); this can be fed directly into softmax for maximum
    # likelihood estimation (softmax is applied inside the train function)
    nn.Flatten())
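A quick check of the thesis above (a sketch assuming the net just defined): counting parameters shows how compact the model stays without fully connected layers.

# Total trainable parameters of the NiN net defined above
num_params = sum(p.numel() for p in net.parameters())
print(f'{num_params:,}')  # about 2.0 million, versus tens of millions
                          # for AlexNet's fully connected layers alone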
Create a data sample to inspect the output shape of each block.
X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)
Sequential output shape:         torch.Size([1, 96, 54, 54])
MaxPool2d output shape:          torch.Size([1, 96, 26, 26])
Sequential output shape:         torch.Size([1, 256, 26, 26])
MaxPool2d output shape:          torch.Size([1, 256, 12, 12])
Sequential output shape:         torch.Size([1, 384, 12, 12])
MaxPool2d output shape:          torch.Size([1, 384, 5, 5])
Dropout output shape:            torch.Size([1, 384, 5, 5])
Sequential output shape:         torch.Size([1, 10, 5, 5])
AdaptiveAvgPool2d output shape:  torch.Size([1, 10, 1, 1])
Flatten output shape:            torch.Size([1, 10])
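The spatial sizes can be verified with the standard output-size formula floor((n + 2p - k) / s) + 1; a minimal sketch, added for illustration (the 1×1 convolutions preserve spatial size, so each NiN block's output size equals that of its first convolution):

def conv_out(n, k, s, p):
    return (n + 2 * p - k) // s + 1

print(conv_out(224, 11, 4, 0))  # 54: the 11x11, stride-4 conv in the first NiN block
print(conv_out(54, 3, 2, 0))    # 26: the following 3x3, stride-2 max pooling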
Training results:
lr, num_epochs, batch_size = 0.1, 10, 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
loss 0.322, train acc 0.881, test acc 0.865
3226.1 examples/sec on cuda:0