Tutorial: Implementing the Lightweight ShuffleNet Network in Keras
ShuffleNet is an extremely computation-efficient CNN architecture published by Megvii, designed specifically for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The architecture uses two new operations, group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection show that under a computation budget of 40 MFLOPs, ShuffleNet outperforms other architectures; for example, its top-1 error on ImageNet classification is 7.8% lower than that of the recent MobileNet. On an ARM-based mobile device, ShuffleNet achieves an actual 13x speedup over AlexNet while maintaining comparable accuracy.
Paper: ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
GitHub: https://github.com/zjn-ai/ShuffleNet-keras
Network Architecture
Group Convolution
Group convolution actually dates back to AlexNet, where it was used to split the model across two GPUs because a single GPU did not have enough memory. In short, group convolution splits the input feature maps evenly along the channel axis into several equal-sized groups (in the figure below, the input feature maps are on the left and the evenly split groups on the right), applies an ordinary convolution to each group, and finally concatenates the output feature maps back together along the channel axis.
[Figure: group convolution — the input feature maps (left) are split evenly along the channel axis into groups (right)]
Many frameworks support group convolution natively, but TensorFlow, inexplicably, still does not, so we have to implement it ourselves; this is bound to be less efficient than the native support in other frameworks. The implementation of the group convolution layer follows the principle described above exactly; the code is shown below.
def _group_conv(x, filters, kernel, stride, groups):
"""
Group convolution
# Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        stride: An integer or tuple/list of 2 integers, specifying the
            strides of the convolution along the width and height.
            Can be a single integer to specify the same value for
            all spatial dimensions.
        groups: Integer, number of groups to split the channels into
# Returns
Output tensor
"""
channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
in_channels = K.int_shape(x)[channel_axis]
# number of input channels per group
nb_ig = in_channels // groups
# number of output channels per group
nb_og = filters // groups
gc_list = []
    # the number of output filters must be divisible by the number of groups
    assert filters % groups == 0
    for i in range(groups):
        # bind i as a default argument so each Lambda slices its own group
        # (a plain closure over i would late-bind if the layers are re-invoked,
        # e.g. when the model is reloaded)
        if channel_axis == -1:
            x_group = Lambda(lambda z, i=i: z[:, :, :, i * nb_ig: (i + 1) * nb_ig])(x)
        else:
            x_group = Lambda(lambda z, i=i: z[:, i * nb_ig: (i + 1) * nb_ig, :, :])(x)
gc_list.append(Conv2D(filters=nb_og, kernel_size=kernel, strides=stride,
padding='same', use_bias=False)(x_group))
return Concatenate(axis=channel_axis)(gc_list)
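As a quick sanity check, here is a minimal usage sketch (the shapes are illustrative, and it assumes the Keras imports from the full script later in this post). A 4-group convolution over a 64-channel input slices it into four 16-channel groups and gives each group 128 / 4 = 32 output filters before concatenation:

inputs = Input(shape=(56, 56, 64))  # channels_last, 64 input channels
y = _group_conv(inputs, filters=128, kernel=(1, 1), stride=1, groups=4)
# y has shape (None, 56, 56, 128): 4 groups x 32 channels, concatenated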
Channel Shuffle
Channel shuffle is the key contribution of this paper. Although group convolution greatly reduces computation and parameters, it also restricts the flow of information between channel groups, which inevitably hurts model accuracy. The authors therefore propose channel shuffle, which strengthens the information exchange between channels without adding any parameters or computation, as shown in the figure below.
[Figure: channel shuffle — the outputs of a group convolution are interleaved across groups before the next group convolution]
The implementation of the channel shuffle layer is quite clever (it follows an approach others have used). In the snippet below, d holds the channel indices of the feature map, and x is the channel order after shuffling.
>>> d = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
>>> x = np.reshape(d, (3, 3))
>>> x = np.transpose(x, [1, 0])  # transpose
>>> x = np.reshape(x, (9,))      # flatten
>>> # [0 1 2 3 4 5 6 7 8] --> [0 3 6 1 4 7 2 5 8]
Implementation using the Keras backend:
def _channel_shuffle(x, groups):
    """
    Channel shuffle layer
    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        groups: Integer, number of groups to split the channels into
    # Returns
        Shuffled tensor
    """
    if K.image_data_format() == 'channels_last':
        height, width, in_channels = K.int_shape(x)[1:]
        channels_per_group = in_channels // groups
        pre_shape = [-1, height, width, groups, channels_per_group]
        dim = (0, 1, 2, 4, 3)
        later_shape = [-1, height, width, in_channels]
    else:
        in_channels, height, width = K.int_shape(x)[1:]
        channels_per_group = in_channels // groups
        pre_shape = [-1, groups, channels_per_group, height, width]
        dim = (0, 2, 1, 3, 4)
        later_shape = [-1, in_channels, height, width]
    x = Lambda(lambda z: K.reshape(z, pre_shape))(x)
    x = Lambda(lambda z: K.permute_dimensions(z, dim))(x)
    x = Lambda(lambda z: K.reshape(z, later_shape))(x)
    return x
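To confirm the layer reproduces the numpy demo above, here is a small sketch (illustrative, assuming `channels_last` and the imports from the full script below) that pushes a tensor whose six channels are numbered 0-5 through a shuffle with groups=3:

import numpy as np
from keras.models import Model
from keras.layers import Input

inp = Input(shape=(1, 1, 6))
m = Model(inp, _channel_shuffle(inp, groups=3))
x = np.arange(6, dtype='float32').reshape(1, 1, 1, 6)  # channels 0..5
print(m.predict(x).reshape(-1))  # prints [0. 2. 4. 1. 3. 5.]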
ShuffleNet Unit
The main building block of ShuffleNet. In the figure below, (a) is the basic block built on depthwise separable convolution, (b) is the unit used with stride 1, and (c) is the unit used with stride 2.
[Figure: (a) basic bottleneck block with depthwise convolution; (b) ShuffleNet unit for stride 1; (c) ShuffleNet unit for stride 2]
ShuffleNet Architecture
Note that for Stage 2, the authors do not apply group convolution to the first 1×1 convolution, because the number of input channels is relatively small.
[Table: ShuffleNet architecture — the code below follows the g = 8 configuration: stages with 384, 768, and 1536 output channels, repeated 4, 8, and 4 times]
Environment
Python 3.6
TensorFlow 1.13.1
Keras 2.2.4
Implementation
Supports both `channels_first` and `channels_last` data formats.
# -*- coding: utf-8 -*-
"""
Created on Thu Apr 25 18:26:41 2019
@author: zjn
"""
import numpy as np
from keras.callbacks import LearningRateScheduler
from keras.models import Model
from keras.layers import Input, Conv2D, Dropout, Dense, GlobalAveragePooling2D, Concatenate, AveragePooling2D
from keras.layers import Activation, BatchNormalization, add, Reshape, ReLU, DepthwiseConv2D, MaxPooling2D, Lambda
from keras.utils.vis_utils import plot_model
from keras import backend as K
from keras.optimizers import SGD
def _group_conv(x, filters, kernel, stride, groups):
"""
Group convolution
# Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        stride: An integer or tuple/list of 2 integers, specifying the
            strides of the convolution along the width and height.
            Can be a single integer to specify the same value for
            all spatial dimensions.
        groups: Integer, number of groups to split the channels into
# Returns
Output tensor
"""
channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
in_channels = K.int_shape(x)[channel_axis]
# number of input channels per group
nb_ig = in_channels // groups
# number of output channels per group
nb_og = filters // groups
gc_list = []
    # the number of output filters must be divisible by the number of groups
    assert filters % groups == 0
    for i in range(groups):
        # bind i as a default argument so each Lambda slices its own group
        # (a plain closure over i would late-bind if the layers are re-invoked,
        # e.g. when the model is reloaded)
        if channel_axis == -1:
            x_group = Lambda(lambda z, i=i: z[:, :, :, i * nb_ig: (i + 1) * nb_ig])(x)
        else:
            x_group = Lambda(lambda z, i=i: z[:, i * nb_ig: (i + 1) * nb_ig, :, :])(x)
gc_list.append(Conv2D(filters=nb_og, kernel_size=kernel, strides=stride,
padding='same', use_bias=False)(x_group))
return Concatenate(axis=channel_axis)(gc_list)
def _channel_shuffle(x, groups):
"""
Channel shuffle layer
# Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        groups: Integer, number of groups to split the channels into
# Returns
Shuffled tensor
"""
if K.image_data_format() == 'channels_last':
height, width, in_channels = K.int_shape(x)[1:]
channels_per_group = in_channels // groups
pre_shape = [-1, height, width, groups, channels_per_group]
dim = (0, 1, 2, 4, 3)
later_shape = [-1, height, width, in_channels]
else:
in_channels, height, width = K.int_shape(x)[1:]
channels_per_group = in_channels // groups
pre_shape = [-1, groups, channels_per_group, height, width]
dim = (0, 2, 1, 3, 4)
later_shape = [-1, in_channels, height, width]
x = Lambda(lambda z: K.reshape(z, pre_shape))(x)
x = Lambda(lambda z: K.permute_dimensions(z, dim))(x)
x = Lambda(lambda z: K.reshape(z, later_shape))(x)
return x
def _shufflenet_unit(inputs, filters, kernel, stride, groups, stage, bottleneck_ratio=0.25):
"""
ShuffleNet unit
# Arguments
        inputs: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        stride: An integer or tuple/list of 2 integers, specifying the
            strides of the convolution along the width and height.
            Can be a single integer to specify the same value for
            all spatial dimensions.
        groups: Integer, number of groups to split the channels into
        stage: Integer, stage number of ShuffleNet
        bottleneck_ratio: Float, ratio of bottleneck channels to output channels
# Returns
Output tensor
# Note
For Stage 2, we(authors of shufflenet) do not apply group convolution on the first pointwise layer
because the number of input channels is relatively small.
"""
channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
in_channels = K.int_shape(inputs)[channel_axis]
bottleneck_channels = int(filters * bottleneck_ratio)
    if stage == 2:
        # plain (non-grouped) 1x1 convolution for the first pointwise layer of Stage 2
        x = Conv2D(filters=bottleneck_channels, kernel_size=(1, 1), strides=1,
                   padding='same', use_bias=False)(inputs)
else:
x = _group_conv(inputs, bottleneck_channels, (1, 1), 1, groups)
x = BatchNormalization(axis=channel_axis)(x)
x = ReLU()(x)
x = _channel_shuffle(x, groups)
x = DepthwiseConv2D(kernel_size=kernel, strides=stride, depth_multiplier=1,
padding='same', use_bias=False)(x)
x = BatchNormalization(axis=channel_axis)(x)
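    # stride-2 units concatenate the shortcut instead of adding it, so the group
    # conv only produces (filters - in_channels) channels and the 3x3 average-pooled
    # shortcut supplies the remaining in_channels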
if stride == 2:
x = _group_conv(x, filters - in_channels, (1, 1), 1, groups)
x = BatchNormalization(axis=channel_axis)(x)
avg = AveragePooling2D(pool_size=(3, 3), strides=2, padding='same')(inputs)
x = Concatenate(axis=channel_axis)([x, avg])
else:
x = _group_conv(x, filters, (1, 1), 1, groups)
x = BatchNormalization(axis=channel_axis)(x)
x = add([x, inputs])
return x
def _stage(x, filters, kernel, groups, repeat, stage):
"""
Stage of ShuffleNet
# Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        groups: Integer, number of groups to split the channels into
        repeat: Integer, number of ShuffleNet units in the stage
        stage: Integer, stage number of ShuffleNet
# Returns
Output tensor
"""
x = _shufflenet_unit(x, filters, kernel, 2, groups, stage)
for i in range(1, repeat):
x = _shufflenet_unit(x, filters, kernel, 1, groups, stage)
return x
def ShuffleNet(input_shape, classes):
"""
ShuffleNet architectures
# Arguments
        input_shape: A tuple/list of 3 integers, shape of the input tensor
        classes: Integer, number of classes to predict
# Returns
A keras model
"""
inputs = Input(shape=input_shape)
x = Conv2D(24, (3, 3), strides=2, padding='same', use_bias=True, activation='relu')(inputs)
x = MaxPooling2D(pool_size=(3, 3), strides=2, padding='same')(x)
x = _stage(x, filters=384, kernel=(3, 3), groups=8, repeat=4, stage=2)
x = _stage(x, filters=768, kernel=(3, 3), groups=8, repeat=8, stage=3)
x = _stage(x, filters=1536, kernel=(3, 3), groups=8, repeat=4, stage=4)
x = GlobalAveragePooling2D()(x)
x = Dense(classes)(x)
predicts = Activation('softmax')(x)
model = Model(inputs, predicts)
return model
if __name__ == '__main__':
model = ShuffleNet((224, 224, 3), 1000)
#plot_model(model, to_file='ShuffleNet.png', show_shapes=True)
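The imports at the top also bring in SGD and LearningRateScheduler, which the script never uses. Here is a hedged sketch of how training could be wired up (the optimizer settings and the linear-decay schedule are illustrative, not the paper's exact recipe; x_train and y_train are hypothetical):

def lr_schedule(epoch):
    # linearly decay the learning rate over 240 epochs (illustrative values)
    return 0.5 * (1.0 - epoch / 240.0)

model.compile(optimizer=SGD(lr=0.5, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=240,
#           callbacks=[LearningRateScheduler(lr_schedule)])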
This concludes the tutorial on implementing the lightweight ShuffleNet network in Keras. I hope it serves as a useful reference, and thank you for supporting 腳本之家.