[Hands-On Python] Detailed Annotations of the YOLOv3 Code and Model Structure Diagram [Illustrated]

查理不是猹 · 2022-01-07 21:35:45


I have added annotations to the original block diagram to make it easier to follow: the red circles mark each yolo_block, and the dark-red notes show the output of the preceding module. Please read it alongside the code.

Compared with YOLOv1 and YOLOv2, YOLOv3 is a substantial improvement. The main changes are:

**1. Residual networks (Residual): a residual stage performs one 3x3 convolution and saves that layer, then runs a 1x1 convolution followed by a 3x3 convolution and adds the saved layer to the result. Residual networks are easy to optimize and can improve accuracy by adding considerable depth; the skip connections inside the residual blocks mitigate the vanishing-gradient problem that comes with making a deep network deeper.
2. Multi-scale feature maps for detection: three feature maps are extracted (the pink boxes in the diagram), with shapes (13,13,75), (26,26,75) and (52,52,75). The last dimension is 75 because the diagram is based on the VOC dataset, which has 20 classes; YOLOv3 assigns 3 prior boxes to each cell of every feature map, so the last dimension is 3 x (20 + 5) = 75 (see the sketch after this list).
3. Upsampling: in the code below this is implemented with nearest-neighbor resizing (tf.image.resize_nearest_neighbor) rather than a true deconvolution. Relative to a strided convolution it performs the opposite resolution change, enlarging deep feature maps so they can be merged with shallower ones, which lets the network extract more and better features.**
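To make the last-dimension arithmetic concrete, here is a minimal sketch (plain Python, illustrative only; the helper name `head_depth` is mine, not from the original code):

```python
# Channel depth of each YOLOv3 detection feature map:
# num_anchors_per_scale * (num_classes + 5), where the 5 covers
# (tx, ty, tw, th, objectness).
def head_depth(num_anchors, num_classes):
    return num_anchors * (num_classes + 5)

print(head_depth(3, 20))  # 75  (VOC, as in the diagram)
print(head_depth(3, 80))  # 255 (COCO)
```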

```python
# Batch normalization followed by a Leaky ReLU activation
# (the L2 regularization itself is attached to the conv layers below)
def _batch_normalization_layer(self, input_layer, name = None, training = True, norm_decay = 0.99, norm_epsilon = 1e-3):
    '''
    Introduction
    ------------
    Apply batch normalization to the feature map produced by a conv layer.
    Parameters
    ----------
    input_layer: input 4-D tensor
    name: name of the batchnorm layer
    training: whether this is the training phase
    norm_decay: decay rate for the moving averages used at inference time
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    bn_layer: feature map after batch normalization (and Leaky ReLU)
    '''
    bn_layer = tf.layers.batch_normalization(inputs = input_layer,
        momentum = norm_decay, epsilon = norm_epsilon, center = True,
        scale = True, training = training, name = name)
    return tf.nn.leaky_relu(bn_layer, alpha = 0.1)
```
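As a small aside, the Leaky ReLU applied at the end of this helper uses alpha = 0.1, so negative inputs are scaled by 0.1 instead of being zeroed. A tiny plain-Python check, illustrative only:

```python
# Leaky ReLU with alpha = 0.1: f(x) = x for x >= 0, else 0.1 * x
def leaky_relu(x, alpha=0.1):
    return x if x >= 0 else alpha * x

print([leaky_relu(v) for v in (-2.0, -0.5, 0.0, 3.0)])
# [-0.2, -0.05, 0.0, 3.0]
```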
```python
# Standard convolution layer
def _conv2d_layer(self, inputs, filters_num, kernel_size, name, use_bias = False, strides = 1):
    """
    Introduction
    ------------
    Use tf.layers.conv2d, which avoids initializing the weight and bias
    matrices by hand and adding the bias term after the convolution.
    Each convolution is followed by batch norm and a Leaky ReLU activation.
    When the stride is 2, the convolution downsamples the image: for a
    416x416 input and a 3x3 kernel with stride 2, the output size is
    floor((416 - 3 + 2) / 2) + 1 = 208, equivalent to a pooling step.
    Therefore, when the stride is greater than 1, the caller first pads the
    input explicitly (one row on top and one column on the left, see
    _Residual_block) and 'VALID' padding is used instead of 'SAME'.
    Parameters
    ----------
    inputs: input tensor
    filters_num: number of filters
    strides: convolution stride
    name: name of the conv layer
    use_bias: whether to use a bias term
    kernel_size: kernel size
    Returns
    -------
    conv: feature map after the convolution
    """
    conv = tf.layers.conv2d(
        inputs = inputs, filters = filters_num,
        kernel_size = kernel_size, strides = [strides, strides],
        kernel_initializer = tf.glorot_uniform_initializer(),
        padding = ('SAME' if strides == 1 else 'VALID'),
        kernel_regularizer = tf.contrib.layers.l2_regularizer(scale = 5e-4),
        use_bias = use_bias, name = name)
    return conv
```
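The docstring's downsampling arithmetic can be checked with the standard output-size formula. A plain-Python sketch, not part of the original code:

```python
# out = floor((W - K + 2P) / S) + 1, with W = input size, K = kernel size,
# P = padding per side, S = stride (integer division performs the floor)
def conv_out_size(w, k, p, s):
    return (w - k + 2 * p) // s + 1

print(conv_out_size(416, 3, 1, 2))  # 208: one stride-2 conv halves 416
print(conv_out_size(208, 3, 1, 2))  # 104
```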
```python
# Residual convolution: perform one 3x3 stride-2 convolution and save that
# layer, then run a 1x1 convolution followed by a 3x3 convolution and add
# the saved layer to the result
def _Residual_block(self, inputs, filters_num, blocks_num, conv_index, training = True, norm_decay = 0.99, norm_epsilon = 1e-3):
    """
    Introduction
    ------------
    Darknet residual block, similar to ResNet's two-layer conv structure,
    using 1x1 and 3x3 kernels; the 1x1 conv reduces the channel dimension.
    Parameters
    ----------
    inputs: input tensor
    filters_num: number of filters
    training: whether this is the training phase
    blocks_num: number of residual blocks
    conv_index: running layer index, for consistent naming when loading pretrained weights
    norm_decay: decay rate for the moving averages used at inference time
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    layer: result after the residual blocks
    conv_index: updated layer index
    """
    # Pad the height and width dimensions (top and left) of the input feature map
    inputs = tf.pad(inputs, paddings=[[0, 0], [1, 0], [1, 0], [0, 0]], mode='CONSTANT')
    layer = self._conv2d_layer(inputs, filters_num, kernel_size = 3, strides = 2, name = "conv2d_" + str(conv_index))
    layer = self._batch_normalization_layer(layer, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    for _ in range(blocks_num):
        shortcut = layer
        layer = self._conv2d_layer(layer, filters_num // 2, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
        layer = self._batch_normalization_layer(layer, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        conv_index += 1
        layer = self._conv2d_layer(layer, filters_num, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
        layer = self._batch_normalization_layer(layer, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        conv_index += 1
        layer += shortcut
    return layer, conv_index
```
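Tracing shapes through one downsampling stage makes the block's structure clearer. A bookkeeping-only sketch (no TensorFlow; the helper name is mine):

```python
# Shapes produced by one _Residual_block stage: the 3x3 stride-2 conv halves
# H and W, then each block runs a 1x1 conv (half the channels) and a 3x3 conv
# (back to filters_num) before the shortcut add.
def residual_stage_shapes(size, filters_num, blocks_num):
    size //= 2
    shapes = [(size, size, filters_num)]               # downsampling conv
    for _ in range(blocks_num):
        shapes.append((size, size, filters_num // 2))  # 1x1 conv
        shapes.append((size, size, filters_num))       # 3x3 conv + shortcut
    return shapes

print(residual_stage_shapes(208, 128, 2)[-1])  # (104, 104, 128)
```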
```python
#---------------------------------------#
# Build the darknet53 backbone (the upsampling layers come later)
#---------------------------------------#
def _darknet53(self, inputs, conv_index, training = True, norm_decay = 0.99, norm_epsilon = 1e-3):
    """
    Introduction
    ------------
    Build the darknet53 network structure used by yolo3.
    Parameters
    ----------
    inputs: model input tensor
    conv_index: running conv-layer index, so pretrained weights can be loaded by name
    training: whether this is the training phase
    norm_decay: decay rate for the moving averages used at inference time
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    conv: result after 52 conv layers; for a 416x416x3 input the output shape is 13x13x1024
    route1: output of conv layer 26, shape 52x52x256, for later use
    route2: output of conv layer 43, shape 26x26x512, for later use
    conv_index: conv-layer count, used when loading pretrained weights
    """
    with tf.variable_scope('darknet53'):
        # 416,416,3 -> 416,416,32
        conv = self._conv2d_layer(inputs, filters_num = 32, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
        conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        conv_index += 1
        # 416,416,32 -> 208,208,64
        conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 64, blocks_num = 1, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        # 208,208,64 -> 104,104,128
        conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 128, blocks_num = 2, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        # 104,104,128 -> 52,52,256
        conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 256, blocks_num = 8, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        # route1 = 52,52,256
        route1 = conv
        # 52,52,256 -> 26,26,512
        conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 512, blocks_num = 8, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        # route2 = 26,26,512
        route2 = conv
        # 26,26,512 -> 13,13,1024
        conv, conv_index = self._Residual_block(conv, conv_index = conv_index, filters_num = 1024, blocks_num = 4, training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
        # conv = 13,13,1024
    return route1, route2, conv, conv_index
```
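The "52 conv layers" in the docstring can be verified by counting: one stem convolution, then five stages whose residual blocks number 1, 2, 8, 8 and 4, each stage contributing one downsampling convolution plus two convolutions per block. A quick check:

```python
# 1 stem conv + per stage: 1 downsampling conv + 2 convs per residual block
blocks_per_stage = [1, 2, 8, 8, 4]
num_convs = 1 + sum(1 + 2 * b for b in blocks_per_stage)
print(num_convs)  # 52
```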
```python
# Emits two outputs:
# the first, after 5 convolutions (1x1, 3x3, 1x1, 3x3, 1x1), feeds the next upsampling step;
# the second, after 5 + 2 convolutions (1x1, 3x3, 1x1, 3x3, 1x1, 3x3, 1x1), is a detection feature map
def _yolo_block(self, inputs, filters_num, out_filters, conv_index, training = True, norm_decay = 0.99, norm_epsilon = 1e-3):
    """
    Introduction
    ------------
    On top of the feature maps extracted by Darknet53, yolo3 adds blocks for
    three feature maps of different scales, improving detection of small objects.
    Parameters
    ----------
    inputs: input features
    filters_num: number of filters
    out_filters: number of filters in the final output layer
    conv_index: running conv-layer index, so pretrained weights can be loaded by name
    training: whether this is the training phase
    norm_decay: decay rate for the moving averages used at inference time
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    route: result before the final two convolutions
    conv: output of the last convolution
    conv_index: conv-layer count
    """
    conv = self._conv2d_layer(inputs, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num = filters_num, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    # route is taken after the fifth convolution, before the final 3x3 + 1x1
    route = conv
    conv = self._conv2d_layer(conv, filters_num = filters_num * 2, kernel_size = 3, strides = 1, name = "conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = norm_decay, norm_epsilon = norm_epsilon)
    conv_index += 1
    # the final 1x1 conv is a linear projection: it uses a bias and no batch norm
    conv = self._conv2d_layer(conv, filters_num = out_filters, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index), use_bias = True)
    conv_index += 1
    return route, conv, conv_index
```
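Written out for the first detection head (filters_num = 512, COCO out_filters = 255), the seven convolutions alternate 1x1 and 3x3 kernels, with the route taken after the fifth. A descriptive sketch of the schedule:

```python
# (kernel, filters) schedule inside _yolo_block for filters_num=512, out=255
schedule = [(1, 512), (3, 1024), (1, 512), (3, 1024), (1, 512),  # 5 convs -> route
            (3, 1024), (1, 255)]                                 # +2 convs -> conv
for k, f in schedule:
    print(f"{k}x{k} conv -> {f} filters")
```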
```python
# Returns the three detection feature maps
def yolo_inference(self, inputs, num_anchors, num_classes, training = True):
    """
    Introduction
    ------------
    Build the yolo model structure.
    Parameters
    ----------
    inputs: model input tensor
    num_anchors: number of anchors each grid cell is responsible for
    num_classes: number of classes
    training: whether this is the training phase
    Returns
    -------
    the three detection feature maps [conv2d_59, conv2d_67, conv2d_75]
    """
    conv_index = 1
    # conv2d_26 = 52,52,256, conv2d_43 = 26,26,512, conv = 13,13,1024
    conv2d_26, conv2d_43, conv, conv_index = self._darknet53(inputs, conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
    with tf.variable_scope('yolo'):
        #--------------------------------------#
        # First feature map: conv2d_59
        #--------------------------------------#
        # conv2d_57 = 13,13,512, conv2d_59 = 13,13,255 (3 x (80 + 5) for COCO)
        conv2d_57, conv2d_59, conv_index = self._yolo_block(conv, 512, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
        #--------------------------------------#
        # Second feature map: conv2d_67
        #--------------------------------------#
        conv2d_60 = self._conv2d_layer(conv2d_57, filters_num = 256, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
        conv2d_60 = self._batch_normalization_layer(conv2d_60, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
        conv_index += 1
        # unSample_0 = 26,26,256 (bug fix: the width target must come from axis 2;
        # the original doubled tf.shape(...)[1] twice, which only works for square inputs)
        unSample_0 = tf.image.resize_nearest_neighbor(conv2d_60, [2 * tf.shape(conv2d_60)[1], 2 * tf.shape(conv2d_60)[2]], name='upSample_0')
        # route0 = 26,26,768
        route0 = tf.concat([unSample_0, conv2d_43], axis = -1, name = 'route_0')
        # conv2d_65 = 26,26,256, conv2d_67 = 26,26,255
        conv2d_65, conv2d_67, conv_index = self._yolo_block(route0, 256, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
        #--------------------------------------#
        # Third feature map: conv2d_75
        #--------------------------------------#
        conv2d_68 = self._conv2d_layer(conv2d_65, filters_num = 128, kernel_size = 1, strides = 1, name = "conv2d_" + str(conv_index))
        conv2d_68 = self._batch_normalization_layer(conv2d_68, name = "batch_normalization_" + str(conv_index), training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
        conv_index += 1
        # unSample_1 = 52,52,128
        unSample_1 = tf.image.resize_nearest_neighbor(conv2d_68, [2 * tf.shape(conv2d_68)[1], 2 * tf.shape(conv2d_68)[2]], name='upSample_1')
        # route1 = 52,52,384
        route1 = tf.concat([unSample_1, conv2d_26], axis = -1, name = 'route_1')
        # conv2d_75 = 52,52,255
        _, conv2d_75, _ = self._yolo_block(route1, 128, num_anchors * (num_classes + 5), conv_index = conv_index, training = training, norm_decay = self.norm_decay, norm_epsilon = self.norm_epsilon)
    return [conv2d_59, conv2d_67, conv2d_75]
```
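A minimal usage sketch under TensorFlow 1.x, assuming the methods above live in a class (here called `YoloV3`) whose constructor sets `self.norm_decay` and `self.norm_epsilon`; the class name and constructor are assumptions, not from the original post:

```python
import numpy as np
import tensorflow as tf

model = YoloV3()  # hypothetical wrapper class holding the methods above
inputs = tf.placeholder(tf.float32, [None, 416, 416, 3])
# VOC: 3 anchors per scale, 20 classes -> last dimension 3 * (20 + 5) = 75
feats = model.yolo_inference(inputs, num_anchors=3, num_classes=20, training=False)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    outs = sess.run(feats, feed_dict={inputs: np.zeros((1, 416, 416, 3), np.float32)})
    for out in outs:
        print(out.shape)  # (1, 13, 13, 75), (1, 26, 26, 75), (1, 52, 52, 75)
```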
Copyright: this article was written by [查理不是猹]. Please include a link to the original when reposting: https://gsmany.com/2022/01/202201072135447753.html