Lightweight Model for Fish Recognition Based on YOLOV5-MobilenetV3 and Sonar Images

      Abstract:
      Objective Recognizing and counting fish in net cages is one of the key reference factors for farming management in marine ranching. To address interference from reverberation noise and complex backgrounds, this paper constructs fish detection datasets under different lighting conditions using forward-looking sonar imaging, and proposes a lightweight fish recognition model based on YOLOV5-MobilenetV3 and sonar images (LAPR-Net) that recognizes fish in net cages in turbid or dark water.
      Method Taking tilapia as the research object and building on the YOLOV5 framework, the backbone adopts the lightweight MobileNetV3 bneck block, which uses an inverted residual structure with a linear bottleneck and depthwise separable convolution to extract fish features from sonar images, and applies the SE-Net attention mechanism to obtain multi-scale semantic features of the sonar images and strengthen the correlation between features. The neck adopts a path aggregation network structure that fuses target features at multiple scales to enhance feature fusion. The prediction part performs a local maximum search based on non-maximum suppression, removing redundant detection boxes and keeping the box with the highest confidence, and finally outputs the fish detection results, including position, category, and the probability of the detected object. A sketch of the backbone unit is given below.
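
      As a rough illustration of the backbone unit described above (not the authors' exact implementation), the following PyTorch sketch shows a MobileNetV3-style bneck block: a 1x1 pointwise expansion, a depthwise convolution, an SE attention branch, and a linear 1x1 projection forming an inverted residual. All layer sizes and the Bneck/SE names are illustrative assumptions.

      import torch
      import torch.nn as nn

      class SE(nn.Module):
          """Squeeze-and-Excitation: re-weight channels with global pooling + two FC layers."""
          def __init__(self, channels: int, reduction: int = 4):
              super().__init__()
              self.pool = nn.AdaptiveAvgPool2d(1)
              self.fc = nn.Sequential(
                  nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
                  nn.Linear(channels // reduction, channels), nn.Hardsigmoid(inplace=True),
              )

          def forward(self, x):
              b, c, _, _ = x.shape
              w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
              return x * w  # channel attention weights applied to the feature map

      class Bneck(nn.Module):
          """MobileNetV3-style inverted residual block with a linear bottleneck and optional SE."""
          def __init__(self, in_ch, exp_ch, out_ch, kernel=3, stride=1, use_se=True):
              super().__init__()
              self.use_res = stride == 1 and in_ch == out_ch  # identity shortcut only if shapes match
              layers = [
                  nn.Conv2d(in_ch, exp_ch, 1, bias=False),                # 1x1 pointwise expansion
                  nn.BatchNorm2d(exp_ch), nn.Hardswish(inplace=True),
                  nn.Conv2d(exp_ch, exp_ch, kernel, stride, kernel // 2,  # depthwise convolution
                            groups=exp_ch, bias=False),
                  nn.BatchNorm2d(exp_ch), nn.Hardswish(inplace=True),
              ]
              if use_se:
                  layers.append(SE(exp_ch))                               # attention on expanded channels
              layers += [nn.Conv2d(exp_ch, out_ch, 1, bias=False),        # linear bottleneck: no activation
                         nn.BatchNorm2d(out_ch)]
              self.block = nn.Sequential(*layers)

          def forward(self, x):
              y = self.block(x)
              return x + y if self.use_res else y

      if __name__ == "__main__":
          x = torch.randn(1, 16, 80, 80)     # e.g. an intermediate sonar-image feature map
          print(Bneck(16, 64, 16)(x).shape)  # torch.Size([1, 16, 80, 80])

      For the prediction stage, a standard routine such as torchvision.ops.nms(boxes, scores, iou_threshold) can serve as the non-maximum suppression step that discards overlapping boxes and keeps the highest-confidence detection.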
      Result Four other mainstream detection models were selected for comparative experiments: YOLOV3-tiny (Darknet53), YOLOV5 (CSPdarknet53), YOLOV5 (Repvgg), and YOLOV5s (Transformer). The proposed model has a parameter count of 3 545 453, 6.3 G FLOPs, and an mAP of 0.957, with an average inference speed of 0.08868 s per image. Compared with the YOLOV5 model, the mAP of the improved model increased by 9.7%.
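
      The parameter and FLOPs figures quoted above can be checked in the usual way, for example with the third-party thop profiler; the mobilenet_v3_small stand-in model and the 640x640 input size below are assumptions for illustration only, not the evaluation setup of the paper.

      import torch
      from torchvision.models import mobilenet_v3_small  # stand-in; replace with the trained LAPR-Net
      from thop import profile                            # pip install thop

      model = mobilenet_v3_small()
      dummy = torch.randn(1, 3, 640, 640)                 # assumed input resolution (YOLOv5 default)
      macs, params = profile(model, inputs=(dummy,))
      n_params = sum(p.numel() for p in model.parameters())
      print(f"parameters: {n_params}")                    # compare with the 3 545 453 reported above
      print(f"FLOPs: {2 * macs / 1e9:.2f} G")             # FLOPs ~ 2 x MACs; compare with 6.3 G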
      Conclusion The proposed model improves training and recognition speed, reduces hardware requirements, and can serve as a reference for fish detection models for cage culture in marine ranching.

       
