An Improved Underwater Recognition Algorithm for Subsea X-Tree Key Components Based on Deep Transfer Learning
Abstract
1. Introduction
- (1) A target detection algorithm for key components of the subsea X-tree is proposed, which can be applied to VR-assisted operation positioning and to real-time robot positioning and mapping. Comparative experiments show that the algorithm performs well, filling a gap in intelligent detection and recognition for subsea X-trees.
- (2) Based on the YOLOv4-tiny detection framework, the backbone replaces the original CSPNet with ResNet-D to speed up detection and to avoid the information loss caused by strided 1 × 1 convolution during down-sampling.
- (3) An efficient channel attention mechanism is applied to the two feature maps extracted by the backbone, optimizing the extracted features before the network enlarges its receptive field, so that more accurate feature information is passed downstream.
- (4) A two-stage transfer learning training strategy is presented: the model is pre-trained on ImageNet and transferred to a dataset captured on land; the resulting weights are then transferred to the underwater recognition task. This strategy compensates for the small size and limited scene variety of underwater datasets and effectively improves recognition accuracy.
- (5) An identification dataset of subsea X-tree components under multiple backgrounds is established. It is worth mentioning that effective image and video data of real subsea X-trees cannot be obtained directly, so models of X-tree components were built using 3D printing, and datasets were collected under different backgrounds, including underwater environments. Mosaic data augmentation is used to enrich the data during training.
2. Related Work
3. Methodology
3.1. Efficient Channel Attention Module
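The efficient channel attention (ECA) idea named in this section's heading can be sketched in a few lines of NumPy: a global average pool per channel, a small 1-D convolution across neighboring channels, and a sigmoid gate that rescales the input feature map. The fixed averaging kernel below is an illustrative stand-in for ECA's learned convolution weights, not the paper's trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def eca(x, k=3, weights=None):
    """Minimal ECA sketch. x: (C, H, W) feature map; k: odd 1-D kernel size."""
    c = x.shape[0]
    if weights is None:
        weights = np.full(k, 1.0 / k)       # stand-in for the learned kernel
    y = x.mean(axis=(1, 2))                 # (C,) global average pooling
    yp = np.pad(y, k // 2, mode="edge")     # pad so output keeps length C
    conv = np.array([np.dot(yp[i:i + k], weights) for i in range(c)])
    gate = sigmoid(conv)                    # (C,) channel attention weights
    return x * gate[:, None, None]          # rescale each channel
```

Because the 1-D convolution only mixes each channel with its k − 1 neighbors, ECA adds far fewer parameters than a fully connected channel-attention block such as SE, which is what makes it attractive for a lightweight detector.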
3.2. Replace CSPBlock with ResBlock-D
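To illustrate why the ResNet-D shortcut avoids the information loss of a stride-2 1 × 1 convolution, here is a minimal NumPy comparison of the two down-sampling paths (spatial part only; the 1 × 1 channel projection is omitted, and the input is assumed to have even height and width):

```python
import numpy as np

def strided_shortcut(x):
    # Plain ResNet shortcut: a stride-2 1x1 conv samples every other pixel,
    # so three quarters of the activations never influence the output.
    return x[:, ::2, ::2]

def resnet_d_shortcut(x):
    # ResNet-D shortcut: 2x2 average pooling (stride 2) before the 1x1 conv,
    # so every input pixel contributes to the down-sampled map.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

x = np.arange(16, dtype=float).reshape(1, 4, 4)
print(strided_shortcut(x)[0])   # [[0., 2.], [8., 10.]]
print(resnet_d_shortcut(x)[0])  # [[2.5, 4.5], [10.5, 12.5]]
```

The strided path simply discards the odd rows and columns, while the pooled path averages each 2 × 2 block, which is the information-preserving behavior the backbone replacement relies on.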
3.3. Network Architecture
3.4. K-Means Clustering Anchor Box and Predicted Results Decoded
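Anchor clustering with the 1 − IoU distance over box widths and heights can be sketched as below. This follows common YOLO practice rather than reproducing the paper's exact procedure; k = 6 matches the two three-anchor masks listed in the training parameters, and the function and parameter names are illustrative.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (N, 2) box sizes and (K, 2) centroid sizes, corners aligned."""
    w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=6, iters=100, seed=0):
    """Cluster (width, height) pairs with the 1 - IoU distance."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Max IoU is equivalent to min (1 - IoU) distance.
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0)
                        if np.any(assign == i) else centroids[i]
                        for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]  # by area
```

Sorting by area at the end lets the smaller anchors be assigned to the larger (finer) prediction head and vice versa, which is how the anchor masks split them.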
3.5. Stochastic Gradient Descent with Restart Algorithm
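The warm-restart schedule of Loshchilov and Hutter can be sketched as a cosine-annealed learning rate that resets at the start of each (optionally lengthening) cycle. The constants below are illustrative defaults, not the paper's settings:

```python
import math

def sgdr_lr(epoch, lr_min=1e-5, lr_max=1e-3, cycle=10, mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR).

    Each cycle decays the rate from lr_max to lr_min along a half cosine,
    then restarts at lr_max; the cycle length is multiplied by `mult`
    after every restart.
    """
    t, length = epoch, cycle
    while t >= length:          # locate the position inside the current cycle
        t -= length
        length *= mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / length))
```

Calling this once per epoch and feeding the result to the optimizer reproduces the sawtooth-with-cosine-decay curve; the motivation behind SGDR is that the periodic jumps back to a high rate help training escape poor local minima.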
3.6. Transfer Learning and Data Enhancement
3.6.1. Two-Stage Transfer Learning
- (1) The source-domain identification task is first carried out on a non-underwater dataset: images of the key subsea X-tree components photographed in common scenes serve as the source-domain training samples, and the network parameters of a CNN pre-trained on ImageNet are used as initialization parameters to train this model.
- (2) A target-domain model with the same structure as the source-domain model is built.
- (3) Identification of the key X-tree components in the real underwater test scene is taken as the target-domain task, and the parameters of the previous pre-trained model are used as initialization parameters to train the target-domain model. The flow of the two-stage transfer model is shown in Figure 7 below.
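The steps above amount to carrying weights across domains twice while leaving mismatched or task-specific layers freshly initialized. A minimal dictionary-based sketch of the weight-transfer step follows; the `head.` prefix and parameter names are illustrative, not the paper's actual layer names.

```python
import numpy as np

def transfer_weights(source, target, skip_prefixes=("head.",)):
    """Copy every compatible parameter from source into target.

    source/target map parameter names to arrays. Parameters whose names
    start with a skip prefix (e.g. a class-specific detection head) or
    whose shapes differ keep their fresh initialization.
    """
    merged = dict(target)
    for name, value in source.items():
        if (name in merged
                and not name.startswith(skip_prefixes)
                and np.shape(merged[name]) == np.shape(value)):
            merged[name] = value
    return merged

# Stage 1: ImageNet-pretrained weights initialize the land-scene model;
# Stage 2: the trained land-scene weights initialize the underwater model.
```

Applied twice, this is the two-stage scheme: each stage starts from the previous stage's learned features instead of random initialization, which is what offsets the small underwater dataset.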
3.6.2. Data Enhancement
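Mosaic augmentation stitches four training images into one around a random split point, so that each training sample mixes backgrounds and object scales. A minimal sketch for images already resized to a common square size (bounding-box remapping is omitted; the function signature is illustrative):

```python
import numpy as np

def mosaic(imgs, size=416, rng=None):
    """Combine four (size, size, 3) images into one mosaic image."""
    if rng is None:
        rng = np.random.default_rng()
    cx = int(rng.integers(size // 4, 3 * size // 4))  # random split point,
    cy = int(rng.integers(size // 4, 3 * size // 4))  # kept away from the edges
    out = np.empty((size, size, 3), dtype=imgs[0].dtype)
    out[:cy, :cx] = imgs[0][:cy, :cx]   # top-left
    out[:cy, cx:] = imgs[1][:cy, cx:]   # top-right
    out[cy:, :cx] = imgs[2][cy:, :cx]   # bottom-left
    out[cy:, cx:] = imgs[3][cy:, cx:]   # bottom-right
    return out
```

In a full pipeline each source image's ground-truth boxes would be cropped and shifted into the corresponding quadrant; keeping the split point away from the image edges prevents any quadrant from degenerating to a sliver.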
4. Results
4.1. Parts Selection, Labeling Strategy, and Test Introduction
4.2. Training Configuration and Parameters
4.3. Test Records and Comparison of Training Results
5. Discussion
5.1. Field Experiment/Generalization Validation
5.2. Future Research Directions
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Taylor, B.G.S. Offshore oil and gas. Ocean. Shorel. Manag. 1991, 16, 259–273. [Google Scholar] [CrossRef]
- O’Dea, A.; Flin, R.H. Site managers and safety leadership in the offshore oil and gas industry. Saf. Sci. 2001, 37, 39–57. [Google Scholar] [CrossRef]
- Fenton, S.P. Emerging Roles for Subsea Trees: Portals of Subsea System Functionality. In Proceedings of the Offshore Technology Conference, Houston, TX, USA, 4–7 May 2009. [Google Scholar]
- Langis, K.D.; Sattar, J. Real-Time Multi-Diver Tracking and Re-identification for Underwater Human-Robot Collaboration. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 1 June 2020. [Google Scholar]
- Nyrkov, A.P.; Sokolov, S.S.; Alimov, O.M.; Chernyi, S.G.; Dorovskoi, V.A. Optimal Identification for Objects in Problems on Recognition by Unmanned Underwater Vehicles. Autom. Control Comput. Sci. 2020, 54, 958–963. [Google Scholar] [CrossRef]
- Teng, B.; Zhao, H. Underwater target recognition methods based on the framework of deep learning: A survey. Int. J. Adv. Robot. Syst. 2020, 17, 1729881420976307. [Google Scholar] [CrossRef]
- Huang, H.; Tang, Q.; Li, J.; Zhang, W.; Bao, X.; Zhu, H.; Wang, G. A review on underwater autonomous environmental perception and target grasp, the challenge of robotic organism capture. Ocean Eng. 2019, 195, 106644. [Google Scholar] [CrossRef]
- Guan, Z.; Hou, C.; Zhou, S.; Guo, Z. Research on Underwater Target Recognition Technology Based on Neural Network. Wirel. Commun. Mob. Comput. 2022, 2022. [Google Scholar] [CrossRef]
- Fatan, M.; Da Liri, M.R.; Shahri, A.M. Underwater cable detection in the images using edge classification based on texture information. Measurement 2016, 91, 309–317. [Google Scholar] [CrossRef]
- Han, F.; Yao, J.; Zhu, H.; Wang, C. Marine Organism Detection and Classification from Underwater Vision Based on the Deep CNN Method. Math. Probl. Eng. 2020, 2020, 3937580. [Google Scholar] [CrossRef]
- Han, M.; Lyu, Z.; Qiu, T.; Xu, M. A Review on Intelligence Dehazing and Color Restoration for Underwater Images. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 1820–1832. [Google Scholar] [CrossRef]
- Li, Y.; Zhang, X.; Shen, Z. YOLO-Submarine Cable: An Improved YOLO-V3 Network for Object Detection on Submarine Cable Images. J. Mar. Sci. Eng. 2022, 10, 1143. [Google Scholar] [CrossRef]
- Liu, Z.; Zhuang, Y.; Jia, P.; Wu, C. A Novel Underwater Image Enhancement and Improved Underwater Biological Detection Pipeline. J. Mar. Sci. Eng. 2022, 10, 1204. [Google Scholar] [CrossRef]
- Hannun, A.Y.; Case, C.; Casper, J.; Catanzaro, B.; Diamos, G.; Elsen, E.; Prenger, R.; Satheesh, S.; Sengupta, S.; Coates, A.; et al. Deep Speech: Scaling up end-to-end speech recognition. arXiv 2014, arXiv:1412.5567. [Google Scholar]
- Zhu, Z.; Dai, W.; Hu, Y.; Li, J. Speech emotion recognition model based on Bi-GRU and Focal Loss. Pattern Recognit. Lett. 2020, 140, 358–365. [Google Scholar] [CrossRef]
- Li, L.; Lin, Y.; Zhang, Z.; Wang, D. Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition. Comput. Sci. 2015, 426–429. Available online: https://arxiv.org/abs/1506.08349 (accessed on 21 September 2022).
- Yu, X.; Dong, M.; Xing, Y.; Chen, Y.; Shu, H.; Xu, W.; Yang, Z.; Hong, Z.; Dong, M. Transformer text recognition with deep learning algorithm. Comput. Commun. 2021, 8, 153–160. [Google Scholar]
- Ouyang, W.; Wang, X. Joint Deep Learning for Pedestrian Detection. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013. [Google Scholar]
- Xiao, Y.; Zhou, K.; Cui, G.; Jia, L.; Fang, Z.; Yang, X.; Xia, Q. Deep learning for occluded and multi-scale pedestrian detection: A review. IET Image Process 2021, 15, 286–301. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector; Springer: Cham, Switzerland, 2016. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. Comput. Sci. 2015. Available online: https://arxiv.org/abs/1504.08083 (accessed on 21 September 2022).
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [Green Version]
- Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of the ICLR 2017, 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Pan, S.J.; Qiang, Y. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
- Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A Survey on Deep Transfer Learning. In Proceedings of the International Conference on Artificial Neural Networks, Rhodes, Greece, 4–7 October 2018. [Google Scholar]
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How Transferable are Features in Deep Neural Networks? MIT Press: Cambridge, MA, USA, 2014. [Google Scholar]
- Jia, D.; Wei, D.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009. [Google Scholar]
| ROV | Parameters |
|---|---|
| Size | 450 mm × 340 mm × 280 mm |
| Weight | 8 kg |
| Thrusters | DC brushless motor × 8 |
| Thruster thrust | 30 N forward, 20 N reverse |
| Maximum speed | 2 kn |
| Camera angle | 140° front view; 70° right view |
| Cameras/minimum illumination | 720p binocular × 1, 1080p monocular × 3/0.01 lux |
| Operating voltage/rated power | 24 V/2000 W |
| Communication method | Zero-buoyancy cable, 100 m |
| Lighting method | High-brightness LED × 4 |
| Software and Hardware Name | Specific Model |
|---|---|
| CPU | Intel(R) Core(TM) i9-8950HK @ 2.90 GHz (12 CPUs) |
| RAM | 32.0 GB |
| Graphics card | NVIDIA GeForce GTX 1080 |
| System | Windows 10 |
| Framework | PyTorch (GPU) |
| CUDA version | 9.0 |
| Python version | 3.6.5 |
| Training Parameters | Num |
|---|---|
| Input size | [512, 512] |
| Num classes | 7 |
| Anchors mask | [[3, 4, 5], [1, 2, 3]] |

| Training Parameters | Freeze | Unfreeze |
|---|---|---|
| Init epoch | 0 | 0 |
| Interval epoch | 50 | 100 |
| Learning rate | 0.0001 | 0.00001 |
| Batch size | 2 | 2 |
| Algorithm Name | FPS |
|---|---|
| YOLOv4 | 22.03 |
| YOLOv4-tiny | 40.53 |
| SX-DCANet | 44.87 |
| Metric | SSD | Fast R-CNN | YOLOv4-tiny | YOLOv4-tiny + CBAM | YOLOv4 | SX-DCANet |
|---|---|---|---|---|---|---|
| mAP (%) | 91.681 | 93.966 | 94.953 | 95.291 | 96.177 | 95.893 |
| Recall (%) | 87.569 | 89.877 | 90.697 | 90.733 | 92.449 | 92.296 |
| Precision (%) | 91.279 | 95.296 | 96.371 | 96.587 | 97.194 | 96.597 |
| F1-score (%) | 89.386 | 92.507 | 93.448 | 93.569 | 94.762 | 94.398 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, W.; Han, F.; Su, Z.; Qiu, X.; Zhang, J.; Zhao, Y. An Improved Underwater Recognition Algorithm for Subsea X-Tree Key Components Based on Deep Transfer Learning. J. Mar. Sci. Eng. 2022, 10, 1562. https://doi.org/10.3390/jmse10101562