Related papers: Improve Underwater Object Detection through YOLOv12 Architecture and Physics-informed Augmentation

URL: http://arxiv.org/abs/2506.23505v1
Date: Mon, 30 Jun 2025 04:06:50 GMT
Title: Improve Underwater Object Detection through YOLOv12 Architecture and Physics-informed Augmentation
Authors: Tinh Nguyen,
Abstract summary: Underwater object detection is crucial for autonomous navigation, environmental monitoring, and marine exploration. Current methods balance accuracy and computational efficiency, but they have trouble deploying in real-time under low visibility conditions. This study advances underwater detection through the integration of physics-informed augmentation techniques with the YOLOv12 architecture.
Score: 0.20767168898581637
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Underwater object detection is crucial for autonomous navigation, environmental monitoring, and marine exploration, but it is severely hampered by light attenuation, turbidity, and occlusion. Current methods balance accuracy and computational efficiency, but they have trouble deploying in real-time under low visibility conditions. Through the integration of physics-informed augmentation techniques with the YOLOv12 architecture, this study advances underwater detection, using Residual ELAN blocks to preserve structural features in turbid waters and Area Attention to maintain large receptive fields for occluded objects while reducing computational complexity. Underwater optical properties are addressed by domain-specific augmentations such as turbulence adaptive blurring, biologically grounded occlusion simulation, and spectral HSV transformations for color distortion. Extensive tests on four difficult datasets show state-of-the-art performance, with Brackish data registering 98.30% mAP at 142 FPS. YOLOv12 improves occlusion robustness by 18.9%, small-object recall by 22.4%, and detection precision by up to 7.94% compared to previous models. The crucial role of augmentation strategy is validated by ablation studies. This work offers a precise and effective solution for conservation and underwater robotics applications.
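
Illustrative sketch (not from the paper): the abstract names three augmentation families, an HSV color transformation, a blur, and an occlusion simulation, but gives no implementation details. The Python below shows one generic way such a pipeline could be chained, using only the standard library. The function names (hsv_shift, box_blur, occlude, augment), the parameter ranges, and the fill color are all hypothetical, and a plain box blur and a rectangular patch stand in for the paper's turbulence-adaptive blurring and biologically grounded occlusion.

```python
import colorsys
import random


def hsv_shift(img, hue_shift=0.0, sat_scale=1.0, val_scale=1.0):
    """Shift hue and scale saturation/value of each pixel (RGB floats in [0, 1])."""
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            h = (h + hue_shift) % 1.0                  # hue wraps around
            s = min(1.0, max(0.0, s * sat_scale))      # clamp to valid range
            v = min(1.0, max(0.0, v * val_scale))
            new_row.append(colorsys.hsv_to_rgb(h, s, v))
        out.append(new_row)
    return out


def box_blur(img, radius=1):
    """Average each pixel with its neighbors; a simple stand-in for turbidity blur."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        new_row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            n = len(ys) * len(xs)
            new_row.append(tuple(
                sum(img[j][i][c] for j in ys for i in xs) / n for c in range(3)
            ))
        out.append(new_row)
    return out


def occlude(img, top, left, height, width, fill=(0.1, 0.3, 0.2)):
    """Cover a rectangle with a flat color; returns a copy, the input is untouched."""
    out = [list(row) for row in img]
    for y in range(max(0, top), min(len(img), top + height)):
        for x in range(max(0, left), min(len(img[0]), left + width)):
            out[y][x] = fill
    return out


def augment(img, rng):
    """Chain the three steps with randomly drawn (assumed) parameter ranges."""
    img = hsv_shift(
        img,
        hue_shift=rng.uniform(-0.05, 0.05),
        sat_scale=rng.uniform(0.6, 1.0),
        val_scale=rng.uniform(0.5, 1.0),
    )
    img = box_blur(img, radius=rng.choice([0, 1, 2]))
    h, w = len(img), len(img[0])
    ph, pw = max(1, h // 4), max(1, w // 4)            # patch covers ~1/16 of the image
    top = rng.randrange(0, h - ph + 1)
    left = rng.randrange(0, w - pw + 1)
    return occlude(img, top, left, ph, pw)


if __name__ == "__main__":
    image = [[(0.8, 0.4, 0.2) for _ in range(8)] for _ in range(8)]
    result = augment(image, random.Random(0))
    print(len(result), len(result[0]))
```

In a real training pipeline the bounding-box labels would also need updating when an occluder covers an object; this sketch only transforms pixels.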

Related papers

Learning Underwater Active Perception in Simulation [51.205673783866146]
Turbidity can jeopardise the whole mission as it may prevent correct visual documentation of the inspected structures. Previous works have introduced methods to adapt to turbidity and backscattering. We propose a simple yet efficient approach to enable high-quality image acquisition of assets in a broad range of water conditions.
arXiv Detail & Related papers (2025-04-23T06:48:38Z)
You Sense Only Once Beneath: Ultra-Light Real-Time Underwater Object Detection [2.5249064981269296]
We propose an Ultra-Light Real-Time Underwater Object Detection framework, You Sense Only Once Beneath (YSOOB). Specifically, we utilize a Multi-Spectrum Wavelet (MSWE) to perform frequency-domain encoding on the input image, minimizing the semantic loss caused by underwater optical color distortion. We also eliminate model redundancy through a simple yet effective channel compression and reconstructed large kernel convolution (RLKC) to make the model lightweight.
arXiv Detail & Related papers (2025-04-22T08:26:35Z)
Sonar-based Deep Learning in Underwater Robotics: Overview, Robustness and Challenges [0.46873264197900916]
The predominant use of sonar in underwater environments, characterized by limited training data and inherent noise, poses challenges to model robustness. This paper studies sonar-based perception task models, such as classification, object detection, segmentation, and SLAM. It systematizes sonar-based state-of-the-art datasets, simulators, and robustness methods such as neural network verification, out-of-distribution, and adversarial attacks.
arXiv Detail & Related papers (2024-12-16T15:03:08Z)
FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation [65.01601309903971]
We introduce FAFA, a Frequency-Aware Flow-Aided self-supervised framework for 6D pose estimation of unmanned underwater vehicles (UUVs). Our framework relies solely on the 3D model and RGB images, alleviating the need for any real pose annotations or other-modality data like depths. We evaluate the effectiveness of FAFA on common underwater object pose benchmarks and showcase significant performance improvements compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-09-25T03:54:01Z)
Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement [70.2429155741593]
This paper presents a new dataset and a general tracker enhancement method for Underwater Visual Object Tracking (UVOT). It poses distinct challenges: the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles. We propose a novel underwater image enhancement algorithm designed specifically to boost tracking quality. The method significantly improves the performance of state-of-the-art (SOTA) visual trackers, by up to 5.0% AUC.
arXiv Detail & Related papers (2023-08-30T07:41:26Z)
Learning Heavily-Degraded Prior for Underwater Object Detection [59.5084433933765]
This paper seeks transferable prior knowledge from detector-friendly images. It is based on the statistical observation that the heavily degraded regions of detector-friendly (DFUI) and underwater images have evident feature distribution gaps. Our method, with higher speeds and fewer parameters, still performs better than transformer-based detectors.
arXiv Detail & Related papers (2023-08-24T12:32:46Z)
Efficient Real-time Smoke Filtration with 3D LiDAR for Search and Rescue with Autonomous Heterogeneous Robotic Systems [56.838297900091426]
Smoke and dust affect the performance of any mobile robotic platform due to their reliance on onboard perception systems. This paper proposes a novel modular computation filtration pipeline based on intensity and spatial information.
arXiv Detail & Related papers (2023-08-14T16:48:57Z)
DeepSeaNet: Improving Underwater Object Detection using EfficientDet [0.0]
This project involves implementing and evaluating various object detection models on an annotated underwater dataset. The dataset comprises annotated image sequences of fish, crabs, starfish, and other aquatic animals captured in Limfjorden water with limited visibility. I compare the results of YOLOv3 (31.10% mean Average Precision (mAP)), YOLOv4 (83.72% mAP), YOLOv5 (97.6%), YOLOv8 (98.20%), EfficientDet (98.56% mAP) and Detectron2 (95.20% mAP) on the same dataset.
arXiv Detail & Related papers (2023-05-26T13:41:35Z)
DeepAqua: Self-Supervised Semantic Segmentation of Wetland Surface Water Extent with SAR Images using Knowledge Distillation [44.99833362998488]
We present DeepAqua, a self-supervised deep learning model that eliminates the need for manual annotations during the training phase. We exploit cases where optical- and radar-based water masks coincide, enabling the detection of both open and vegetated water surfaces. Experimental results show that DeepAqua outperforms other unsupervised methods by improving accuracy by 7%, Intersection Over Union by 27%, and F1 score by 14%.
arXiv Detail & Related papers (2023-05-02T18:06:21Z)
GAMMA: Generative Augmentation for Attentive Marine Debris Detection [0.0]
We propose an efficient and generative augmentation approach to solve the inadequacy concern of underwater debris data for visual detection. We use cycleGAN as a data augmentation technique to convert openly available, abundant data of terrestrial plastic to underwater-style images. We also propose a novel architecture for underwater debris detection using an attention mechanism.
arXiv Detail & Related papers (2022-12-07T16:30:51Z)
Learning-based estimation of in-situ wind speed from underwater acoustics [58.293528982012255]
We introduce a deep learning approach for the retrieval of wind speed time series from underwater acoustics. Our approach bridges data assimilation and learning-based frameworks to benefit both from prior physical knowledge and computational efficiency.
arXiv Detail & Related papers (2022-08-18T15:27:40Z)
A Novel Underwater Image Enhancement and Improved Underwater Biological Detection Pipeline [8.326477369707122]
This paper proposes a novel method for capturing feature information, which adds the convolutional block attention module (CBAM) to the YOLOv5 backbone. The interference of underwater creature characteristics on object characteristics is decreased, and the output of the backbone network to object information is enhanced.
arXiv Detail & Related papers (2022-05-20T14:18:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.