A Structured Review of Underwater Object Detection Challenges and Solutions: From Traditional to Large Vision Language Models
- URL: http://arxiv.org/abs/2509.08490v1
- Date: Wed, 10 Sep 2025 11:01:29 GMT
- Title: A Structured Review of Underwater Object Detection Challenges and Solutions: From Traditional to Large Vision Language Models
- Authors: Edwine Nabahirwa, Wei Song, Minghua Zhang, Yi Fang, Zhou Ni
- Abstract summary: Underwater object detection (UOD) is vital to diverse marine applications, including oceanographic research, underwater robotics, and marine conservation. Current UOD methods are insufficient to fully address challenges like image degradation and small object detection in dynamic underwater environments. Large vision-language models (LVLMs) hold significant promise for UOD, but their real-time application remains under-explored.
- Score: 10.013311332835823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Underwater object detection (UOD) is vital to diverse marine applications, including oceanographic research, underwater robotics, and marine conservation. However, UOD faces numerous challenges that compromise its performance. Over the years, various methods have been proposed to address these issues, but they often fail to fully capture the complexities of underwater environments. This review systematically categorizes UOD challenges into five key areas: image quality degradation, target-related issues, data-related challenges, computational and processing constraints, and limitations in detection methodologies. To address these challenges, we analyze the progression from traditional image processing and object detection techniques to modern approaches. Additionally, we explore the potential of large vision-language models (LVLMs) in UOD, leveraging their multi-modal capabilities demonstrated in other domains. We also present case studies, including synthetic dataset generation using DALL-E 3 and fine-tuning Florence-2 LVLM for UOD. This review identifies three key insights: (i) Current UOD methods are insufficient to fully address challenges like image degradation and small object detection in dynamic underwater environments. (ii) Synthetic data generation using LVLMs shows potential for augmenting datasets but requires further refinement to ensure realism and applicability. (iii) LVLMs hold significant promise for UOD, but their real-time application remains under-explored, requiring further research on optimization techniques.
Related papers
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [75.30238170051291]
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies. Traditional methods relying on hardware sensors like LiDAR are often limited by high costs, low resolution, and environmental sensitivity, limiting their applicability in real-world scenarios. Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either the low-capacity model architectures or the reliance on domain-specific and small-scale datasets.
arXiv Detail & Related papers (2025-07-15T17:59:59Z)
- Visual enhancement and 3D representation for underwater scenes: a review [13.809193345785388]
Underwater visual enhancement (UVE) and underwater 3D reconstruction pose significant challenges in computer vision and AI-based tasks. To advance research in these areas, we present an in-depth review from multiple perspectives.
arXiv Detail & Related papers (2025-05-03T17:20:24Z)
- Underwater Image Enhancement using Generative Adversarial Networks: A Survey [1.2582887633807602]
Generative Adversarial Networks (GANs) have emerged as a powerful tool for enhancing underwater photos. GANs have been applied to real-world applications, including marine biology and ecosystem monitoring, coral reef health assessment, underwater archaeology, and autonomous underwater vehicle (AUV) navigation. This paper explores all major approaches to underwater image enhancement, from physical and physics-free models to CNN-based models and state-of-the-art GAN-based methods.
arXiv Detail & Related papers (2025-01-10T06:41:19Z)
- Sonar-based Deep Learning in Underwater Robotics: Overview, Robustness and Challenges [0.46873264197900916]
The predominant use of sonar in underwater environments, characterized by limited training data and inherent noise, poses challenges to model robustness. This paper studies sonar-based perception task models, such as classification, object detection, segmentation, and SLAM. It systematizes sonar-based state-of-the-art datasets, simulators, and robustness methods such as neural network verification, out-of-distribution, and adversarial attacks.
arXiv Detail & Related papers (2024-12-16T15:03:08Z)
- UW-SDF: Exploiting Hybrid Geometric Priors for Neural SDF Reconstruction from Underwater Multi-view Monocular Images [63.32490897641344]
We propose a framework for reconstructing target objects from multi-view underwater images based on neural SDF.
We introduce hybrid geometric priors to optimize the reconstruction process, markedly enhancing the quality and efficiency of neural SDF reconstruction.
arXiv Detail & Related papers (2024-10-10T16:33:56Z)
- Underwater Object Detection in the Era of Artificial Intelligence: Current, Challenge, and Future [119.88454942558485]
Underwater object detection (UOD) aims to identify and localise objects in underwater images or videos.
In recent years, artificial intelligence (AI) based methods, especially deep learning methods, have shown promising performance in UOD.
arXiv Detail & Related papers (2024-10-08T00:25:33Z)
- FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation [65.01601309903971]
We introduce FAFA, a Frequency-Aware Flow-Aided self-supervised framework for 6D pose estimation of unmanned underwater vehicles (UUVs).
Our framework relies solely on the 3D model and RGB images, alleviating the need for any real pose annotations or other-modality data like depths.
We evaluate the effectiveness of FAFA on common underwater object pose benchmarks and showcase significant performance improvements compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-09-25T03:54:01Z)
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
- Remote Sensing Object Detection Meets Deep Learning: A Meta-review of Challenges and Advances [51.70835702029498]
This paper aims to present a comprehensive review of recent achievements in deep-learning-based RSOD methods.
We identify five main challenges in RSOD, including multi-scale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision.
We also review the widely used benchmark datasets and evaluation metrics within the field of RSOD, as well as the application scenarios for RSOD.
arXiv Detail & Related papers (2023-09-13T06:48:32Z)
- Edge-guided Representation Learning for Underwater Object Detection [15.832646455660278]
Underwater object detection is crucial for marine economic development, environmental protection, and the planet's sustainable development.
The main challenges of this task arise from low contrast, small objects, and the mimicry of aquatic organisms.
We propose an Edge-guided Representation Learning Network, termed ERL-Net, that aims to achieve discriminative representation learning and aggregation.
arXiv Detail & Related papers (2023-06-01T08:29:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.