Related papers: An AI-Powered Autonomous Underwater System for Sea Exploration and Scientific Research

URL: http://arxiv.org/abs/2512.07652v1
Date: Mon, 08 Dec 2025 15:45:40 GMT
Title: An AI-Powered Autonomous Underwater System for Sea Exploration and Scientific Research
Authors: Hamad Almazrouei, Mariam Al Nasseri, Maha Alzaabi,
Abstract summary: Traditional sea exploration faces significant challenges due to extreme conditions, limited visibility, and high costs. This paper presents an innovative AI-powered Autonomous Underwater Vehicle (AUV) system designed to overcome these limitations. The system integrates YOLOv12 Nano for real-time object detection, a Convolutional Neural Network (CNN) (ResNet50) for feature extraction, and K-Means++ clustering for grouping marine objects.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Traditional sea exploration faces significant challenges due to extreme conditions, limited visibility, and high costs, resulting in vast unexplored ocean regions. This paper presents an innovative AI-powered Autonomous Underwater Vehicle (AUV) system designed to overcome these limitations by automating underwater object detection, analysis, and reporting. The system integrates YOLOv12 Nano for real-time object detection, a Convolutional Neural Network (CNN) (ResNet50) for feature extraction, Principal Component Analysis (PCA) for dimensionality reduction, and K-Means++ clustering for grouping marine objects based on visual characteristics. Furthermore, a Large Language Model (LLM) (GPT-4o Mini) is employed to generate structured reports and summaries of underwater findings, enhancing data interpretation. The system was trained and evaluated on a combined dataset of over 55,000 images from the DeepFish and OzFish datasets, capturing diverse Australian marine environments. Experimental results demonstrate the system's capability to detect marine objects with a mAP@0.5 of 0.512, a precision of 0.535, and a recall of 0.438. The integration of PCA effectively reduced feature dimensionality while preserving 98% variance, facilitating K-Means clustering which successfully grouped detected objects based on visual similarities. The LLM integration proved effective in generating insightful summaries of detections and clusters, supported by location data. This integrated approach significantly reduces the risks associated with human diving, increases mission efficiency, and enhances the speed and depth of underwater data analysis, paving the way for more effective scientific research and discovery in challenging marine environments.
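
The clustering stage named in the abstract can be illustrated with a minimal sketch of K-Means++ seeding followed by standard Lloyd iterations. This is not the authors' implementation: the 2-D points are hypothetical stand-ins for PCA-reduced ResNet50 features, and the helper names (`sq_dist`, `kmeanspp_init`, `kmeans`) are assumptions made for illustration. Feature extraction and PCA are not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """K-Means++ seeding: first center uniform at random, each later center
    drawn with probability proportional to its squared distance from the
    nearest center chosen so far. Assumes at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm started from K-Means++ seeds."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Hypothetical toy data: three well-separated groups of reduced features.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
          (10.0, 0.1), (9.8, 0.0), (10.1, 0.2)]
labels, centers = kmeans(points, k=3)
```

Seeding by squared distance tends to spread the initial centers across distinct groups, which is why K-Means++ usually converges faster and to better partitions than uniform random initialization.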

Related papers

SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling [12.390389688362506]
We propose a novel underwater object detection network that integrates multi-scale feature enhancement with global context modeling. Experiments on the URPC2022 dataset demonstrate that the network outperforms the YOLOv8n baseline by more than 4.9% in mAP@0.5.
arXiv Detail & Related papers (2026-02-26T06:45:11Z)
FinSight-Net: A Physics-Aware Decoupled Network with Frequency-Domain Compensation for Underwater Fish Detection in Smart Aquaculture [8.150520348578087]
FinSight-Net is an efficient and physics-aware fish detection framework for aquaculture environments. On UW-BlurredFish, FinSight-Net reaches 92.8% mAP, outperforming YOLOv11s by 4.8% while reducing parameters by 29.0%.
arXiv Detail & Related papers (2026-02-23T02:12:47Z)
IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation [56.43007596544299]
IndustryNav is the first dynamic industrial navigation benchmark for active spatial reasoning. A study of nine state-of-the-art Visual Large Language Models reveals that closed-source models maintain a consistent advantage.
arXiv Detail & Related papers (2025-11-21T16:48:49Z)
Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection [54.1960918379255]
Neptune-X is a data-centric generative-selection framework for maritime object detection. X-to-Maritime is a multi-modality-conditioned generative model that synthesizes diverse and realistic maritime scenes. Our approach sets a new benchmark in maritime scene synthesis, significantly improving detection accuracy.
arXiv Detail & Related papers (2025-09-25T04:59:02Z)
Real-Time Fish Detection in Indonesian Marine Ecosystems Using Lightweight YOLOv10-nano Architecture [0.0]
This study explores the implementation of YOLOv10-nano, a state-of-the-art deep learning model, for real-time marine fish detection in Indonesian waters. YOLOv10's architecture, featuring improvements like the CSPNet backbone, PAN for feature fusion, and Pyramid Spatial Attention Block, enables efficient and accurate object detection. Results show that YOLOv10-nano achieves a high detection accuracy with mAP50 of 0.966 and mAP50:95 of 0.606 while maintaining low computational demand.
arXiv Detail & Related papers (2025-09-22T07:02:48Z)
FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains [1.3791394805787949]
FishDet-M is the largest unified benchmark for fish detection, comprising 13 publicly available datasets spanning diverse aquatic environments. All data are harmonized using COCO-style annotations with both bounding boxes and segmentation masks. FishDet-M establishes a standardized and reproducible platform for evaluating object detection in complex aquatic scenes.
arXiv Detail & Related papers (2025-07-23T18:32:01Z)
Improve Underwater Object Detection through YOLOv12 Architecture and Physics-informed Augmentation [0.20767168898581637]
Underwater object detection is crucial for autonomous navigation, environmental monitoring, and marine exploration. Current methods balance accuracy and computational efficiency, but they are difficult to deploy in real time under low-visibility conditions. This study advances underwater detection through the integration of physics-informed augmentation techniques with the YOLOv12 architecture.
arXiv Detail & Related papers (2025-06-30T04:06:50Z)
NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation. Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z)
ReconMOST: Multi-Layer Sea Temperature Reconstruction with Observations-Guided Diffusion [48.540756751934836]
ReconMOST is a data-driven guided diffusion model framework for multi-layer sea temperature reconstruction. Our method extends ML-based SST reconstruction to a global, multi-layer setting, handling over 92.5% missing data.
arXiv Detail & Related papers (2025-06-12T06:27:22Z)
MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios [10.748210940033484]
The Maritime Ship Navigation Behavior dataset (MID) is designed to address challenges in ship detection within complex maritime environments. MID contains 5,673 images with 135,884 finely annotated target instances, supporting both supervised and semi-supervised learning. MID's images are sourced from high-definition video clips of real-world navigation across 43 water areas, with varied weather and lighting conditions.
arXiv Detail & Related papers (2024-12-08T09:34:23Z)
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head. The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement. This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
Efficient Real-time Smoke Filtration with 3D LiDAR for Search and Rescue with Autonomous Heterogeneous Robotic Systems [56.838297900091426]
Smoke and dust affect the performance of any mobile robotic platform due to their reliance on onboard perception systems. This paper proposes a novel modular computation filtration pipeline based on intensity and spatial information.
arXiv Detail & Related papers (2023-08-14T16:48:57Z)
Learning-based estimation of in-situ wind speed from underwater acoustics [58.293528982012255]
We introduce a deep learning approach for the retrieval of wind speed time series from underwater acoustics. Our approach bridges data assimilation and learning-based frameworks to benefit both from prior physical knowledge and computational efficiency.
arXiv Detail & Related papers (2022-08-18T15:27:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.