AQUA20: A Benchmark Dataset for Underwater Species Classification under Challenging Conditions
- URL: http://arxiv.org/abs/2506.17455v2
- Date: Mon, 30 Jun 2025 17:27:51 GMT
- Title: AQUA20: A Benchmark Dataset for Underwater Species Classification under Challenging Conditions
- Authors: Taufikur Rahman Fuad, Sabbir Ahmed, Shahriar Ivan,
- Abstract summary: This paper introduces AQUA20, a comprehensive benchmark dataset comprising 8,171 underwater images across 20 marine species. Thirteen state-of-the-art deep learning models were evaluated to benchmark their performance in classifying marine species under challenging conditions. Results show ConvNeXt achieving the best performance, with a Top-3 accuracy of 98.82% and a Top-1 accuracy of 90.69%, as well as the highest overall F1-score of 88.92% with moderately large parameter size.
- Score: 1.2289361708127877
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust visual recognition in underwater environments remains a significant challenge due to complex distortions such as turbidity, low illumination, and occlusion, which severely degrade the performance of standard vision systems. This paper introduces AQUA20, a comprehensive benchmark dataset comprising 8,171 underwater images across 20 marine species reflecting real-world environmental challenges such as illumination, turbidity, occlusions, etc., providing a valuable resource for underwater visual understanding. Thirteen state-of-the-art deep learning models, including lightweight CNNs (SqueezeNet, MobileNetV2) and transformer-based architectures (ViT, ConvNeXt), were evaluated to benchmark their performance in classifying marine species under challenging conditions. Our experimental results show ConvNeXt achieving the best performance, with a Top-3 accuracy of 98.82% and a Top-1 accuracy of 90.69%, as well as the highest overall F1-score of 88.92% with moderately large parameter size. The results obtained from our other benchmark models also demonstrate trade-offs between complexity and performance. We also provide an extensive explainability analysis using GRAD-CAM and LIME for interpreting the strengths and pitfalls of the models. Our results reveal substantial room for improvement in underwater species recognition and demonstrate the value of AQUA20 as a foundation for future research in this domain. The dataset is publicly available at: https://huggingface.co/datasets/taufiktrf/AQUA20.
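As a rough illustration of how the released dataset might be consumed, the sketch below loads AQUA20 from the Hugging Face Hub and reports Top-1/Top-3 accuracy for a torchvision ConvNeXt. The split name and the "image"/"label" column names are assumptions about the Hub layout, and this is not the authors' pipeline: the 20-way head is freshly initialized here and would need fine-tuning on the training split before the reported numbers could be approached.

```python
# Minimal sketch (assumptions noted above), not the authors' evaluation code.
import torch
from datasets import load_dataset
from torchvision import models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed split and column names for the Hub dataset.
ds = load_dataset("taufiktrf/AQUA20", split="test")

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# ImageNet-pretrained backbone with a fresh 20-way classification head.
model = models.convnext_tiny(weights="IMAGENET1K_V1")
model.classifier[2] = torch.nn.Linear(model.classifier[2].in_features, 20)
model.eval().to(device)

top1 = top3 = total = 0
with torch.no_grad():
    for example in ds:
        x = preprocess(example["image"].convert("RGB")).unsqueeze(0).to(device)
        y = example["label"]
        ranked = model(x).topk(3, dim=1).indices.squeeze(0).tolist()
        top1 += int(ranked[0] == y)
        top3 += int(y in ranked)
        total += 1

print(f"Top-1: {top1 / total:.3f}  Top-3: {top3 / total:.3f}")
```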
Related papers
- FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains [1.3791394805787949]
FishDet-M is the largest unified benchmark for fish detection, comprising 13 publicly available datasets spanning diverse aquatic environments. All data are harmonized using COCO-style annotations with both bounding boxes and segmentation masks. FishDet-M establishes a standardized and reproducible platform for evaluating object detection in complex aquatic scenes.
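For reference, a COCO-style record pairs each object instance with a bounding box and a polygon mask; the minimal sketch below shows the general structure with invented values (it is not taken from FishDet-M itself).

```python
# Illustrative COCO-style record (values invented, not from FishDet-M):
# one image, one category, one annotated fish instance.
coco_like = {
    "images": [{"id": 1, "file_name": "reef_0001.jpg", "width": 1280, "height": 720}],
    "categories": [{"id": 1, "name": "fish"}],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 1,
        "bbox": [412.0, 305.0, 96.0, 48.0],  # [x, y, width, height] in pixels
        "segmentation": [[412, 305, 508, 305, 508, 353, 412, 353]],  # polygon mask
        "area": 4608.0,
        "iscrowd": 0,
    }],
}
```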
arXiv Detail & Related papers (2025-07-23T18:32:01Z) - Underwater Monocular Metric Depth Estimation: Real-World Benchmarks and Synthetic Fine-Tuning with Vision Foundation Models [0.0]
We present a benchmark of zero-shot and fine-tuned monocular metric depth estimation models on real-world underwater datasets. Our results show that large-scale models trained on terrestrial data (real or synthetic) are effective in in-air settings, but perform poorly underwater. This study presents a detailed evaluation and visualization of monocular metric depth estimation in underwater scenes.
arXiv Detail & Related papers (2025-07-02T21:06:39Z) - USIS16K: High-Quality Dataset for Underwater Salient Instance Segmentation [11.590111778515775]
We introduce USIS16K, a large-scale dataset comprising 16,151 high-resolution underwater images. Each image is annotated with high-quality instance-level salient object masks. We provide benchmark evaluations on underwater object detection and USIS tasks using USIS16K.
arXiv Detail & Related papers (2025-06-24T09:58:01Z) - UWSAM: Segment Anything Model Guided Underwater Instance Segmentation and A Large-scale Benchmark Dataset [62.00529957144851]
We propose a large-scale underwater instance segmentation dataset, UIIS10K, which includes 10,048 images with pixel-level annotations for 10 categories. We then introduce UWSAM, an efficient model designed for automatic and accurate segmentation of underwater instances. We show that our model is effective, achieving significant performance improvements over state-of-the-art methods on multiple underwater instance datasets.
arXiv Detail & Related papers (2025-05-21T14:36:01Z) - Learning Underwater Active Perception in Simulation [51.205673783866146]
Turbidity can jeopardise the whole mission as it may prevent correct visual documentation of the inspected structures. Previous works have introduced methods to adapt to turbidity and backscattering. We propose a simple yet efficient approach to enable high-quality image acquisition of assets in a broad range of water conditions.
arXiv Detail & Related papers (2025-04-23T06:48:38Z) - PIGUIQA: A Physical Imaging Guided Perceptual Framework for Underwater Image Quality Assessment [59.9103803198087]
We propose a Physical Imaging Guided perceptual framework for Underwater Image Quality Assessment (UIQA). By leveraging underwater radiative transfer theory, we integrate physics-based imaging estimations to establish quantitative metrics for these distortions. The proposed model accurately predicts image quality scores and achieves state-of-the-art performance.
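As background for the physics-based terms, a commonly used simplified underwater image formation model writes the observed image as an attenuated direct signal plus veiling-light backscatter: I_c = J_c * exp(-beta_c d) + B_c * (1 - exp(-beta_c d)). The sketch below illustrates that generic model only; it is not necessarily the formulation used in PIGUIQA, and the attenuation and background values are invented.

```python
# Generic simplified underwater image formation model (illustrative values only).
import numpy as np

def degrade(J, depth, beta=(0.40, 0.20, 0.10), B=(0.10, 0.30, 0.40)):
    """J: clean image in [0, 1], shape (H, W, 3) in RGB order; depth: range map in metres, (H, W).
    Returns I_c = J_c * exp(-beta_c * d) + B_c * (1 - exp(-beta_c * d))."""
    beta = np.asarray(beta).reshape(1, 1, 3)
    B = np.asarray(B).reshape(1, 1, 3)
    t = np.exp(-beta * depth[..., None])  # per-channel transmission
    return J * t + B * (1.0 - t)          # attenuation + backscatter

# A mid-grey scene 5 m away drifts toward the blue-green veiling light.
print(degrade(np.full((2, 2, 3), 0.5), np.full((2, 2), 5.0)))
```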
arXiv Detail & Related papers (2024-12-20T03:31:45Z) - FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation [65.01601309903971]
We introduce FAFA, a Frequency-Aware Flow-Aided self-supervised framework for 6D pose estimation of unmanned underwater vehicles (UUVs).
Our framework relies solely on the 3D model and RGB images, alleviating the need for any real pose annotations or other-modality data like depths.
We evaluate the effectiveness of FAFA on common underwater object pose benchmarks and showcase significant performance improvements compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-09-25T03:54:01Z) - On Vision Transformers for Classification Tasks in Side-Scan Sonar Imagery [0.0]
Side-scan sonar (SSS) imagery presents unique challenges in the classification of man-made objects on the seafloor.
This paper rigorously compares the performance of ViT models alongside commonly used CNN architectures for binary classification tasks in SSS imagery.
ViT-based models exhibit superior classification performance across F1-score, precision, recall, and accuracy metrics.
arXiv Detail & Related papers (2024-09-18T14:36:50Z) - Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset [60.14089302022989]
Underwater vision tasks often suffer from low segmentation accuracy due to the complex underwater circumstances.
We construct the first large-scale underwater salient instance segmentation dataset (USIS10K).
We propose an Underwater Salient Instance architecture based on Segment Anything Model (USIS-SAM) specifically for the underwater domain.
arXiv Detail & Related papers (2024-06-10T06:17:33Z) - MuLA-GAN: Multi-Level Attention GAN for Enhanced Underwater Visibility [1.9272863690919875]
We introduce MuLA-GAN, a novel approach that leverages the synergistic power of Generative Adversarial Networks (GANs) and Multi-Level Attention mechanisms for comprehensive underwater image enhancement.
Our model excels in capturing and preserving intricate details in underwater imagery, essential for various applications.
This work not only addresses a significant research gap in underwater image enhancement but also underscores the pivotal role of Multi-Level Attention in enhancing GANs.
arXiv Detail & Related papers (2023-12-25T07:33:47Z) - Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement [70.2429155741593]
This paper presents a new dataset and general tracker enhancement method for Underwater Visual Object Tracking (UVOT).
It poses distinct challenges; the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles.
We propose a novel underwater image enhancement algorithm designed specifically to boost tracking quality.
The method yields a significant performance improvement of up to 5.0% AUC for state-of-the-art (SOTA) visual trackers.
arXiv Detail & Related papers (2023-08-30T07:41:26Z) - The Second Monocular Depth Estimation Challenge [93.1678025923996]
The second edition of the Monocular Depth Estimation Challenge (MDEC) was open to methods using any form of supervision.
The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth.
The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised improved it by 16.61%.
arXiv Detail & Related papers (2023-04-14T11:10:07Z) - UID2021: An Underwater Image Dataset for Evaluation of No-reference Quality Assessment Metrics [11.570496045891465]
Underwater image quality assessment (UIQA) is of high significance in underwater visual perception and image/video processing.
To address this issue, we establish a large-scale underwater image dataset, dubbed UID2021, for evaluating no-reference UIQA metrics.
The constructed dataset contains 60 multiply degraded underwater images collected from various sources, covering six common underwater scenes.
arXiv Detail & Related papers (2022-04-19T11:28:08Z) - A Realistic Fish-Habitat Dataset to Evaluate Algorithms for Underwater Visual Analysis [2.6476746128312194]
We present DeepFish as a benchmark suite with a large-scale dataset to train and test methods for several computer vision tasks.
The dataset consists of approximately 40 thousand images collected underwater from 20 habitats in the marine environments of tropical Australia.
Our experiments provide an in-depth analysis of the dataset characteristics, and the performance evaluation of several state-of-the-art approaches.
arXiv Detail & Related papers (2020-08-28T12:20:59Z) - Simultaneous Enhancement and Super-Resolution of Underwater Imagery for Improved Visual Perception [17.403133838762447]
We introduce and tackle the simultaneous enhancement and super-resolution (SESR) problem for underwater robot vision.
We present Deep SESR, a residual-in-residual network-based generative model that can learn to restore perceptual image qualities at 2x, 3x, or 4x higher spatial resolution.
arXiv Detail & Related papers (2020-02-04T07:07:08Z)