UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
- URL: http://arxiv.org/abs/2510.18262v1
- Date: Tue, 21 Oct 2025 03:32:15 GMT
- Title: UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
- Authors: Da Zhang, Chenggang Rong, Bingyu Li, Feiyu Wang, Zhiyuan Zhao, Junyu Gao, Xuelong Li
- Abstract summary: Large vision-language models (VLMs) have achieved remarkable success in natural scene understanding. Underwater imagery presents unique challenges including severe light attenuation, color distortion, and suspended particle scattering. We introduce UWBench, a benchmark specifically designed for underwater vision-language understanding.
- Score: 54.16709436340606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large vision-language models (VLMs) have achieved remarkable success in natural scene understanding, yet their application to underwater environments remains largely unexplored. Underwater imagery presents unique challenges including severe light attenuation, color distortion, and suspended particle scattering, while requiring specialized knowledge of marine ecosystems and organism taxonomy. To bridge this gap, we introduce UWBench, a comprehensive benchmark specifically designed for underwater vision-language understanding. UWBench comprises 15,003 high-resolution underwater images captured across diverse aquatic environments, encompassing oceans, coral reefs, and deep-sea habitats. Each image is enriched with human-verified annotations including 15,281 object referring expressions that precisely describe marine organisms and underwater structures, and 124,983 question-answer pairs covering diverse reasoning capabilities from object recognition to ecological relationship understanding. The dataset captures rich variations in visibility, lighting conditions, and water turbidity, providing a realistic testbed for model evaluation. Based on UWBench, we establish three comprehensive benchmarks: detailed image captioning for generating ecologically informed scene descriptions, visual grounding for precise localization of marine organisms, and visual question answering for multimodal reasoning about underwater environments. Extensive experiments on state-of-the-art VLMs demonstrate that underwater understanding remains challenging, with substantial room for improvement. Our benchmark provides essential resources for advancing vision-language research in underwater contexts and supporting applications in marine science, ecological monitoring, and autonomous underwater exploration. Our code and benchmark will be available.
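The VQA track described above amounts to comparing model answers against human-verified references. A minimal exact-match scorer might look like the following sketch; the record schema (`image`, `question`, `answer`, `prediction`) is a hypothetical illustration, not the benchmark's released format (the authors state that code and benchmark will be made available).

```python
# Minimal sketch of scoring a VLM on a UWBench-style VQA split.
# NOTE: the record schema below is an assumption for illustration only;
# the actual UWBench data format may differ.

def normalize(text: str) -> str:
    """Lowercase and drop punctuation for lenient answer matching."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return kept.strip()

def vqa_accuracy(records) -> float:
    """Exact-match accuracy between model predictions and reference answers."""
    if not records:
        return 0.0
    correct = sum(
        normalize(r["prediction"]) == normalize(r["answer"]) for r in records
    )
    return correct / len(records)

# Hypothetical examples in the assumed schema.
records = [
    {"image": "reef_001.jpg", "question": "What organism is in the foreground?",
     "answer": "clownfish", "prediction": "Clownfish."},
    {"image": "deep_014.jpg", "question": "Is the water turbid?",
     "answer": "yes", "prediction": "no"},
]

print(f"VQA exact-match accuracy: {vqa_accuracy(records):.2f}")  # → 0.50
```

Exact match is the simplest choice; open-ended answers about ecological relationships would likely need a softer metric (e.g. token overlap or an LLM judge), which the benchmark paper would specify.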
Related papers
- Exploring the Underwater World Segmentation without Extra Training [55.291219073365546]
We introduce AquaOV255, the first large-scale and fine-grained underwater segmentation dataset. We also present Earth2Ocean, a training-free OV segmentation framework.
arXiv Detail & Related papers (2025-11-11T07:22:56Z)
- NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding [60.76337064425815]
We study underwater scene understanding methods, which aim to achieve automated underwater exploration. NautData is a dataset containing 1.45M image-text pairs supporting eight underwater scene understanding tasks. We propose a plug-and-play vision feature enhancement (VFE) module, which explicitly restores clear underwater information.
arXiv Detail & Related papers (2025-10-31T14:00:35Z)
- Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset [76.92197418745822]
Camouflaged instance segmentation (CIS) faces greater challenges in accurately segmenting objects that blend closely with their surroundings. Traditional camouflaged instance segmentation methods, trained on terrestrial-dominated datasets with limited underwater samples, may exhibit inadequate performance in underwater scenes. We introduce the first underwater camouflaged instance segmentation dataset, UCIS4K, which comprises 3,953 images of camouflaged marine organisms with instance-level annotations.
arXiv Detail & Related papers (2025-10-20T14:34:51Z)
- OceanGym: A Benchmark Environment for Underwater Embodied Agents [69.56465775825275]
OceanGym is the first comprehensive benchmark for ocean underwater embodied agents. It is designed to advance AI in one of the most demanding real-world environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI.
arXiv Detail & Related papers (2025-09-30T17:09:32Z)
- DEEP-SEA: Deep-Learning Enhancement for Environmental Perception in Submerged Aquatics [5.543187582839764]
Continuous and reliable underwater monitoring is essential for assessing marine biodiversity, detecting ecological changes, and autonomous exploration. Underwater environments present significant challenges due to light scattering, absorption, and turbidity, which degrade image clarity and distort colour information. We propose DEEP-SEA, a novel deep learning-based underwater image restoration model that enhances both low- and high-frequency information while preserving spatial structures.
arXiv Detail & Related papers (2025-08-18T11:07:26Z)
- Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments [57.59857784298534]
We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes.
arXiv Detail & Related papers (2025-03-06T05:13:19Z)
- FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation [65.01601309903971]
We introduce FAFA, a Frequency-Aware Flow-Aided self-supervised framework for 6D pose estimation of unmanned underwater vehicles (UUVs).
Our framework relies solely on the 3D model and RGB images, alleviating the need for any real pose annotations or other-modality data such as depth.
We evaluate the effectiveness of FAFA on common underwater object pose benchmarks and showcase significant performance improvements compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-09-25T03:54:01Z)
- UMono: Physical Model Informed Hybrid CNN-Transformer Framework for Underwater Monocular Depth Estimation [5.596432047035205]
Underwater monocular depth estimation serves as the foundation for tasks such as 3D reconstruction of underwater scenes.
Existing methods fail to consider the unique characteristics of underwater environments.
In this paper, an end-to-end learning framework for underwater monocular depth estimation called UMono is presented.
arXiv Detail & Related papers (2024-07-25T07:52:11Z)
- Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset [60.14089302022989]
Underwater vision tasks often suffer from low segmentation accuracy due to the complex underwater circumstances.
We construct the first large-scale underwater salient instance segmentation dataset (USIS10K).
We propose an Underwater Salient Instance architecture based on Segment Anything Model (USIS-SAM) specifically for the underwater domain.
arXiv Detail & Related papers (2024-06-10T06:17:33Z)
- Deep Learning Innovations for Underwater Waste Detection: An In-Depth Analysis [0.0]
This paper conducts a comprehensive review of state-of-the-art architectures and of existing datasets to establish a baseline for submerged waste and trash detection.
The primary goal remains to establish the benchmark of the object localization techniques to be leveraged by advanced underwater sensors and autonomous underwater vehicles.
arXiv Detail & Related papers (2024-05-28T15:51:18Z)
- Virtual Underwater Datasets for Autonomous Inspections [0.0]
This study builds a bespoke dataset from photographs of items captured in a laboratory environment.
Generative Adversarial Networks (GANs) were utilised to translate the laboratory object dataset into the underwater domain.
The resulting images closely resembled the real underwater environment when compared with real-world underwater ship hull images.
arXiv Detail & Related papers (2022-09-13T14:06:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.