Exploring the Underwater World Segmentation without Extra Training
- URL: http://arxiv.org/abs/2511.07923v1
- Date: Wed, 12 Nov 2025 01:28:47 GMT
- Title: Exploring the Underwater World Segmentation without Extra Training
- Authors: Bingyu Li, Tao Huo, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li,
- Abstract summary: We introduce textbfAquaOV255, the first large-scale and fine-grained underwater segmentation dataset.<n>We also present textbfEarth2Ocean, a training-free OV segmentation framework.
- Score: 55.291219073365546
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate segmentation of marine organisms is vital for biodiversity monitoring and ecological assessment, yet existing datasets and models remain largely limited to terrestrial scenes. To bridge this gap, we introduce \textbf{AquaOV255}, the first large-scale and fine-grained underwater segmentation dataset containing 255 categories and over 20K images, covering diverse categories for open-vocabulary (OV) evaluation. Furthermore, we establish the first underwater OV segmentation benchmark, \textbf{UOVSBench}, by integrating AquaOV255 with five additional underwater datasets to enable comprehensive evaluation. Alongside, we present \textbf{Earth2Ocean}, a training-free OV segmentation framework that transfers terrestrial vision--language models (VLMs) to underwater domains without any additional underwater training. Earth2Ocean consists of two core components: a Geometric-guided Visual Mask Generator (\textbf{GMG}) that refines visual features via self-similarity geometric priors for local structure perception, and a Category-visual Semantic Alignment (\textbf{CSA}) module that enhances text embeddings through multimodal large language model reasoning and scene-aware template construction. Extensive experiments on the UOVSBench benchmark demonstrate that Earth2Ocean achieves significant performance improvement on average while maintaining efficient inference.
Related papers
- NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding [60.76337064425815]
We study the underwater scene understanding methods, which aim to achieve automated underwater exploration.<n>NautData is a dataset containing 1.45 M image-text pairs supporting eight underwater scene understanding tasks.<n>We propose a plug-and-play vision feature enhancement (VFE) module, which explicitly restores clear underwater information.
arXiv Detail & Related papers (2025-10-31T14:00:35Z) - Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset [76.92197418745822]
camouflaged instance segmentation (CIS) faces greater challenges in accurately segmenting objects that blend closely with their surroundings.<n>Traditional camouflaged instance segmentation methods, trained on terrestrial-dominated datasets with limited underwater samples, may exhibit inadequate performance in underwater scenes.<n>We introduce the first underwater camouflaged instance segmentation dataset, UCIS4K, which comprises 3,953 images of camouflaged marine organisms with instance-level annotations.
arXiv Detail & Related papers (2025-10-20T14:34:51Z) - MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment [56.88334234553316]
We introduce textbfMARIS (underlineMarine Open-Vocabulary underlineInstance underlineSegmentation), the first large-scale fine-grained benchmark for underwater Open-Vocabulary (OV) segmentation.<n>Our framework consistently outperforms existing OV baselines both In-Domain and Cross-Domain setting.
arXiv Detail & Related papers (2025-10-17T07:50:58Z) - USIS16K: High-Quality Dataset for Underwater Salient Instance Segmentation [11.590111778515775]
We introduce USIS16K, a large-scale dataset comprising 16,151 high-resolution underwater images.<n>Each image is annotated with high-quality instance-level salient object masks.<n>We provide benchmark evaluations on underwater object detection and USIS tasks using USIS16K.
arXiv Detail & Related papers (2025-06-24T09:58:01Z) - Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation [110.02397462607449]
We propose a large-scale underwater instance segmentation dataset, UIIS10K, which includes 10,048 images with pixel-level annotations for 10 categories.<n>We then introduce UWSAM, an efficient model designed for automatic and accurate segmentation of underwater instances.<n>We show that our model is effective, achieving significant performance improvements over state-of-the-art methods on multiple underwater instance datasets.
arXiv Detail & Related papers (2025-05-21T14:36:01Z) - Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset [60.14089302022989]
Underwater vision tasks often suffer from low segmentation accuracy due to the complex underwater circumstances.
We construct the first large-scale underwater salient instance segmentation dataset (USIS10K)
We propose an Underwater Salient Instance architecture based on Segment Anything Model (USIS-SAM) specifically for the underwater domain.
arXiv Detail & Related papers (2024-06-10T06:17:33Z) - SVAM: Saliency-guided Visual Attention Modeling by Autonomous Underwater
Robots [16.242924916178282]
This paper presents a holistic approach to saliency-guided visual attention modeling (SVAM) for use by autonomous underwater robots.
Our proposed model, named SVAM-Net, integrates deep visual features at various scales and semantics for effective salient object detection (SOD) in natural underwater images.
arXiv Detail & Related papers (2020-11-12T08:17:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.