MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment
- URL: http://arxiv.org/abs/2510.15398v2
- Date: Thu, 23 Oct 2025 07:18:58 GMT
- Authors: Bingyu Li, Feiyu Wang, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li
- Abstract summary: We introduce MARIS (Marine Open-Vocabulary Instance Segmentation), the first large-scale fine-grained benchmark for underwater Open-Vocabulary (OV) segmentation. Our framework consistently outperforms existing OV baselines in both In-Domain and Cross-Domain settings.
- Score: 56.88334234553316
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most existing underwater instance segmentation approaches are constrained by closed-vocabulary prediction, limiting their ability to recognize novel marine categories. To support evaluation, we introduce MARIS (Marine Open-Vocabulary Instance Segmentation), the first large-scale fine-grained benchmark for underwater Open-Vocabulary (OV) segmentation, featuring a limited set of seen categories and diverse unseen categories. Although OV segmentation has shown promise on natural images, our analysis reveals that transfer to underwater scenes suffers from severe visual degradation (e.g., color attenuation) and semantic misalignment caused by a lack of underwater class definitions. To address these issues, we propose a unified framework with two complementary components. The Geometric Prior Enhancement Module (GPEM) leverages stable part-level and structural cues to maintain object consistency under degraded visual conditions. The Semantic Alignment Injection Mechanism (SAIM) enriches language embeddings with domain-specific priors, mitigating semantic ambiguity and improving recognition of unseen categories. Experiments show that our framework consistently outperforms existing OV baselines in both In-Domain and Cross-Domain settings on MARIS, establishing a strong foundation for future underwater perception research.
Related papers
- Exploring the Underwater World Segmentation without Extra Training [55.291219073365546]
We introduce AquaOV255, the first large-scale and fine-grained underwater segmentation dataset. We also present Earth2Ocean, a training-free OV segmentation framework.
arXiv Detail & Related papers (2025-11-11T07:22:56Z)
- Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset [76.92197418745822]
Camouflaged instance segmentation (CIS) faces greater challenges in accurately segmenting objects that blend closely with their surroundings. Traditional camouflaged instance segmentation methods, trained on terrestrial-dominated datasets with limited underwater samples, may exhibit inadequate performance in underwater scenes. We introduce the first underwater camouflaged instance segmentation dataset, UCIS4K, which comprises 3,953 images of camouflaged marine organisms with instance-level annotations.
arXiv Detail & Related papers (2025-10-20T14:34:51Z)
- Open-Vocabulary Camouflaged Object Segmentation with Cascaded Vision Language Models [35.947354809849166]
Open-Vocabulary Camouflaged Object Segmentation (OVCOS) seeks to segment and classify camouflaged objects from arbitrary categories. Recent approaches typically adopt a two-stage paradigm: first segmenting objects, then classifying the segmented regions. This paper introduces a novel VLM-guided cascaded framework to address these issues in OVCOS.
arXiv Detail & Related papers (2025-06-24T04:16:41Z)
- Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation [8.068623902839368]
Open-Vocabulary semantic segmentation (OVSS) and domain generalization in semantic segmentation (DGSS) highlight a subtle complementarity. OV-DGSS aims to generate pixel-level masks for unseen categories while maintaining robustness across unseen domains. We introduce Vireo, a novel single-stage framework for OV-DGSS that unifies the strengths of OVSS and DGSS for the first time.
arXiv Detail & Related papers (2025-06-11T15:54:47Z)
- Marine Saliency Segmenter: Object-Focused Conditional Diffusion with Region-Level Semantic Knowledge Distillation [44.50637633194709]
Marine Saliency Segmentation (MSS) plays a pivotal role in various vision-based marine exploration tasks. We propose DiffMSS, a novel marine saliency segmenter based on the diffusion model. We develop a dedicated deterministic consensus sampling to suppress overconfident missegmentations.
arXiv Detail & Related papers (2025-04-03T08:31:36Z)
- Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation [44.501770535446624]
A key challenge in panoptic domain adaptation is reducing the domain gap between a labeled source and an unlabeled target domain. We focus on incorporating instance-level adaptation via a novel cross-domain mixing strategy, IMix. We present an end-to-end model incorporating these two mechanisms called LIDAPS, achieving state-of-the-art results on all popular panoptic UDA benchmarks.
arXiv Detail & Related papers (2024-04-04T20:42:49Z)
- Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation [98.51938442785179]
Incremental few-shot semantic segmentation aims to incrementally extend a semantic segmentation model to novel classes.
This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance.
We propose the Semantic-guided Relation Alignment and Adaptation (SRAA) method that fully considers the guidance of prior semantic information.
arXiv Detail & Related papers (2023-05-18T10:40:52Z)
- Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation [80.48979302400868]
We focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories.
Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and captions in nouns.
We devise a joint Caption Grounding and Generation (CGG) framework, which incorporates a novel grounding loss that only focuses on matching objects to improve learning efficiency.
arXiv Detail & Related papers (2023-01-02T18:52:12Z)
- Amplitude Spectrum Transformation for Open Compound Domain Adaptive Semantic Segmentation [62.68759523116924]
Open compound domain adaptation (OCDA) has emerged as a practical adaptation setting.
We propose a novel feature-space Amplitude Spectrum Transformation (AST).
arXiv Detail & Related papers (2022-02-09T05:40:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.