Marine Saliency Segmenter: Object-Focused Conditional Diffusion with Region-Level Semantic Knowledge Distillation
- URL: http://arxiv.org/abs/2504.02391v2
- Date: Sun, 01 Jun 2025 10:20:14 GMT
- Title: Marine Saliency Segmenter: Object-Focused Conditional Diffusion with Region-Level Semantic Knowledge Distillation
- Authors: Laibin Chang, Yunke Wang, JiaXing Huang, Longxiang Deng, Bo Du, Chang Xu
- Abstract summary: Marine Saliency Segmentation (MSS) plays a pivotal role in various vision-based marine exploration tasks. We propose DiffMSS, a novel marine saliency segmenter based on the diffusion model. We develop a dedicated consensus deterministic sampling strategy to suppress overconfident missegmentations.
- Score: 44.50637633194709
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Marine Saliency Segmentation (MSS) plays a pivotal role in various vision-based marine exploration tasks. However, existing marine segmentation techniques face the dilemma of object mislocalization and imprecise boundaries due to the complex underwater environment. Meanwhile, despite the impressive performance of diffusion models in visual segmentation, there remains potential to further leverage contextual semantics to enhance feature learning of region-level salient objects, thereby improving segmentation outcomes. Building on this insight, we propose DiffMSS, a novel marine saliency segmenter based on the diffusion model, which utilizes semantic knowledge distillation to guide the segmentation of marine salient objects. Specifically, we design a region-word similarity matching mechanism to identify salient terms at the word level from the text descriptions. These high-level semantic features guide the conditional feature learning network in generating salient and accurate diffusion conditions with semantic knowledge distillation. To further refine the segmentation of fine-grained structures in unique marine organisms, we develop a dedicated consensus deterministic sampling strategy to suppress overconfident missegmentations. Comprehensive experiments demonstrate the superior performance of DiffMSS over state-of-the-art methods in both quantitative and qualitative evaluations.
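The abstract names a region-word similarity matching mechanism but does not give its details. As a rough illustration only, the Python sketch below assumes that region features and word embeddings already share one embedding space, scores each word by its best cosine similarity to any region, and keeps the top-scoring words as "salient terms". The function names, the max-over-regions scoring rule, and the `top_k` cutoff are hypothetical choices for illustration, not the DiffMSS implementation.

```python
import numpy as np


def region_word_similarity(region_feats, word_feats):
    """Cosine similarity between every region (rows) and every word (columns).

    Assumption: both inputs are already projected into a shared embedding space.
    """
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    w = word_feats / np.linalg.norm(word_feats, axis=1, keepdims=True)
    return r @ w.T  # shape: (num_regions, num_words)


def select_salient_words(region_feats, word_feats, words, top_k=2):
    """Hypothetical scoring rule: a word's score is its best match over all regions.

    Returns the top_k words (highest score first) and the per-word scores.
    """
    sim = region_word_similarity(region_feats, word_feats)
    word_scores = sim.max(axis=0)            # best-matching region for each word
    order = np.argsort(-word_scores)[:top_k]  # indices of the highest-scoring words
    return [words[i] for i in order], word_scores


if __name__ == "__main__":
    # Toy features: two image regions and three caption words (made-up values).
    regions = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    word_vecs = np.array([[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.95, 0.0]])
    selected, scores = select_salient_words(regions, word_vecs, ["turtle", "water", "coral"])
    print(selected, np.round(scores, 3))
```

In this toy case "water" is orthogonal to both regions and is dropped, while "turtle" and "coral" each align closely with one region. A learned model would replace the hand-picked vectors with encoder outputs; the max-over-regions rule is just one simple way to turn a region-word matrix into word-level saliency.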
Related papers
- Geospatial-Reasoning-Driven Vocabulary-Agnostic Remote Sensing Semantic Segmentation [13.743073097114461]
Open-vocabulary semantic segmentation has emerged as a promising research direction in remote sensing.
We propose a Geospatial Reasoning Chain-of-Thought (GR-CoT) framework to guide open-vocabulary segmentation models toward precise mapping.
arXiv Detail & Related papers (2026-02-09T02:09:21Z)
- Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset [76.92197418745822]
Camouflaged instance segmentation (CIS) faces greater challenges in accurately segmenting objects that blend closely with their surroundings.
Traditional camouflaged instance segmentation methods, trained on terrestrial-dominated datasets with limited underwater samples, may exhibit inadequate performance in underwater scenes.
We introduce the first underwater camouflaged instance segmentation dataset, UCIS4K, which comprises 3,953 images of camouflaged marine organisms with instance-level annotations.
arXiv Detail & Related papers (2025-10-20T14:34:51Z)
- MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment [56.88334234553316]
We introduce MARIS (Marine Open-Vocabulary Instance Segmentation), the first large-scale fine-grained benchmark for underwater Open-Vocabulary (OV) segmentation.
Our framework consistently outperforms existing OV baselines in both In-Domain and Cross-Domain settings.
arXiv Detail & Related papers (2025-10-17T07:50:58Z)
- Investigating the Effect of Spatial Context on Multi-Task Sea Ice Segmentation [1.0291625571470187]
This study investigates the impact of spatial context on the segmentation of sea ice concentration, stage of development, and floe size using a multi-task segmentation model.
We implement Atrous Spatial Pyramid Pooling with varying atrous rates to control the receptive field size of convolutional operations.
Our findings indicate that smaller receptive fields excel for high-resolution Sentinel-1 data, while medium receptive fields yield better performance for stage of development segmentation and larger receptive fields often lead to diminished performance.
arXiv Detail & Related papers (2025-07-28T04:03:36Z)
- LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance [56.474856189865946]
Large multi-modal models (LMMs) struggle with inaccurate segmentation and hallucinated comprehension.
We propose LIRA, a framework that capitalizes on the complementary relationship between visual comprehension and segmentation.
LIRA achieves state-of-the-art performance in both segmentation and comprehension tasks.
arXiv Detail & Related papers (2025-07-08T07:46:26Z)
- Multi-Domain Features Guided Supervised Contrastive Learning for Radar Target Detection [8.706031869122917]
Existing solutions either model sea clutter for detection or extract target features based on clutter-target echo differences, including statistical and deep features.
We propose a multi-domain features guided supervised contrastive learning (MDFG_SCL) method, which integrates statistical features derived from multi-domain differences with deep features obtained through supervised contrastive learning.
Experiments conducted on real-world datasets demonstrate that the proposed shallow-to-deep detector not only achieves effective identification of small maritime targets but also maintains superior detection performance across varying sea conditions.
arXiv Detail & Related papers (2024-12-17T07:33:07Z)
- Diffusion Features to Bridge Domain Gap for Semantic Segmentation [2.8616666231199424]
This paper investigates an approach that leverages sampling and fusion techniques to harness the features of diffusion models efficiently.
By leveraging the strength of text-to-image generation, we introduce a new training framework designed to implicitly learn posterior knowledge from it.
arXiv Detail & Related papers (2024-06-02T15:33:46Z)
- Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation [51.66997548477913]
We propose a novel feature-level consistency learning framework named Density-Descending Feature Perturbation (DDFP).
Inspired by the low-density separation assumption in semi-supervised learning, our key insight is that feature density can shed light on the most promising direction for the segmentation classifier to explore.
The proposed DDFP outperforms other designs on feature-level perturbations and shows state-of-the-art performance on both the Pascal VOC and Cityscapes datasets.
arXiv Detail & Related papers (2024-03-11T06:59:05Z)
- Attention-guided Feature Distillation for Semantic Segmentation [8.344263189293578]
This paper showcases the efficacy of a simple yet powerful method for utilizing refined feature maps to transfer attention.
The proposed Attention-guided Feature Distillation (AttnFD) method employs the Convolutional Block Attention Module (CBAM).
It achieves state-of-the-art results in terms of improving the mean Intersection over Union (mIoU) of the student network on the Pascal VOC 2012, Cityscapes, COCO, and CamVid datasets.
arXiv Detail & Related papers (2024-03-08T16:57:47Z)
- Robust Saliency-aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
- Semantic-aware Texture-Structure Feature Collaboration for Underwater Image Enhancement [58.075720488942125]
Underwater image enhancement has become an attractive topic as a significant technology in marine engineering and aquatic robotics.
We develop an efficient and compact enhancement network in collaboration with a high-level semantic-aware pretrained model.
We also apply the proposed algorithm to the underwater salient object detection task to reveal the favorable semantic-aware ability for high-level vision tasks.
arXiv Detail & Related papers (2022-11-19T07:50:34Z)
- Semantic-Guided Representation Enhancement for Self-supervised Monocular Trained Depth Estimation [39.845944724079814]
Self-supervised depth estimation has shown great effectiveness in producing high-quality depth maps given only image sequences as input.
However, its performance usually drops when estimating on border areas or objects with thin structures due to the limited depth representation ability.
We propose a semantic-guided depth representation enhancement method, which promotes both local and global depth feature representations.
arXiv Detail & Related papers (2020-12-15T02:24:57Z)
- MARNet: Multi-Abstraction Refinement Network for 3D Point Cloud Analysis [9.34612743192798]
Existing deep learning methods fail to exploit information at different granularities due to limited interaction between features.
We propose the Multi-Abstraction Refinement Network (MARNet), which ensures an effective exchange of information between multi-level features.
We empirically show the effectiveness of MARNet with state-of-the-art results on two challenging tasks: shape classification and coarse-to-fine-grained semantic segmentation.
arXiv Detail & Related papers (2020-11-02T12:07:35Z)
- Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
- Panoptic Feature Fusion Net: A Novel Instance Segmentation Paradigm for Biomedical and Biological Images [91.41909587856104]
We present a Panoptic Feature Fusion Net (PFFNet) that unifies the semantic and instance features in this work.
Our proposed PFFNet contains a residual attention feature fusion mechanism to incorporate the instance prediction with the semantic features.
It outperforms several state-of-the-art methods on various biomedical and biological datasets.
arXiv Detail & Related papers (2020-02-15T09:19:41Z)