SAMSOD: Rethinking SAM Optimization for RGB-T Salient Object Detection
- URL: http://arxiv.org/abs/2510.03689v1
- Date: Sat, 04 Oct 2025 06:02:12 GMT
- Authors: Zhengyi Liu, Xinrui Wang, Xianyong Fang, Zhengzheng Tu, Linbo Wang,
- Abstract summary: RGB-T salient object detection (SOD) aims to segment salient objects by combining RGB and thermal infrared images. We propose a model called SAMSOD, which utilizes unimodal supervision to enhance the learning of the non-dominant modality.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-T salient object detection (SOD) aims to segment salient objects by combining RGB and thermal infrared images. To enhance performance, the Segment Anything Model has been fine-tuned for this task. However, the imbalanced convergence of the two modalities and the significant gradient difference between high- and low-activations are ignored, leaving room for further performance gains. In this paper, we propose a model called SAMSOD, which utilizes unimodal supervision to enhance the learning of the non-dominant modality and employs gradient deconfliction to reduce the impact of conflicting gradients on model convergence. The method also leverages two decoupled adapters to separately mask high- and low-activation neurons, emphasizing foreground objects by enhancing background learning. Fundamental experiments on RGB-T SOD benchmark datasets, along with generalizability experiments on scribble-supervised RGB-T SOD, fully supervised RGB-D SOD, and fully supervised RGB-D rail surface defect detection datasets, all demonstrate the effectiveness of the proposed method.
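The abstract does not spell out SAMSOD's exact deconfliction rule; a common way to realize gradient deconfliction between two modality losses is the PCGrad-style projection (Yu et al., 2020), sketched below as a minimal illustration. The function name `deconflict` and the two-gradient setup (one gradient per modality loss) are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def deconflict(g_rgb: np.ndarray, g_thermal: np.ndarray) -> np.ndarray:
    """PCGrad-style combination of two task gradients.

    If the gradients conflict (negative dot product), project each onto the
    normal plane of the *original* other gradient before summing, so neither
    update direction opposes the other task.
    """
    g1, g2 = g_rgb.astype(float).copy(), g_thermal.astype(float).copy()
    if np.dot(g_rgb, g_thermal) < 0:
        # Remove from g1 its component along g_thermal, and vice versa.
        g1 = g1 - (np.dot(g1, g_thermal) / np.dot(g_thermal, g_thermal)) * g_thermal
        g2 = g2 - (np.dot(g2, g_rgb) / np.dot(g_rgb, g_rgb)) * g_rgb
    return g1 + g2
```

For example, with `g_rgb = [1, 0]` and `g_thermal = [-1, 1]` the raw sum `[0, 1]` discards all progress on the RGB loss, whereas the deconflicted update `[0.5, 1.5]` has a non-negative dot product with both original gradients.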
Related papers
- Beyond RGB and Events: Enhancing Object Detection under Adverse Lighting with Monocular Normal Maps [6.240947520777607]
We introduce NRE-Net, a novel multi-modal detection framework. It fuses three complementary modalities: monocularly predicted surface normal maps, RGB images, and event streams. NRE-Net significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-08-04T07:19:20Z)
- RGBX-DiffusionDet: A Framework for Multi-Modal RGB-X Object Detection Using DiffusionDet [0.0]
RGBX-DiffusionDet is an object detection framework extending the DiffusionDet model. It fuses heterogeneous 2D data (X) with RGB imagery via an adaptive multimodal encoder.
arXiv Detail & Related papers (2025-05-05T11:39:51Z)
- KAN-SAM: Kolmogorov-Arnold Network Guided Segment Anything Model for RGB-T Salient Object Detection [35.52055285209549]
We propose a novel prompt-learning-based RGB-T SOD method, named KAN-SAM, which reveals the potential of visual foundation models for RGB-T SOD tasks. Specifically, we extend Segment Anything Model 2 (SAM2) for RGB-T SOD by introducing thermal features as guiding prompts through efficient and accurate Kolmogorov-Arnold Network (KAN) adapters. We also introduce a mutually exclusive random masking strategy to reduce reliance on RGB data and improve generalization.
arXiv Detail & Related papers (2025-04-08T10:07:02Z)
- Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection [67.02804741856512]
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate TL detection. Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB) that corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions.
arXiv Detail & Related papers (2025-01-25T06:21:06Z)
- Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark datasets and the VT723 dataset show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z)
- Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images [89.81919625224103]
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images.
We present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection.
arXiv Detail & Related papers (2022-01-01T03:02:27Z)
- DUT-LFSaliency: Versatile Dataset and Light Field-to-RGB Saliency Detection [104.50425501764806]
We introduce a large-scale dataset to enable versatile applications for light field saliency detection.
We present an asymmetrical two-stream model consisting of the Focal stream and RGB stream.
Experiments demonstrate that our Focal stream achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-30T11:53:27Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models use a feature fusion strategy but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
- Siamese Network for RGB-D Salient Object Detection and Beyond [113.30063105890041]
A novel framework is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models.
arXiv Detail & Related papers (2020-08-26T06:01:05Z)
- Cascade Graph Neural Networks for RGB-D Salient Object Detection [41.57218490671026]
We study the problem of salient object detection (SOD) for RGB-D images using both color and depth information.
We introduce Cascade Graph Neural Networks (Cas-Gnn), a unified framework capable of comprehensively distilling and reasoning about the mutual benefits between these two data sources.
Cas-Gnn achieves significantly better performance than all existing RGB-D SOD approaches on several widely-used benchmarks.
arXiv Detail & Related papers (2020-08-07T10:59:04Z)
- Cross-Modal Weighting Network for RGB-D Salient Object Detection [76.0965123893641]
We propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M, and CMW-H, are developed to handle low-, middle-, and high-level cross-modal information fusion, respectively.
CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
arXiv Detail & Related papers (2020-07-09T16:01:44Z)
- Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.