Modal-Adaptive Gated Recoding Network for RGB-D Salient Object Detection
- URL: http://arxiv.org/abs/2108.06281v1
- Date: Fri, 13 Aug 2021 15:08:21 GMT
- Title: Modal-Adaptive Gated Recoding Network for RGB-D Salient Object Detection
- Authors: Feng Dong, Jinchao Zhu, Xian Fang, Qiu Yu
- Abstract summary: We propose a novel gated recoding network (GRNet) to evaluate the information validity of the two modalities.
A perception encoder is adopted to extract multi-level single-modal features.
A modal-adaptive gate unit is proposed to suppress the invalid information and transfer the effective modal features to the recoding mixer and the hybrid branch decoder.
- Score: 2.9153096940947796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal salient object detection models based on RGB-D
information are more robust in the real world. However, it remains nontrivial
to adaptively balance the effective multi-modal information in the feature
fusion phase. In this letter, we propose a novel gated recoding network
(GRNet) to evaluate the information validity of the two modalities and balance
their influence. Our framework is divided into three phases: the perception
phase, the recoding mixing phase, and the feature integration phase. First, a
perception encoder is adopted to extract multi-level single-modal features,
which lays the
foundation for multi-modal semantic comparative analysis. Then, a
modal-adaptive gate unit (MGU) is proposed to suppress the invalid information
and transfer the effective modal features to the recoding mixer and the hybrid
branch decoder. The recoding mixer is responsible for recoding and mixing the
balanced multi-modal information. Finally, the hybrid branch decoder completes
the multi-level feature integration under the guidance of an optional edge
guidance stream (OEGS). Experiments and analysis on eight popular benchmarks
verify that our framework performs favorably against nine state-of-the-art methods.
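The abstract names the gating step but not its formulation. Below is a minimal PyTorch sketch of one plausible modal-adaptive gate in this spirit; the pooling, layer sizes, and softmax weighting are illustrative assumptions, not GRNet's published design.

```python
import torch
import torch.nn as nn

class ModalAdaptiveGate(nn.Module):
    """Illustrative gate in the spirit of the MGU described above: score the
    validity of the RGB and depth features and reweight them before they are
    passed on for recoding/mixing. Not GRNet's exact design."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict per-modality validity scores from globally pooled features.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.score = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2),
            nn.Softmax(dim=1),  # relative weight of RGB vs. depth
        )

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor):
        b, _, _, _ = f_rgb.shape
        ctx = torch.cat([self.pool(f_rgb), self.pool(f_depth)], dim=1).flatten(1)
        w = self.score(ctx)                          # (B, 2) modality weights
        g_rgb = f_rgb * w[:, 0].view(b, 1, 1, 1)     # suppress invalid RGB cues
        g_depth = f_depth * w[:, 1].view(b, 1, 1, 1) # suppress invalid depth cues
        # In GRNet, gated features would feed the recoding mixer and the
        # hybrid branch decoder; here they are simply returned.
        return g_rgb, g_depth

# Toy usage: one level of the multi-level features from the perception encoder.
rgb_feat, depth_feat = torch.randn(2, 64, 44, 44), torch.randn(2, 64, 44, 44)
gated_rgb, gated_depth = ModalAdaptiveGate(64)(rgb_feat, depth_feat)
```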
Related papers
- Dual Mutual Learning Network with Global-local Awareness for RGB-D Salient Object Detection [10.353412441955436]
We propose the GL-DMNet, a novel dual mutual learning network with global-local awareness.
We present a position mutual fusion module and a channel mutual fusion module to exploit the interdependencies among different modalities.
Our proposed GL-DMNet performs better than 24 RGB-D SOD methods, achieving an average improvement of 3%.
arXiv Detail & Related papers (2025-01-03T05:37:54Z)
- Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images.
Traditional encoder-decoder architectures may not adequately consider robustness against noise originating from defective modalities.
We propose the ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z)
- FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network [19.466279425330857]
We propose a novel multimodal object detector, named Low-rank Modal Adaptors (LMA) with a shared backbone.
Our work was submitted to ACM MM in April 2024, but was rejected.
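The summary names low-rank modal adaptors on a shared backbone but gives no details; below is a generic LoRA-style sketch under that assumption. The class names, rank, and frozen-backbone choice are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Generic LoRA-style layer: y = W0 x + B(A x), where W0 is frozen and
    A, B are small trainable low-rank factors. Illustrative only; the
    paper's LMA design is not given in the abstract."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                  # shared backbone stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B
        nn.init.zeros_(self.up.weight)               # start as a zero update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.up(self.down(x))

# One adapter per modality on top of a shared projection, e.g. RGB vs. thermal.
shared = nn.Linear(256, 256)
rgb_branch, ir_branch = LowRankAdapter(shared), LowRankAdapter(shared)
y_rgb = rgb_branch(torch.randn(4, 256))
```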
arXiv Detail & Related papers (2024-07-23T02:27:52Z)
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework - the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
- X Modality Assisting RGBT Object Tracking [36.614908357546035]
We propose a novel X Modality Assisting Network (X-Net) to shed light on the impact of the fusion paradigm.
To tackle the feature learning hurdles stemming from significant differences between RGB and thermal modalities, a plug-and-play pixel-level generation module (PGM) is proposed.
We also propose a feature-level interaction module (FIM) that incorporates a mixed feature interaction transformer and a spatial-dimensional feature translation strategy.
arXiv Detail & Related papers (2023-12-27T05:38:54Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
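Only the general idea of a "light feature adapter" is stated here; a common realization is a bottleneck that down-projects one modality's features and adds them back to the other branch. A sketch under that assumption, not the paper's exact adapter:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Lightweight adapter: compress features, expand them, and add them to
    the other modality's stream. A generic sketch, not the paper's design."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = channels // reduction
        self.transfer = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),  # down-project
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),  # up-project
        )

    def forward(self, src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
        # Inject modality-specific information from `src` into `dst`.
        return dst + self.transfer(src)

# Bi-directional use: RGB -> thermal and thermal -> RGB with two adapters.
rgb, thermal = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
to_t, to_rgb = BottleneckAdapter(64), BottleneckAdapter(64)
thermal_fused, rgb_fused = to_t(rgb, thermal), to_rgb(thermal, rgb)
```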
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- TransCMD: Cross-Modal Decoder Equipped with Transformer for RGB-D Salient Object Detection [86.94578023985677]
In this work, we rethink this task from the perspective of global information alignment and transformation.
Specifically, the proposed method (TransCMD) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path.
Experimental results on seven RGB-D SOD benchmark datasets demonstrate that a simple two-stream encoder-decoder framework can surpass the state-of-the-art purely CNN-based methods.
arXiv Detail & Related papers (2021-12-04T15:45:34Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that presents great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
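As a rough illustration of transformer-based fusion of the two modalities, here is a generic cross-attention sketch; it is not this paper's architecture, and all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Generic transformer-style fusion: RGB tokens attend to depth tokens
    and vice versa. Illustrative of the idea only, not the paper's network."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.rgb_from_depth = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.depth_from_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        # Each modality queries the other, then adds the result residually.
        r, _ = self.rgb_from_depth(rgb, depth, depth)
        d, _ = self.depth_from_rgb(depth, rgb, rgb)
        return self.norm(rgb + r), self.norm(depth + d)

# Tokens of shape (batch, sequence, dim), e.g. flattened feature maps.
rgb_tok, depth_tok = torch.randn(2, 196, 256), torch.randn(2, 196, 256)
fused_rgb, fused_depth = CrossModalAttentionFusion()(rgb_tok, depth_tok)
```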
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- LC3Net: Ladder context correlation complementary network for salient object detection [0.32116198597240836]
We propose a novel ladder context correlation complementary network (LC3Net).
FCB is a filterable convolution block to assist the automatic collection of information on the diversity of initial features.
DCM is a dense cross module to facilitate the intimate aggregation of different levels of features.
BCD is a bidirectional compression decoder to help the progressive shrinkage of multi-scale features.
arXiv Detail & Related papers (2021-10-21T03:12:32Z)