Middle-level Fusion for Lightweight RGB-D Salient Object Detection
- URL: http://arxiv.org/abs/2104.11543v1
- Date: Fri, 23 Apr 2021 11:37:15 GMT
- Title: Middle-level Fusion for Lightweight RGB-D Salient Object Detection
- Authors: Nianchang Huang, Qiang Zhang, Jungong Han
- Abstract summary: A novel lightweight RGB-D SOD model is presented in this paper.
With the IMFF and LFDF modules incorporated in the middle-level fusion structure, our proposed model has only 3.9M parameters and runs at 33 FPS.
The experimental results on several benchmark datasets verify the effectiveness and superiority of the proposed method over some state-of-the-art methods.
- Score: 81.43951906434175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most existing RGB-D salient object detection (SOD) models incur large
computational and memory costs to accurately detect the salient
objects. This limits the real-life applications of these RGB-D SOD models. To
address this issue, a novel lightweight RGB-D SOD model is presented in this
paper. Different from most existing models, which usually employ a two-stream
or single-stream structure, we propose to employ a middle-level fusion
structure for designing lightweight RGB-D SOD models, because the
middle-level fusion structure can simultaneously exploit modality-shared
and modality-specific information, as the two-stream structure does, and can
significantly reduce the network's parameters, as the single-stream structure does.
Based on this structure, a novel information-aware multi-modal feature fusion
(IMFF) module is first designed to effectively capture the cross-modal
complementary information. Then, a novel lightweight feature-level and
decision-level feature fusion (LFDF) module is designed to aggregate the
feature-level and the decision-level saliency information in different stages
with fewer parameters. With the IMFF and LFDF modules incorporated in the
middle-level fusion structure, our proposed model has only 3.9M parameters and
runs at 33 FPS. Furthermore, the experimental results on several benchmark
datasets verify the effectiveness and superiority of the proposed method over
some state-of-the-art methods.
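
As a rough illustration of the middle-level fusion layout the abstract describes (modality-specific early streams, a single fusion point in the middle, and one shared stream afterwards), here is a minimal PyTorch sketch. All names, channel widths, and the concat-plus-1x1-conv fusion step are illustrative assumptions; in particular, the fusion stand-in below does not implement the paper's IMFF module, and no LFDF-style feature/decision-level aggregation is modeled.

```python
# Minimal sketch of a middle-level fusion RGB-D SOD network
# (illustrative assumptions only, not the authors' implementation).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # 3x3 conv with stride 2: halves spatial resolution at each stage.
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class MiddleLevelFusionSOD(nn.Module):
    def __init__(self):
        super().__init__()
        # Separate lightweight stems preserve modality-specific
        # information, as in a two-stream design.
        self.rgb_stem = nn.Sequential(conv_block(3, 16), conv_block(16, 32))
        self.depth_stem = nn.Sequential(conv_block(1, 16), conv_block(16, 32))
        # Stand-in for the IMFF module: concatenate the two middle-level
        # feature maps and mix them with a 1x1 convolution.
        self.fuse = nn.Conv2d(64, 32, kernel_size=1)
        # A single shared tail exploits modality-shared information and
        # keeps the parameter count low, as in a single-stream design.
        self.shared = conv_block(32, 64)
        self.head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, rgb, depth):
        mid = torch.cat([self.rgb_stem(rgb), self.depth_stem(depth)], dim=1)
        feat = self.shared(self.fuse(mid))
        return torch.sigmoid(self.head(feat))  # coarse saliency map

model = MiddleLevelFusionSOD()
saliency = model(torch.randn(1, 3, 384, 384), torch.randn(1, 1, 384, 384))
print(saliency.shape)  # torch.Size([1, 1, 48, 48])
```

The point of the layout is the single fusion site: before it, each modality keeps its own cheap stem; after it, all computation is shared, which is where most of the parameter savings over a full two-stream model come from.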
Related papers
- RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation [43.358140897849616]
In this paper, we incorporate RGB images, Point clouds and Events for joint optical flow and scene flow estimation with our proposed multi-stage multimodal fusion model, RPEFlow.
Experiments on both synthetic and real datasets show that our model outperforms the existing state-of-the-art by a wide margin.
arXiv Detail & Related papers (2023-09-26T17:23:55Z)
- Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module, the SAM-guidEd refinEment Module (SEEM).
This lightweight plug-in module is specifically designed to leverage the attention mechanism for generating semantic-aware features.
We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which enables the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Trear: Transformer-based RGB-D Egocentric Action Recognition [38.20137500372927]
We propose a Transformer-based RGB-D egocentric action recognition framework, called Trear.
It consists of two modules: an inter-frame attention encoder and a mutual-attentional fusion block.
arXiv Detail & Related papers (2021-01-05T19:59:30Z)
- RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)
- A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection [89.88222217065858]
We design a single stream network that uses the depth map to guide early fusion and middle fusion between RGB and depth (a toy sketch of depth-guided fusion follows this list).
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384 \times 384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z)
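
The single-stream entry above says the depth map is used to guide fusion with the RGB stream. A minimal sketch of one common way to realize such guidance, depth-derived spatial gating, is shown below; the module name, channel width, and gating formula are illustrative assumptions rather than that paper's actual design.

```python
# Toy depth-guided fusion: depth is turned into a spatial attention map
# that reweights RGB features before a single shared stream (assumed
# design for illustration, not the cited paper's actual module).
import torch
import torch.nn as nn

class DepthGuidedFusion(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.rgb_conv = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        # Depth branch predicts a single-channel gate with values in (0, 1).
        self.depth_gate = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb, depth):
        feat = self.rgb_conv(rgb)
        gate = self.depth_gate(depth)
        # Residual gating: depth emphasizes likely-salient regions without
        # being able to zero out the RGB signal entirely.
        return feat + feat * gate

fused = DepthGuidedFusion()(torch.randn(1, 3, 384, 384),
                            torch.randn(1, 1, 384, 384))
print(fused.shape)  # torch.Size([1, 32, 384, 384])
```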