CMA-Net: A Cascaded Mutual Attention Network for Light Field Salient
Object Detection
- URL: http://arxiv.org/abs/2105.00949v1
- Date: Mon, 3 May 2021 15:32:12 GMT
- Title: CMA-Net: A Cascaded Mutual Attention Network for Light Field Salient
Object Detection
- Authors: Yi Zhang, Lu Zhang, Wassim Hamidouche and Olivier Deforges
- Abstract summary: We propose CMA-Net, which consists of two novel cascaded mutual attention modules aiming at fusing the high level features from the modalities of all-in-focus and depth.
Our proposed CMA-Net outperforms 30 SOD methods (by a large margin) on two widely applied light field benchmark datasets.
- Score: 17.943924748737622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past few years, numerous deep learning methods have been proposed to
address the task of segmenting salient objects from RGB (all-in-focus) images.
However, these approaches depending on single modality fail to achieve the
state-of-the-art performance on widely used light field salient object
detection (SOD) datasets, which collect large-scale natural images and provide
multiple modalities such as multi-view, micro-lens images and depth maps. Most
recently proposed light field SOD methods have acquired improving detecting
accuracy, yet still predict rough objects' structures and perform slow
inference speed. To this end, we propose CMA-Net, which consists of two novel
cascaded mutual attention modules aiming at fusing the high level features from
the modalities of all-in-focus and depth. Our proposed CMA-Net outperforms 30
SOD methods (by a large margin) on two widely applied light field benchmark
datasets. Besides, the proposed CMA-Net can run at a speed of 53 fps, thus
being much faster than the state-of-the-art multi-modal SOD methods. Extensive
quantitative and qualitative experiments illustrate both the effectiveness and
efficiency of our CMA-Net, inspiring future development of multi-modal learning
for both the RGB-D and light field SOD.
Related papers
- HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness [2.341385717236931]
We propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection.
Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies.
Our HiDAnet performs favorably over the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-01-18T10:00:59Z) - A lightweight multi-scale context network for salient object detection
in optical remote sensing images [16.933770557853077]
We propose a multi-scale context network, namely MSCNet, for salient object detection in optical RSIs.
Specifically, a multi-scale context extraction module is adopted to address the scale variation of salient objects.
In order to accurately detect complete salient objects in complex backgrounds, we design an attention-based pyramid feature aggregation mechanism.
arXiv Detail & Related papers (2022-05-18T14:32:47Z) - Joint Learning of Salient Object Detection, Depth Estimation and Contour
Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD)
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z) - Depth-Cooperated Trimodal Network for Video Salient Object Detection [13.727763221832532]
We propose a depth-operated triOD network called DCTNet for video salient object detection (VS)
To this end, we first generate depth from RGB frames, and then propose an approach to treat the three modalities unequally.
We also introduce a refinement fusion module (RFM) to suppress noises in each modality and select useful information dynamically for further feature refinement.
arXiv Detail & Related papers (2022-02-12T13:04:16Z) - RRNet: Relational Reasoning Network with Parallel Multi-scale Attention
for Salient Object Detection in Optical Remote Sensing Images [82.1679766706423]
Salient object detection (SOD) for optical remote sensing images (RSIs) aims at locating and extracting visually distinctive objects/regions from the optical RSIs.
We propose a relational reasoning network with parallel multi-scale attention for SOD in optical RSIs.
Our proposed RRNet outperforms the existing state-of-the-art SOD competitors both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-10-27T07:18:32Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF)
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Progressive Multi-scale Fusion Network for RGB-D Salient Object
Detection [9.099589602551575]
We discuss about the advantages of the so-called progressive multi-scale fusion method and propose a mask-guided feature aggregation module.
The proposed framework can effectively combine the two features of different modalities and alleviate the impact of erroneous depth features.
We further introduce a mask-guided refinement module(MGRM) to complement the high-level semantic features and reduce the irrelevant features from multi-scale fusion.
arXiv Detail & Related papers (2021-06-07T20:02:39Z) - A Parallel Down-Up Fusion Network for Salient Object Detection in
Optical Remote Sensing Images [82.87122287748791]
We propose a novel Parallel Down-up Fusion network (PDF-Net) for salient object detection in optical remote sensing images (RSIs)
It takes full advantage of the in-path low- and high-level features and cross-path multi-resolution features to distinguish diversely scaled salient objects and suppress the cluttered backgrounds.
Experiments on the ORSSD dataset demonstrate that the proposed network is superior to the state-of-the-art approaches both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-10-02T05:27:57Z) - Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for
Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z) - A Single Stream Network for Robust and Real-time RGB-D Salient Object
Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384 times 384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z) - Multi-level Cross-modal Interaction Network for RGB-D Salient Object
Detection [3.581367375462018]
We propose a novel Multi-level Cross-modal Interaction Network (MCINet) for RGB-D based salient object detection (SOD)
Our MCI-Net includes two key components: 1) a cross-modal feature learning network, which is used to learn the high-level features for the RGB images and depth cues, effectively enabling the correlations between the two sources to be exploited; and 2) a multi-level interactive integration network, which integrates multi-level cross-modal features to boost the SOD performance.
arXiv Detail & Related papers (2020-07-10T02:21:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.