Progressive Multi-scale Fusion Network for RGB-D Salient Object Detection
- URL: http://arxiv.org/abs/2106.03941v1
- Date: Mon, 7 Jun 2021 20:02:39 GMT
- Title: Progressive Multi-scale Fusion Network for RGB-D Salient Object Detection
- Authors: Guangyu Ren, Yanchu Xie, Tianhong Dai, Tania Stathaki
- Abstract summary: We discuss the advantages of the progressive multi-scale fusion method and propose a mask-guided feature aggregation module (MGFA).
The proposed framework can effectively combine features from the two modalities and alleviate the impact of erroneous depth features.
We further introduce a mask-guided refinement module (MGRM) to complement the high-level semantic features and suppress irrelevant features from multi-scale fusion.
- Score: 9.099589602551575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Salient object detection (SOD) aims at locating the most significant object
within a given image. In recent years, great progress has been made in applying
SOD to many vision tasks. Depth maps can provide additional spatial priors and
boundary cues that boost performance, and combining depth information with
image data from standard visual cameras has been widely used in recent SOD
works; however, introducing depth information through a suboptimal fusion
strategy may degrade SOD performance. In this paper, we discuss the advantages
of the progressive multi-scale fusion method and propose a mask-guided feature
aggregation module (MGFA). The proposed framework can effectively combine
features from the two modalities and, furthermore, alleviate the impact of
erroneous depth features, which inevitably arise from variations in depth
quality. We further introduce a mask-guided refinement module (MGRM) to
complement the high-level semantic features and suppress irrelevant features
introduced by multi-scale fusion, leading to an overall refinement of
detection. Experiments on five challenging benchmarks demonstrate that the
proposed method outperforms 11 state-of-the-art methods under different
evaluation metrics.
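The entry includes no code; the minimal PyTorch sketch below illustrates the general idea of mask-guided fusion described above: a coarse saliency mask gates the depth features before they are merged with the RGB features, so unreliable depth responses are suppressed. The module name, shapes, and structure are illustrative assumptions, not the authors' MGFA implementation.

```python
# Illustrative sketch of mask-guided RGB-D fusion. All names, shapes, and
# design choices are assumptions; the paper's MGFA may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskGuidedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv merges the gated depth features with the RGB features.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, depth_feat, coarse_mask):
        # Resize the coarse saliency mask to the feature resolution.
        mask = F.interpolate(coarse_mask, size=rgb_feat.shape[-2:],
                             mode="bilinear", align_corners=False)
        # Gate the depth stream: responses outside the predicted salient
        # region are damped, limiting the impact of erroneous depth.
        gated_depth = depth_feat * torch.sigmoid(mask)
        return self.fuse(torch.cat([rgb_feat, gated_depth], dim=1))

# Example: fuse 64-channel features at one decoder scale.
fusion = MaskGuidedFusion(64)
rgb = torch.randn(1, 64, 56, 56)
depth = torch.randn(1, 64, 56, 56)
mask = torch.randn(1, 1, 14, 14)  # coarse prediction from a deeper stage
out = fusion(rgb, depth, mask)    # -> torch.Size([1, 64, 56, 56])
```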
Related papers
- Depth-discriminative Metric Learning for Monocular 3D Object Detection [14.554132525651868]
We introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes.
Our method consistently improves the performance of various baselines by 23.51% and 5.78% on average.
arXiv Detail & Related papers (2024-01-02T07:34:09Z)
- MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
arXiv Detail & Related papers (2023-07-18T11:26:02Z)
- Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking [47.59619420444781]
Approaches to monocular 3D perception, including detection and tracking, often yield inferior performance compared to LiDAR-based techniques.
We propose a multi-level fusion method that combines different representations (RGB and pseudo-LiDAR) and temporal information across multiple frames for objects (tracklets) to enhance per-object depth estimation.
arXiv Detail & Related papers (2022-06-08T03:37:59Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- MSFNet: Multi-scale features network for monocular depth estimation [0.0]
The Multi-scale Features Network (MSFNet) consists of an Enhanced Diverse Attention (EDA) module and an Upsample-Stage Fusion (USF) module.
The EDA module employs spatial attention to learn significant spatial information.
The USF module complements low-level detail information with high-level semantic information to improve prediction quality.
arXiv Detail & Related papers (2021-07-14T01:38:29Z)
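Only the abstract-level description of the EDA module above is available; the sketch below shows a standard spatial-attention gate of the kind it is described as using (a common CBAM-style formulation, assumed here for illustration, not MSFNet's actual code).

```python
# A common spatial-attention gate, assumed as an illustration of the
# mechanism the EDA module is described as using; not MSFNet's code.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # Two pooled channel summaries -> one attention weight per location.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)   # channel-average map
        mx, _ = x.max(dim=1, keepdim=True)  # channel-max map
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                     # reweight each spatial location
```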
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
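The "residual-based fusion modules" above are not specified further; one plausible reading, sketched below under that assumption, learns a cross-modal correction and adds it back onto the camera stream, so the module can fall back toward identity when the other modality is noisy.

```python
# Hypothetical residual-based fusion block inspired by the description of
# PMF; the actual module structure in the paper may differ.
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, cam_feat, lidar_feat):
        # Learn a correction from both modalities and add it back to the
        # camera stream through a residual connection.
        correction = self.project(torch.cat([cam_feat, lidar_feat], dim=1))
        return cam_feat + correction
```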
- CMA-Net: A Cascaded Mutual Attention Network for Light Field Salient Object Detection [17.943924748737622]
We propose CMA-Net, which consists of two novel cascaded mutual attention modules aiming to fuse high-level features from the all-in-focus and depth modalities.
Our proposed CMA-Net outperforms 30 SOD methods (by a large margin) on two widely applied light field benchmark datasets.
arXiv Detail & Related papers (2021-05-03T15:32:12Z)
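As a minimal, assumed illustration of mutual attention between two modalities (each stream attending to the other), consider the sketch below; CMA-Net's cascaded modules are more elaborate than this.

```python
# Minimal mutual (cross) attention between two feature streams; an assumed
# illustration only, not CMA-Net's implementation.
import torch
import torch.nn as nn

class MutualAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.a_to_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.b_to_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (batch, tokens, dim) flattened feature maps.
        a_new, _ = self.a_to_b(feat_a, feat_b, feat_b)  # a queries b
        b_new, _ = self.b_to_a(feat_b, feat_a, feat_a)  # b queries a
        return feat_a + a_new, feat_b + b_new           # residual update
```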
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided depth super-resolution (DSR).
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)