Adaptive Context-Aware Multi-Modal Network for Depth Completion
- URL: http://arxiv.org/abs/2008.10833v1
- Date: Tue, 25 Aug 2020 06:00:06 GMT
- Title: Adaptive Context-Aware Multi-Modal Network for Depth Completion
- Authors: Shanshan Zhao, Mingming Gong, Huan Fu, and Dacheng Tao
- Abstract summary: We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce a symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves state-of-the-art performance on two benchmarks.
- Score: 107.15344488719322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth completion aims to recover a dense depth map from sparse depth data
and the corresponding single RGB image. The observed pixels provide significant
guidance for recovering the depth of the unobserved pixels. However, due to the
sparsity of the depth data, the standard convolution operation, used by most
existing methods, is not effective at modeling the observed contexts with depth
values. To address this issue, we propose to adopt graph propagation to capture
the observed spatial contexts. Specifically, we first construct multiple graphs
at different scales from observed pixels. Since the graph structure varies from
sample to sample, we then apply an attention mechanism to the propagation, which
encourages the network to model the contextual information adaptively.
Furthermore, considering the multi-modality of the input data, we apply graph
propagation to the two modalities separately to extract multi-modal
representations. Finally, we introduce a symmetric gated fusion strategy to
exploit the extracted multi-modal features effectively. The proposed strategy
preserves the original information of one modality and absorbs complementary
information from the other by learning adaptive gating weights. Our model, named
Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves state-of-the-art
performance on two benchmarks, i.e., KITTI and NYU-v2, while having fewer
parameters than the latest models. Our code is available at:
https://github.com/sshan-zhao/ACMNet.
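To make the attention-guided graph propagation more concrete, the sketch below builds a k-nearest-neighbour graph over the observed (non-zero depth) pixels and runs one attention-weighted propagation step. It is a minimal PyTorch-style illustration under assumed shapes; the helper `knn_graph`, the class `AttentiveGraphPropagation`, and the neighbourhood size `k=9` are hypothetical choices and do not reproduce the official ACMNet implementation linked above.

```python
# Hypothetical sketch of attention-based graph propagation over observed pixels.
# Not the official ACMNet code; names, shapes, and k are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def knn_graph(coords, k=9):
    """Build a k-nearest-neighbour graph among observed (non-zero depth) pixels.

    coords: (N, 2) pixel coordinates of the N observed pixels.
    Returns the indices of the k nearest observed neighbours per pixel, shape (N, k).
    """
    dist = torch.cdist(coords.float(), coords.float())      # (N, N) pairwise distances
    return dist.topk(k + 1, largest=False).indices[:, 1:]   # drop the self-match


class AttentiveGraphPropagation(nn.Module):
    """One propagation step: every observed pixel aggregates its neighbours'
    features with attention weights predicted from the feature pairs, so the
    aggregation adapts to the per-sample graph structure."""

    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, feats, knn):                 # feats: (N, C), knn: (N, k)
        nbr = feats[knn]                           # neighbour features, (N, k, C)
        ctr = feats.unsqueeze(1).expand_as(nbr)    # centre features, (N, k, C)
        logits = self.attn(torch.cat([ctr, nbr], dim=-1)).squeeze(-1)  # (N, k)
        w = F.softmax(logits, dim=-1).unsqueeze(-1)                    # (N, k, 1)
        agg = (w * nbr).sum(dim=1)                                     # (N, C)
        return F.relu(self.update(torch.cat([feats, agg], dim=-1)))
```

In the paper this kind of propagation is applied at multiple scales and on the depth and RGB branches separately; the sketch covers a single scale and a single modality.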
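The symmetric gated fusion can be sketched in the same spirit: each branch keeps its own features and absorbs the other branch through a learned per-pixel gate, one gate per direction. Again, the module below is an assumed illustration (layer choices and names are mine), not the authors' implementation.

```python
# Hypothetical sketch of symmetric gated fusion between the depth and RGB branches.
import torch
import torch.nn as nn


class SymmetricGatedFusion(nn.Module):
    """Two per-pixel gates, one per direction: each modality preserves its own
    features and adds in the other modality scaled by a learned gating weight."""

    def __init__(self, channels):
        super().__init__()
        self.gate_d = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())
        self.gate_rgb = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, feat_d, feat_rgb):           # both (B, C, H, W)
        joint = torch.cat([feat_d, feat_rgb], dim=1)
        g_d = self.gate_d(joint)                   # how much RGB the depth branch absorbs
        g_rgb = self.gate_rgb(joint)               # how much depth the RGB branch absorbs
        fused_d = feat_d + g_d * feat_rgb          # depth branch keeps its original information
        fused_rgb = feat_rgb + g_rgb * feat_d      # RGB branch keeps its original information
        return fused_d, fused_rgb
```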
Related papers
- HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness [2.341385717236931]
We propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection.
Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies.
Our HiDAnet performs favorably against state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-01-18T10:00:59Z) - GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
arXiv Detail & Related papers (2022-10-19T17:56:03Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z) - Monocular Depth Distribution Alignment with Low Computation [15.05244258071472]
We model the majority of the accuracy gap between light-weight networks and heavy-weight networks.
By perceiving the difference of depth features between every two regions, DANet tends to predict a reasonable scene structure.
Thanks to the alignment of depth distribution shape and scene depth range, DANet sharply alleviates the distribution drift and achieves performance comparable to prior heavy-weight methods.
arXiv Detail & Related papers (2022-03-09T06:18:26Z) - Multi-View Stereo Network with attention thin volume [0.0]
We propose an efficient multi-view stereo (MVS) network for inferring depth values from multiple RGB images.
We introduce the self-attention mechanism to fully aggregate the dominant information from input images.
We also introduce group-wise correlation into feature aggregation, which greatly reduces the memory and computation burden.
arXiv Detail & Related papers (2021-10-16T11:51:23Z) - RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
arXiv Detail & Related papers (2021-09-15T12:31:27Z) - Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.