Mask-adaptive Gated Convolution and Bi-directional Progressive Fusion
Network for Depth Completion
- URL: http://arxiv.org/abs/2401.07439v1
- Date: Mon, 15 Jan 2024 02:58:06 GMT
- Title: Mask-adaptive Gated Convolution and Bi-directional Progressive Fusion
Network for Depth Completion
- Authors: Tingxuan Huang and Jiacheng Miao and Shizhuo Deng and Tong Jia and Dongyue Chen
- Abstract summary: We propose a new model for depth completion based on an encoder-decoder structure.
Our model introduces two key components: the Mask-adaptive Gated Convolution architecture and the Bi-directional Progressive Fusion module.
Our model achieves remarkable performance in completing depth maps and outperforms existing approaches in terms of accuracy and reliability.
- Score: 3.8558637038709622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth completion is a critical task for handling depth images with missing
pixels, which can negatively impact further applications. Recent approaches
have utilized Convolutional Neural Networks (CNNs) to reconstruct depth images
with the assistance of color images. However, vanilla convolution has
non-negligible drawbacks in handling missing pixels. To solve this problem, we
propose a new model for depth completion based on an encoder-decoder structure.
Our model introduces two key components: the Mask-adaptive Gated Convolution
(MagaConv) architecture and the Bi-directional Progressive Fusion (BP-Fusion)
module. The MagaConv architecture is designed to acquire precise depth features
by modulating convolution operations with iteratively updated masks, while the
BP-Fusion module progressively integrates depth and color features, utilizing
consecutive bi-directional fusion structures in a global perspective. Extensive
experiments on popular benchmarks, including NYU-Depth V2, DIML, and SUN RGB-D,
demonstrate the superiority of our model over state-of-the-art methods. We
achieved remarkable performance in completing depth maps and outperformed
existing approaches in terms of accuracy and reliability.
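The abstract describes the mechanism only at a high level. Below is a minimal PyTorch sketch of the general idea behind mask-adaptive gated convolution, where a validity mask gates the convolved depth features and is itself updated for the next layer; all class and parameter names are illustrative assumptions, not the paper's MagaConv implementation.

```python
import torch
import torch.nn as nn

class MaskGatedConv(nn.Module):
    """Illustrative mask-adaptive gated convolution: depth features are
    convolved as usual, a parallel branch turns the current validity mask
    into a gate, and the mask itself is renewed for the next layer."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        pad = k // 2
        self.feat_conv = nn.Conv2d(in_ch, out_ch, k, padding=pad)
        self.gate_conv = nn.Conv2d(1, out_ch, k, padding=pad)  # mask -> gate
        self.mask_conv = nn.Conv2d(1, 1, k, padding=pad)       # mask update

    def forward(self, x, mask):
        feat = self.feat_conv(x)
        gate = torch.sigmoid(self.gate_conv(mask))      # 0..1 gate per channel
        new_mask = torch.sigmoid(self.mask_conv(mask))  # iteratively updated
        return feat * gate, new_mask

# Usage: a sparse depth map with a binary validity mask.
depth = torch.randn(1, 1, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
layer = MaskGatedConv(1, 32)
out, mask = layer(depth * mask, mask)  # gated features + refreshed mask
```

The gate lets the network discount activations computed over missing pixels, while the updated mask tracks how confidence spreads as holes are filled.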
Related papers
- SDformer: Efficient End-to-End Transformer for Depth Completion [5.864200786548098]
Depth completion aims to predict dense depth maps with sparse depth measurements from a depth sensor.
Currently, Convolutional Neural Network (CNN) based models are the most popular methods applied to depth completion tasks.
To overcome the drawbacks of CNNs, the paper presents a more effective and powerful sequence-to-sequence model built on adaptive self-attention.
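The summary gives no architectural details; the fragment below is only a hedged illustration of treating image features as a token sequence processed by standard self-attention, using stock PyTorch modules rather than SDformer's actual adaptive design.

```python
import torch
import torch.nn as nn

# Illustrative only: flatten a feature map into a token sequence and let a
# standard Transformer encoder attend across all positions, the kind of
# sequence-to-sequence treatment the summary alludes to.
feat = torch.randn(1, 64, 32, 32)                    # B, C, H, W features
tokens = feat.flatten(2).transpose(1, 2)             # B, H*W, C tokens
encoder = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
out = encoder(tokens).transpose(1, 2).view(1, 64, 32, 32)  # back to a map
```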
arXiv Detail & Related papers (2024-09-12T15:52:08Z)
- AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion [1.8820731605557168]
We propose a new model for depth image completion based on the Attention Guided Gated-convolutional Network (AGG-Net).
In the encoding stage, an Attention Guided Gated-Convolution (AG-GConv) module is proposed to realize the fusion of depth and color features at different scales.
In the decoding stage, an Attention Guided Skip Connection (AG-SC) module is presented to avoid introducing too many depth-irrelevant features to the reconstruction.
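As a rough, hypothetical rendering of attention-guided gating (not the paper's exact AG-GConv), the color features can produce a gate that modulates the convolved depth features:

```python
import torch
import torch.nn as nn

class AGGatedConv(nn.Module):
    """Sketch of attention-guided gated convolution: the color features drive
    a per-pixel, per-channel gate on the convolved depth features."""

    def __init__(self, depth_ch, color_ch, out_ch):
        super().__init__()
        self.depth_conv = nn.Conv2d(depth_ch, out_ch, 3, padding=1)
        self.attn = nn.Sequential(
            nn.Conv2d(depth_ch + color_ch, out_ch, 3, padding=1),
            nn.Sigmoid(),  # attention gate in [0, 1]
        )

    def forward(self, d, c):
        return self.depth_conv(d) * self.attn(torch.cat([d, c], dim=1))

fuse = AGGatedConv(depth_ch=32, color_ch=32, out_ch=64)
out = fuse(torch.randn(1, 32, 60, 80), torch.randn(1, 32, 60, 80))
```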
arXiv Detail & Related papers (2023-09-04T14:16:08Z)
- CompletionFormer: Depth Completion with Convolutions and Vision Transformers [0.0]
This paper proposes a Joint Convolutional Attention and Transformer block (JCAT), which deeply couples the convolutional attention layer and Vision Transformer into one block, as the basic unit to construct our depth completion model in a pyramidal structure.
Our CompletionFormer outperforms state-of-the-art CNN-based methods on the outdoor KITTI Depth Completion benchmark and indoor NYUv2 dataset, achieving significantly higher efficiency (nearly 1/3 FLOPs) compared to pure Transformer-based methods.
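A loose sketch of the coupling idea, assuming one convolutional path and one self-attention path summed inside a single block (the real JCAT design differs in its details):

```python
import torch
import torch.nn as nn

class ConvAttnBlock(nn.Module):
    """Loose sketch of coupling a convolutional path and a self-attention path
    inside one block; the real JCAT design differs in its details."""

    def __init__(self, ch, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.GELU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.attn = nn.TransformerEncoderLayer(d_model=ch, nhead=heads,
                                               batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        local_feat = self.conv(x)                      # local detail
        tokens = x.flatten(2).transpose(1, 2)          # B, H*W, C
        global_feat = self.attn(tokens).transpose(1, 2).view(b, c, h, w)
        return local_feat + global_feat                # couple both paths

block = ConvAttnBlock(ch=64)
y = block(torch.randn(1, 64, 32, 32))
```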
arXiv Detail & Related papers (2023-04-25T17:59:47Z)
- DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion [9.294501649791016]
Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM (vSLAM).
We formulate the two-view SfM problem as a maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted as DeepMLE.
Our method significantly outperforms the state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.
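The summary omits the likelihood model. As a generic illustration of the MLE framing, a network that predicts both a quantity and its per-pixel variance can be trained with a Gaussian negative log-likelihood instead of a plain L2 loss; the flow-like tensors below are stand-ins, not DeepMLE's actual formulation.

```python
import torch

# Generic MLE-style training signal: predict both a value (here a flow-like
# field) and its variance, and minimize the Gaussian negative log-likelihood.
pred = torch.randn(1, 2, 48, 64, requires_grad=True)  # predicted field
var = torch.rand(1, 2, 48, 64) + 1e-3                 # predicted variance > 0
target = torch.randn(1, 2, 48, 64)                    # ground truth
nll = torch.nn.GaussianNLLLoss()(pred, target, var)
nll.backward()  # errors are penalized harder where predicted variance is low
```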
arXiv Detail & Related papers (2022-10-11T15:07:25Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
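A minimal sketch of the dual-backbone idea, assuming torchvision's swin_t as a stand-in for the paper's backbones and simple addition in place of the mutual interaction modules:

```python
import torch
from torchvision.models import swin_t

# Two copies of the same Swin architecture encode RGB and depth separately;
# depth is replicated to three channels to fit the stock backbone, and simple
# addition stands in for the paper's mutual interaction modules.
rgb_backbone, depth_backbone = swin_t(weights=None), swin_t(weights=None)
rgb = torch.randn(1, 3, 224, 224)
depth = torch.randn(1, 1, 224, 224).repeat(1, 3, 1, 1)  # 1 -> 3 channels
f_rgb = rgb_backbone.features(rgb)        # (B, H/32, W/32, C), channels-last
f_depth = depth_backbone.features(depth)
fused = f_rgb + f_depth                   # placeholder for mutual interaction
```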
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with Depth Guidance [49.94504248096527]
We propose a Depth-Guided Outpainting Network (DGONet) to model the feature representations of different modalities.
Two components are designed: 1) the Multimodal Learning Module produces distinct depth and RGB feature representations that reflect the characteristics of each modality.
We specially design an additional constraint strategy consisting of Cross-modal Loss and Edge Loss to enhance ambiguous contours and expedite reliable content generation.
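The exact Cross-modal Loss and Edge Loss are not given here; the following hedged sketch implements one plausible edge constraint, comparing Sobel edge maps of the generated and reference images (the operator choice and weighting are assumptions, not the paper's definition):

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Per-channel Sobel gradient magnitude (illustrative edge extractor)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    w = torch.stack([kx, kx.t()]).unsqueeze(1)  # two 1-in-channel filters
    c = img.shape[1]
    g = F.conv2d(img, w.repeat(c, 1, 1, 1), padding=1, groups=c)
    return g.abs().sum(dim=1, keepdim=True)

# Hypothetical edge loss: penalize the distance between the edge maps of the
# generated image and a reference to sharpen ambiguous contours.
pred, ref = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
edge_loss = F.l1_loss(sobel_edges(pred), sobel_edges(ref))
```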
arXiv Detail & Related papers (2022-04-12T06:06:50Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to realize effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- RigNet: Repetitive Image Guided Network for Depth Completion [20.66405067066299]
Recent approaches mainly focus on image-guided learning to predict dense results.
However, blurry image guidance and unclear object structures in depth still impede the performance of image-guided frameworks.
We explore a repetitive design in our image guided network to sufficiently and gradually recover depth values.
Our method achieves state-of-the-art result on the NYUv2 dataset and ranks 1st on the KITTI benchmark at the time of submission.
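A hypothetical reading of the repetitive design, where the same image-guided update is applied several times so depth is recovered gradually (the paper's actual repetitive units are more elaborate):

```python
import torch
import torch.nn as nn

class RepetitiveRefiner(nn.Module):
    """Illustrative repetitive refinement: the same image-guided update is
    applied several times so depth is recovered gradually, not in one pass."""

    def __init__(self, ch, steps=3):
        super().__init__()
        self.steps = steps
        self.update = nn.Conv2d(ch + 1, 1, 3, padding=1)

    def forward(self, depth, img_feat):
        for _ in range(self.steps):
            # Each step predicts a residual correction from depth + guidance.
            depth = depth + self.update(torch.cat([depth, img_feat], dim=1))
        return depth

refiner = RepetitiveRefiner(ch=32)
depth = refiner(torch.zeros(1, 1, 64, 64), torch.randn(1, 32, 64, 64))
```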
arXiv Detail & Related papers (2021-07-29T08:00:33Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
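A hedged sketch of attention-based fusion for guided depth super-resolution, where an attention map decides per pixel how much RGB guidance to inject into the depth features (module names are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class AttnFusion(nn.Module):
    """Sketch: an attention map decides, per pixel, how much of the RGB
    guidance features to inject into the upsampled depth features."""

    def __init__(self, ch):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, depth_feat, rgb_feat):
        a = self.attn(torch.cat([depth_feat, rgb_feat], dim=1))
        return depth_feat + a * rgb_feat  # guidance injected where trusted

fusion = AttnFusion(ch=32)
out = fusion(torch.randn(1, 32, 128, 128), torch.randn(1, 32, 128, 128))
```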
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
- Dual Pixel Exploration: Simultaneous Depth Estimation and Image Restoration [77.1056200937214]
We study the formation of the DP pair, which links the blur and the depth information.
We propose an end-to-end DDDNet (DP-based Depth and Deblur Network) to jointly estimate the depth and restore the image.
arXiv Detail & Related papers (2020-12-01T06:53:57Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
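As a guess at the general pattern of symmetric gated fusion (not ACMNet's exact design), each modality can gate the other with an identical mechanism before the two streams are merged:

```python
import torch
import torch.nn as nn

class SymmetricGatedFusion(nn.Module):
    """Sketch of symmetric gated fusion: each modality gates the other with an
    identical mechanism, then the two gated streams are merged."""

    def __init__(self, ch):
        super().__init__()
        self.gate_d = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())
        self.gate_c = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, d, c):
        cat = torch.cat([d, c], dim=1)
        return self.merge(torch.cat([d * self.gate_d(cat),
                                     c * self.gate_c(cat)], dim=1))

fusion = SymmetricGatedFusion(ch=32)
out = fusion(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
```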
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.