UAMD-Net: A Unified Adaptive Multimodal Neural Network for Dense Depth
Completion
- URL: http://arxiv.org/abs/2204.07791v1
- Date: Sat, 16 Apr 2022 12:49:50 GMT
- Authors: Guancheng Chen, Junli Lin and Huabiao Qin
- Abstract summary: We propose a novel multimodal neural network, namely UAMD-Net, for dense depth completion based on fusion of binocular stereo matching and the weak constraint from the sparse point clouds.
Our method produces robust results and outperforms other state-of-the-art methods.
- Score: 0.618778092044887
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Depth prediction is a critical problem in robotics applications, especially
autonomous driving. Generally, depth prediction based on binocular stereo
matching and fusion of monocular image and laser point cloud are two mainstream
methods. However, the former usually suffers from overfitting while building
the cost volume, and the latter has limited generalization due to the lack of
geometric constraint. To solve these problems, we propose a novel multimodal
neural network, namely UAMD-Net, for dense depth completion based on fusion of
binocular stereo matching and the weak constraint from the sparse point clouds.
Specifically, the sparse point clouds are converted to a sparse depth map and
sent to the multimodal feature encoder (MFE) with the binocular images, constructing
a cross-modal cost volume. Then, it will be further processed by the multimodal
feature aggregator (MFA) and the depth regression layer. Furthermore, the
existing multimodal methods ignore the problem of modal dependence, that is,
the network will not work when a certain modal input has a problem. Therefore,
we propose a new training strategy called Modal-dropout which enables the
network to be adaptively trained with multiple modal inputs and to run inference with
specific modal inputs. Benefiting from the flexible network structure and
adaptive training method, our proposed network can realize unified training
under various modal input conditions. Comprehensive experiments conducted on
the KITTI depth completion benchmark demonstrate that our method produces robust
results and outperforms other state-of-the-art methods.
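
The Modal-dropout idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how modalities are dropped, so everything below is an assumption made for illustration: the modality names, the set of allowed input combinations, and the choice to replace a dropped input with zeros are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical modality names; the abstract does not fix an input layout.
MODALITIES = ("left_image", "right_image", "sparse_depth")

# Assumed input combinations the network should handle at inference time.
# Each entry lists the modalities that stay active for one training step.
ALLOWED_COMBINATIONS = (
    ("left_image", "right_image", "sparse_depth"),  # stereo + point cloud
    ("left_image", "right_image"),                  # stereo only
    ("left_image", "sparse_depth"),                 # monocular + point cloud
)


def zeros_like(grid):
    """Return a zero-filled copy of a 2D list of numbers."""
    return [[0.0 for _ in row] for row in grid]


def modal_dropout(sample, rng, combinations=ALLOWED_COMBINATIONS):
    """Pick one allowed combination at random and zero out the other inputs.

    Returns the masked sample and the tuple of modalities left active.
    Zero-filling a dropped input is an assumption, not the paper's stated rule.
    """
    active = rng.choice(combinations)
    masked = {}
    for name in MODALITIES:
        if name in active:
            masked[name] = sample[name]
        else:
            masked[name] = zeros_like(sample[name])
    return masked, active


if __name__ == "__main__":
    rng = random.Random(0)
    sample = {
        "left_image": [[0.2, 0.4], [0.6, 0.8]],
        "right_image": [[0.1, 0.3], [0.5, 0.7]],
        "sparse_depth": [[0.0, 12.5], [0.0, 0.0]],  # 0.0 marks "no measurement"
    }
    for step in range(3):
        masked, active = modal_dropout(sample, rng)
        print(step, active)
```

Sampling a whole combination, rather than dropping each input independently, keeps every training step on an input configuration that could plausibly occur at inference time. This is one possible design among several.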
Related papers
- Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning [69.00997996453842]
We propose a deep Reinforcement Learning approach to learn a joint Admission Control and Resource Allocation policy for virtual network embedding.
We show that HRL-ACRA outperforms state-of-the-art baselines in terms of both the acceptance ratio and long-term average revenue.
arXiv Detail & Related papers (2024-06-25T07:42:30Z)
- Exploring Missing Modality in Multimodal Egocentric Datasets [89.76463983679058]
We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent.
Our method mitigates the performance loss, reducing it from its original $\sim 30\%$ drop to only $\sim 10\%$ when half of the test set is modal-incomplete.
arXiv Detail & Related papers (2024-01-21T11:55:42Z)
- SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via Swin Transformer and Densely Cascaded Network [29.798579906253696]
It is challenging to acquire dense ground truth depth labels for supervised training, and the unsupervised depth estimation using monocular sequences emerges as a promising alternative.
In this paper, we employ a convolution-free Swin Transformer as an image feature extractor so that the network can capture both local geometric features and global semantic features for depth estimation.
Also, we propose a Densely Cascaded Multi-scale Network (DCMNet) that connects every feature map directly with another from different scales via a top-down cascade pathway.
arXiv Detail & Related papers (2023-01-17T06:01:46Z)
- Non-parametric Depth Distribution Modelling based Depth Inference for Multi-view Stereo [43.415242967722804]
Recent cost volume pyramid based deep neural networks have unlocked the potential of efficiently leveraging high-resolution images for depth inference from multi-view stereo.
In general, those approaches assume that the depth of each pixel follows a unimodal distribution.
We propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions.
arXiv Detail & Related papers (2022-05-08T05:13:04Z)
- Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework.
To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules.
This allows not only for robust training with noisy video data, but also to scale up the size of the capsule network compared to traditional routing methods.
arXiv Detail & Related papers (2021-12-01T19:01:26Z)
- Efficient Real-Time Image Recognition Using Collaborative Swarm of UAVs and Convolutional Networks [9.449650062296824]
We present a strategy aiming at distributing inference requests to a swarm of resource-constrained UAVs that classifies captured images on-board.
We formulate the model as an optimization problem that minimizes the latency between acquiring images and making the final decisions.
We introduce an online solution, namely DistInference, to find the layers placement strategy that gives the best latency among the available UAVs.
arXiv Detail & Related papers (2021-07-09T19:47:02Z)
- Deep Networks and the Multiple Manifold Problem [15.144495799445824]
We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere.
We prove for a simple manifold configuration that when the network depth $L$ is large relative to certain geometric and statistical properties of the data, the network width grows as a sufficiently large polynomial in $L$.
Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem.
arXiv Detail & Related papers (2020-08-25T19:20:00Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL).
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.