Unsupervised Monocular Depth Learning with Integrated Intrinsics and
Spatio-Temporal Constraints
- URL: http://arxiv.org/abs/2011.01354v3
- Date: Fri, 13 Aug 2021 19:40:49 GMT
- Title: Unsupervised Monocular Depth Learning with Integrated Intrinsics and
Spatio-Temporal Constraints
- Authors: Kenny Chen, Alexandra Pogue, Brett T. Lopez, Ali-akbar Agha-mohammadi,
and Ankur Mehta
- Abstract summary: This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
- Score: 61.46323213702369
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth inference has gained tremendous attention from researchers in
recent years and remains a promising replacement for expensive time-of-flight
sensors, but issues with scale acquisition and implementation overhead still
plague these systems. To this end, this work presents an
unsupervised learning framework that is able to predict at-scale depth maps and
egomotion, in addition to camera intrinsics, from a sequence of monocular
images via a single network. Our method incorporates both spatial and temporal
geometric constraints to resolve depth and pose scale factors, which are
enforced within the supervisory reconstruction loss functions at training time.
Only unlabeled stereo sequences are required for training the weights of our
single-network architecture, which reduces overall implementation overhead as
compared to previous methods. Our method demonstrates strong performance
against the current state-of-the-art on multiple sequences of the KITTI
driving dataset and offers faster training times owing to its reduced network
complexity.
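
To make the supervisory signal concrete, here is a minimal sketch, in PyTorch, of the kind of spatio-temporal reconstruction loss the abstract describes. This is an illustration under assumed conventions, not the authors' code: all function names, tensor shapes, and the plain L1 photometric penalty are hypothetical. The temporal term warps the frame at t+1 into the frame at t using the predicted depth, egomotion, and intrinsics; the spatial term warps the right stereo image into the left using the fixed rig transform, whose known baseline resolves the metric scale.

```python
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    # Lift every pixel (u, v) to a 3D point in the reference camera frame:
    # X = depth * K^{-1} [u, v, 1]^T
    b, _, h, w = depth.shape
    v, u = torch.meshgrid(
        torch.arange(h, device=depth.device, dtype=depth.dtype),
        torch.arange(w, device=depth.device, dtype=depth.dtype),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)]).reshape(1, 3, -1)  # (1, 3, H*W)
    return depth.reshape(b, 1, -1) * (K_inv @ pix)                   # (B, 3, H*W)

def project(points, K, h, w):
    # Pinhole projection, then rescale pixel coords to grid_sample's [-1, 1].
    uvw = K @ points                                                 # (B, 3, H*W)
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)
    u = 2.0 * uv[:, 0] / (w - 1) - 1.0
    v = 2.0 * uv[:, 1] / (h - 1) - 1.0
    return torch.stack([u, v], dim=-1).reshape(-1, h, w, 2)

def reconstruct(src, depth, T, K, K_inv):
    # Inverse-warp the source view into the reference view.
    b, _, h, w = src.shape
    pts = backproject(depth, K_inv)             # points in the reference frame
    pts = T[:, :3, :3] @ pts + T[:, :3, 3:]     # transform into the source frame
    grid = project(pts, K, h, w)
    return F.grid_sample(src, grid, padding_mode="border", align_corners=True)

def spatio_temporal_loss(I_t, I_t1, I_right, depth, T_temporal, T_stereo, K):
    # K would come from the network's intrinsics head in the paper's setting;
    # T_stereo is the fixed left-to-right rig transform whose known baseline
    # pins down the metric scale of depth and egomotion.
    K_inv = torch.inverse(K)
    temporal = reconstruct(I_t1, depth, T_temporal, K, K_inv)   # warp t+1 -> t
    spatial = reconstruct(I_right, depth, T_stereo, K, K_inv)   # warp right -> left
    return (I_t - temporal).abs().mean() + (I_t - spatial).abs().mean()
```

Typical self-supervised depth systems also add an SSIM term and a depth smoothness prior on top of the plain L1 penalty used here; those are omitted to keep the sketch focused on the two geometric constraints.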
Related papers
- Self-STORM: Deep Unrolled Self-Supervised Learning for Super-Resolution Microscopy [55.2480439325792] (2024-03-25)
We introduce deep unrolled self-supervised learning, which alleviates the need for training data by training a sequence-specific, model-based autoencoder.
Our proposed method exceeds the performance of its supervised counterparts.
- GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes [47.76269541664071] (2023-09-26)
This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture.
We obtain coarse camera poses from monocular sequences through multi-view geometry to deal with the former.
To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism.
- Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation [47.984368369734995] (2021-09-10)
We introduce a novel recurrent encoding-decoding neural network architecture for event-based optical flow estimation.
The network is trained end-to-end with self-supervised learning on the Multi-Vehicle Stereo Event Camera dataset.
It outperforms all existing state-of-the-art methods by a large margin.
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539] (2020-09-16)
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images using only self-supervision as the training signal.
We show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and not relying on any additional sensors.
- Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling [106.15327903038705] (2020-07-21)
Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation.
We present a self-supervised learning method for VO with special consideration for consistency over longer sequences.
We train the networks with purely self-supervised losses, including a cycle consistency loss that mimics the loop closure module in geometric VO (see the sketch after this list).
- Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284] (2020-04-03)
We tackle the essential problem of scale inconsistency in self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
- DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning [65.94499390875046] (2020-03-30)
DeFeat-Net is an approach to simultaneously learn a cross-domain dense feature representation.
Our technique outperforms the current state-of-the-art with around a 10% reduction in all error measures.
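
As referenced in the visual odometry entry above, here is a toy sketch of a cycle consistency loss on relative poses: composing the predicted forward and backward transforms for a frame pair should yield the identity, and the residual is penalized. The function name and tensor conventions are our own assumptions, not the paper's implementation.

```python
import torch

def pose_cycle_loss(T_fwd: torch.Tensor, T_bwd: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of composed forward/backward poses from identity.

    T_fwd: (B, 4, 4) homogeneous transforms, frame t -> t+1 (assumed shapes).
    T_bwd: (B, 4, 4) homogeneous transforms, frame t+1 -> t.
    """
    eye = torch.eye(4, dtype=T_fwd.dtype, device=T_fwd.device).expand_as(T_fwd)
    # If the network is self-consistent, T_bwd @ T_fwd should equal I.
    return ((T_bwd @ T_fwd) - eye).abs().mean()
```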