ActiveZero: Mixed Domain Learning for Active Stereovision with Zero
Annotation
- URL: http://arxiv.org/abs/2112.02772v1
- Date: Mon, 6 Dec 2021 04:03:47 GMT
- Title: ActiveZero: Mixed Domain Learning for Active Stereovision with Zero
Annotation
- Authors: Isabella Liu, Edward Yang, Jianyu Tao, Rui Chen, Xiaoshuai Zhang, Qing
Ran, Zhu Liu, Hao Su
- Abstract summary: We present a new framework, ActiveZero, which is a mixed domain learning solution for active stereovision systems.
We show how the method can be trained end-to-end and that each module is important for attaining the end result.
- Score: 21.33158815473845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional depth sensors generate accurate real-world depth estimates that
surpass even the most advanced learning approaches trained only on simulation
domains. Since ground truth depth is readily available in the simulation domain
but quite difficult to obtain in the real domain, we propose a method that
leverages the best of both worlds. In this paper we present a new framework,
ActiveZero, which is a mixed domain learning solution for active stereovision
systems that requires no real world depth annotation. First, we demonstrate the
transferability of our method to out-of-distribution real data by using a mixed
domain learning strategy. In the simulation domain, we use a combination of
supervised disparity loss and self-supervised losses on a shape primitives
dataset. By contrast, in the real domain, we use only self-supervised losses, on
a dataset that is out-of-distribution with respect to both the training
simulation data and the test real data. Second, our method introduces a novel self-supervised loss
called temporal IR reprojection to increase the robustness and accuracy of our
reprojections in hard-to-perceive regions. Finally, we show how the method can
be trained end-to-end and that each module is important for attaining the end
result. Extensive qualitative and quantitative evaluations on real data
demonstrate state-of-the-art results that can even beat a commercial depth
sensor.
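The mixed domain objective above combines a supervised disparity loss on simulated batches (where ground truth exists) with self-supervised losses applied in both domains. Below is a minimal PyTorch sketch of one such training step; `stereo_net`, `reprojection_loss`, the loss weights, and the batch keys are illustrative assumptions, not names from the paper's released code.

```python
import torch.nn.functional as F

def mixed_domain_step(stereo_net, sim_batch, real_batch, reprojection_loss,
                      w_disp=1.0, w_self=1.0):
    # Simulation domain: ground-truth disparity is available, so the
    # prediction can be supervised directly.
    pred_sim = stereo_net(sim_batch["ir_left"], sim_batch["ir_right"])
    loss_disp = F.smooth_l1_loss(pred_sim, sim_batch["gt_disparity"])

    # The self-supervised reprojection loss needs no labels, so it is
    # applied in the simulation domain and the real domain alike.
    loss_self_sim = reprojection_loss(pred_sim, sim_batch["ir_left"],
                                      sim_batch["ir_right"])
    pred_real = stereo_net(real_batch["ir_left"], real_batch["ir_right"])
    loss_self_real = reprojection_loss(pred_real, real_batch["ir_left"],
                                       real_batch["ir_right"])

    return w_disp * loss_disp + w_self * (loss_self_sim + loss_self_real)
```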
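The temporal IR reprojection loss is described only at a high level here. One plausible reading: warp the right IR image into the left view with the predicted disparity and average the photometric error over IR frames captured under different projected patterns, assuming a static scene. The sketch below implements that reading with standard rectified-stereo warping; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disparity):
    # right: (B, C, H, W) right-camera IR image; disparity: (B, 1, H, W)
    # predicted left-view disparity in pixels (rectified stereo assumed).
    _, _, h, w = right.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=right.device, dtype=right.dtype),
        torch.arange(w, device=right.device, dtype=right.dtype),
        indexing="ij")
    # Pixel (x, y) in the left image corresponds to (x - d, y) on the right.
    x_src = xs.unsqueeze(0) - disparity.squeeze(1)
    grid_x = 2.0 * x_src / (w - 1) - 1.0              # normalize to [-1, 1]
    grid_y = (2.0 * ys.unsqueeze(0) / (h - 1) - 1.0).expand_as(grid_x)
    grid = torch.stack((grid_x, grid_y), dim=-1)      # (B, H, W, 2)
    return F.grid_sample(right, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

def temporal_ir_reprojection_loss(disparity, left_frames, right_frames):
    # left_frames / right_frames: lists of (B, 1, H, W) IR captures under
    # different projected patterns; scene and disparity assumed static.
    losses = [(l - warp_right_to_left(r, disparity)).abs().mean()
              for l, r in zip(left_frames, right_frames)]
    return torch.stack(losses).mean()
```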
Related papers
- One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data.
Traditional UDA often assumes that abundant unlabeled real-world samples are available during training for adaptation.
We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
- Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z)
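The training signal summarized above for DINOSAUR is feature reconstruction: the model reconstructs the token features of a frozen self-supervised backbone (e.g., a DINO ViT) rather than raw pixels. A minimal sketch of that loss, with the object-centric grouping module and its decoder left as placeholder callables:

```python
import torch
import torch.nn.functional as F

def feature_reconstruction_loss(image, frozen_encoder, slot_model, feature_decoder):
    # frozen_encoder: pretrained self-supervised backbone (kept frozen);
    # slot_model / feature_decoder: placeholder object-centric grouping
    # module and decoder. Targets are features, not pixels.
    with torch.no_grad():
        target = frozen_encoder(image)   # (B, N_tokens, D) target features
    slots = slot_model(target)           # (B, N_slots, D_slot) object slots
    recon = feature_decoder(slots)       # (B, N_tokens, D) reconstruction
    return F.mse_loss(recon, target)
```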
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
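VRVO's scale recovery rests on a simple fact: a disparity network trained against stereo virtual data with a known baseline produces metrically scaled disparities, so absolute depth follows from the standard stereo relation depth = focal * baseline / disparity. A one-function sketch of that conversion (the integration with a direct VO system is omitted):

```python
def disparity_to_metric_depth(disparity, focal_px, baseline_m, eps=1e-6):
    # disparity: scale-aware disparity map (pixels), as produced by a network
    # trained on stereo virtual data; focal_px: focal length in pixels;
    # baseline_m: stereo baseline in meters. Returns depth in meters.
    return focal_px * baseline_m / disparity.clamp(min=eps)
```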
- What Stops Learning-based 3D Registration from Working in the Real World? [53.68326201131434]
This work identifies the sources of 3D point cloud registration failures, analyzes the reasons behind them, and proposes solutions.
Ultimately, this translates to a best-practice 3D registration network (BPNet), constituting the first learning-based method able to handle previously-unseen objects in real-world data.
Our model generalizes to real data without any fine-tuning, reaching an accuracy of up to 67% on point clouds of unseen objects obtained with a commercial sensor.
arXiv Detail & Related papers (2021-11-19T19:24:27Z)
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
- Occlusion-aware Unsupervised Learning of Depth from 4-D Light Fields [50.435129905215284]
We present an unsupervised learning-based depth estimation method for 4-D light field processing and analysis.
Based on the basic knowledge of the unique geometry structure of light field data, we explore the angular coherence among subsets of the light field views to estimate depth maps.
Our method significantly shrinks the performance gap between previous unsupervised methods and supervised ones, and produces depth maps with accuracy comparable to traditional methods at markedly lower computational cost.
arXiv Detail & Related papers (2021-06-06T06:19:50Z)
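The angular coherence cue that this unsupervised loss builds on can be illustrated with a classical plane sweep: shear every sub-aperture view toward the center view for each candidate disparity and keep the candidate that minimizes the variance across views. The sketch below shows only this geometric cue, not the paper's learned formulation:

```python
import torch
import torch.nn.functional as F

def angular_coherence_depth(views, uv_coords, candidates):
    # views: (V, 1, H, W) sub-aperture images; uv_coords: (V, 2) angular
    # offsets of each view from the center; candidates: 1-D tensor of
    # candidate disparities (pixel shift per unit angular offset).
    _, _, h, w = views.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=views.device, dtype=views.dtype),
        torch.arange(w, device=views.device, dtype=views.dtype),
        indexing="ij")
    costs = []
    for d in candidates:
        # Shear view (u, v) by d * (u, v) and resample onto the center view.
        gx = (xs.unsqueeze(0) + d * uv_coords[:, 0].view(-1, 1, 1)) * 2 / (w - 1) - 1
        gy = (ys.unsqueeze(0) + d * uv_coords[:, 1].view(-1, 1, 1)) * 2 / (h - 1) - 1
        grid = torch.stack((gx, gy), dim=-1)                     # (V, H, W, 2)
        warped = F.grid_sample(views, grid, align_corners=True)  # (V, 1, H, W)
        costs.append(warped.var(dim=0).squeeze(0))  # angular variance, (H, W)
    best = torch.stack(costs).argmin(dim=0)         # best candidate per pixel
    return candidates[best]                         # (H, W) disparity map
```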
- Learning a Domain-Agnostic Visual Representation for Autonomous Driving via Contrastive Loss [25.798361683744684]
Domain-Agnostic Contrastive Learning (DACL) is a two-stage unsupervised domain adaptation framework with cyclic adversarial training and contrastive loss.
Our proposed approach achieves better performance in the monocular depth estimation task compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-10T07:06:03Z)
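The contrastive loss in DACL is not spelled out in this summary; a generic InfoNCE objective of the kind such two-domain frameworks use, where corresponding embeddings from the two domains attract and all other pairs in the batch repel, is sketched below (an assumption-level illustration, not DACL's exact loss):

```python
import torch
import torch.nn.functional as F

def info_nce(sim_embed, real_embed, temperature=0.1):
    # sim_embed / real_embed: (B, D) embeddings of corresponding content in
    # the two domains; row i of each is a positive pair, all others negatives.
    a = F.normalize(sim_embed, dim=1)
    p = F.normalize(real_embed, dim=1)
    logits = a @ p.t() / temperature                   # (B, B) similarities
    labels = torch.arange(a.size(0), device=a.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)
```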
- Sim2Real for Self-Supervised Monocular Depth and Segmentation [7.376636976924]
Image-based learning methods for autonomous vehicle perception tasks require large quantities of labelled real data to train properly without overfitting.
Recent advances in domain adaptation have indicated that a shared latent space assumption can help to bridge the gap between the simulation and real domains.
We demonstrate that a twin VAE-based architecture with a shared latent space and auxiliary decoders is able to bridge the sim2real gap without requiring any paired, ground-truth data in the real domain.
arXiv Detail & Related papers (2020-12-01T03:25:02Z)
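A skeleton of the twin-VAE idea with a shared latent space: one encoder/decoder pair per domain, a common latent, and an auxiliary task head trained with simulation labels only. All sub-module internals are placeholders; this is a sketch of the architecture pattern, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TwinVAE(nn.Module):
    # enc_* return (mu, logvar); dec_* map latents back to images;
    # depth_head predicts the auxiliary task (depth or segmentation)
    # from the shared latent and is trained with simulation labels only.
    def __init__(self, enc_sim, enc_real, dec_sim, dec_real, depth_head):
        super().__init__()
        self.enc_sim, self.enc_real = enc_sim, enc_real
        self.dec_sim, self.dec_real = dec_sim, dec_real
        self.depth_head = depth_head

    def forward(self, x, domain):
        mu, logvar = (self.enc_sim if domain == "sim" else self.enc_real)(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = (self.dec_sim if domain == "sim" else self.dec_real)(z)
        # Because the latent space is shared across domains, the depth head
        # transfers to real inputs without real-domain ground truth.
        return recon, self.depth_head(z), mu, logvar
```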
- Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data [0.0]
3D object detection from monocular camera data is a key enabler for autonomous driving.
Recent deep learning methods show promising results to recover depth information from single images.
In this paper, we evaluate the performance of a 3D object detection pipeline which is parameterizable with different depth estimation configurations.
arXiv Detail & Related papers (2020-05-15T09:05:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.