Monocular Depth Estimation with Self-supervised Instance Adaptation
- URL: http://arxiv.org/abs/2004.05821v1
- Date: Mon, 13 Apr 2020 08:32:03 GMT
- Title: Monocular Depth Estimation with Self-supervised Instance Adaptation
- Authors: Robert McCraith, Lukas Neumann, Andrew Zisserman, Andrea Vedaldi
- Abstract summary: In robotics applications, multiple views of a scene may or may not be available, depending on the actions of the robot.
We propose a new approach that extends any off-the-shelf self-supervised monocular depth reconstruction system to use more than one image at test time.
- Score: 138.0231868286184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in self-supervised learning have demonstrated that it is
possible to learn accurate monocular depth reconstruction from raw video data,
without using any 3D ground truth for supervision. However, in robotics
applications, multiple views of a scene may or may not be available, depending
on the actions of the robot, switching between monocular and multi-view
reconstruction. To address this mixed setting, we propose a new approach that
extends any off-the-shelf self-supervised monocular depth reconstruction system
to use more than one image at test time. Our method builds on a standard prior
learned to perform monocular reconstruction, but uses self-supervision at test
time to further improve the reconstruction accuracy when multiple images are
available. When used to update the correct components of the model, this approach
is highly effective. On the standard KITTI benchmark, our self-supervised
method consistently outperforms all the previous methods with an average 25%
reduction in absolute error for the three common setups (monocular, stereo and
monocular+stereo), and comes very close in accuracy when compared to the
fully-supervised state-of-the-art methods.
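The core idea in the abstract, keeping a frozen monocular prior and optimizing only a small component of the model with a self-supervised loss once extra views arrive at test time, can be illustrated with a toy numpy sketch. Everything here is an illustrative assumption: the single adaptable scale parameter, the quadratic stand-in loss, and the simulated second-view target are not the paper's actual network or photometric reprojection loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen monocular network output: a per-pixel depth map (toy 8x8 example).
base_depth = rng.uniform(1.0, 10.0, size=(8, 8))

# Simulated multi-view evidence: in the real method this signal comes from
# photometric reprojection between images; here the "target" simply differs
# from the prior by a global scale, the kind of ambiguity that test-time
# adaptation can resolve (hypothetical setup).
true_scale = 1.7
target_depth = true_scale * base_depth

def self_supervised_loss(scale):
    # Stand-in for the test-time self-supervised loss: compares the scaled
    # prediction against the multi-view evidence.
    return np.mean((scale * base_depth - target_depth) ** 2)

# Test-time adaptation loop: gradient descent updates only the small
# adaptable component (here, one scalar), leaving the frozen prior untouched.
scale = 1.0
lr = 0.01
for _ in range(200):
    grad = np.mean(2.0 * (scale * base_depth - target_depth) * base_depth)
    scale -= lr * grad

print(round(scale, 3))  # converges toward true_scale
```

The design point mirrors the abstract's claim: adapting the right (small) component at test time recovers information the monocular prior alone cannot, while keeping the bulk of the pretrained model fixed.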
Related papers
- Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection [66.74183705987276]
We introduce a framework to improve the camera-only apprentice model, including an apprentice-friendly multi-modal expert and temporal-fusion-friendly distillation supervision.
With those improvements, our camera-only apprentice VCD-A sets new state-of-the-art on nuScenes with a score of 63.1% NDS.
arXiv Detail & Related papers (2023-10-24T09:29:26Z) - Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2023-07-20T16:14:23Z) - Self-Supervised Monocular Depth Estimation with Self-Reference Distillation and Disparity Offset Refinement [15.012694052674899]
We propose two novel ideas to improve self-supervised monocular depth estimation.
We use a parameter-optimized model as the teacher, updated across training epochs, to provide additional supervision.
We leverage the contextual consistency between high-scale and low-scale features to obtain multiscale disparity offsets.
arXiv Detail & Related papers (2023-02-20T06:28:52Z) - State of the Art in Dense Monocular Non-Rigid 3D Reconstruction [100.9586977875698]
3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics.
This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views.
arXiv Detail & Related papers (2022-10-27T17:59:53Z) - RAFT-MSF: Self-Supervised Monocular Scene Flow using Recurrent Optimizer [21.125470798719967]
We introduce a self-supervised monocular scene flow method that substantially improves the accuracy over the previous approaches.
Based on RAFT, a state-of-the-art optical flow model, we design a new decoder to iteratively update 3D motion fields and disparity maps simultaneously.
Our method achieves state-of-the-art accuracy among all self-supervised monocular scene flow methods, improving accuracy by 34.2%.
arXiv Detail & Related papers (2022-05-03T15:43:57Z) - SGM3D: Stereo Guided Monocular 3D Object Detection [62.11858392862551]
We propose a stereo-guided monocular 3D object detection network, termed SGM3D.
We exploit robust 3D features extracted from stereo images to enhance the features learned from the monocular image.
Our method can be integrated into many other monocular approaches to boost performance without introducing any extra computational cost.
arXiv Detail & Related papers (2021-12-03T13:57:14Z) - Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z) - SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving [37.50089104051591]
State-of-the-art self-supervised learning approaches for monocular depth estimation usually suffer from scale ambiguity.
This paper introduces a novel multi-task learning strategy to improve self-supervised monocular distance estimation on fisheye and pinhole camera images.
arXiv Detail & Related papers (2020-08-10T10:52:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.