Learning Depth With Very Sparse Supervision
- URL: http://arxiv.org/abs/2003.00752v2
- Date: Thu, 16 Jul 2020 10:01:55 GMT
- Title: Learning Depth With Very Sparse Supervision
- Authors: Antonio Loquercio, Alexey Dosovitskiy, and Davide Scaramuzza
- Abstract summary: This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available for even a single pixel per image, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the astonishing capabilities of natural intelligent agents and
inspired by theories from psychology, this paper explores the idea that
perception gets coupled to 3D properties of the world via interaction with the
environment. Existing works for depth estimation require either massive amounts
of annotated training data or some form of hard-coded geometrical constraint.
This paper explores a new approach to learning depth perception requiring
neither of those. Specifically, we train a specialized global-local network
architecture with what would be available to a robot interacting with the
environment: from extremely sparse depth measurements down to even a single
pixel per image. From a pair of consecutive images, our proposed network
outputs a latent representation of the observer's motion between the images and
a dense depth map. Experiments on several datasets show that, when ground truth
is available for even a single pixel per image, the proposed network can
learn monocular dense depth estimation up to 22.5% more accurately than
state-of-the-art approaches. We believe that this work, beyond its scientific
interest, lays the foundations for learning depth from extremely sparse
supervision, which can be valuable to all robotic systems acting under severe
bandwidth or sensing constraints.
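As a hedged illustration of the setup described in the abstract: a network takes two consecutive frames, outputs a latent motion code and a dense depth map, and is trained with a loss evaluated only at the (possibly single) pixels where depth is measured. The layer sizes and the names GlobalLocalNet and sparse_depth_loss below are illustrative assumptions, not the authors' exact architecture or loss.

```python
# Minimal PyTorch sketch: two frames in, motion latent + dense depth out,
# supervised only at a handful of annotated pixels. Illustrative only.
import torch
import torch.nn as nn

class GlobalLocalNet(nn.Module):
    def __init__(self, motion_dim=6):
        super().__init__()
        # "Global" encoder: sees both frames, summarizes inter-frame motion.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.motion_head = nn.Linear(128, motion_dim)  # latent motion code
        # "Local" decoder: upsamples features back to a dense depth map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Softplus(),
        )

    def forward(self, img_t, img_tp1):
        feat = self.encoder(torch.cat([img_t, img_tp1], dim=1))
        motion = self.motion_head(feat.mean(dim=(2, 3)))  # global pooling
        depth = self.decoder(feat)
        return motion, depth

def sparse_depth_loss(pred_depth, gt_depth, mask):
    """L1 loss on the (possibly single) pixels where ground truth exists."""
    return (mask * (pred_depth - gt_depth).abs()).sum() / mask.sum().clamp(min=1)
```

The mask is what makes single-pixel supervision possible: the decoder still produces a dense map, but gradients flow only from the few annotated locations.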
Related papers
- Embodiment: Self-Supervised Depth Estimation Based on Camera Models
Self-supervised methods hold great promise because they require no labeling cost.
However, self-supervised learning still lags well behind supervised learning in 3D reconstruction and depth estimation performance.
By embedding the camera's physical properties into the model, we can calculate depth priors for ground regions and regions connected to the ground; a minimal version of such a prior is sketched after this entry.
arXiv Detail & Related papers (2024-08-02T20:40:19Z)
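As a hedged illustration of such a camera-model prior: for a pinhole camera mounted at a known height above flat ground, with its optical axis level, similar triangles give the depth of every ground pixel directly from the intrinsics. The level-camera and flat-ground assumptions, and all names below, are ours for illustration, not necessarily the paper's exact derivation.

```python
# Ground-plane depth prior from a pinhole camera model (illustrative sketch).
import numpy as np

def ground_depth_prior(v_rows, fy, cy, cam_height):
    """Depth z of ground-plane pixels at image rows v.

    Similar triangles give z = fy * h / (v - cy) for rows below the
    principal point; rows at or above the horizon get no prior (inf).
    """
    v = np.asarray(v_rows, dtype=np.float64)
    z = np.full_like(v, np.inf)
    below = v > cy                      # only pixels below the horizon
    z[below] = fy * cam_height / (v[below] - cy)
    return z

# Example: a camera 1.5 m above the ground, fy = 700 px, cy = 240.
print(ground_depth_prior([300, 400, 479], fy=700.0, cy=240.0, cam_height=1.5))
# -> roughly [17.5, 6.56, 4.39] metres
```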
- Calibrating Panoramic Depth Estimation for Practical Localization and Mapping
The absolute depth values of surrounding environments provide crucial cues for various assistive technologies, such as localization, navigation, and 3D structure estimation.
We propose that accurate depth estimated from panoramic images can serve as a powerful and lightweight input for a wide range of downstream tasks requiring 3D information.
arXiv Detail & Related papers (2023-08-27T04:50:05Z)
- Self-Guided Instance-Aware Network for Depth Completion and Enhancement
Existing methods directly interpolate the missing depth measurements from pixel-wise image content and the neighboring depth values; a naive version of this baseline is sketched after this entry.
We propose a novel self-guided instance-aware network (SG-IANet) that uses a self-guided mechanism to extract the instance-level features needed for depth restoration.
arXiv Detail & Related papers (2021-05-25T19:41:38Z)
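The "direct interpolation" baseline criticized above can be sketched in a few lines with SciPy: every missing depth pixel simply copies the nearest measured value. This is a generic stand-in for that class of methods, not code from the cited paper.

```python
# Nearest-neighbor depth completion: fill holes from the closest measurement.
import numpy as np
from scipy import ndimage

def nearest_neighbor_completion(sparse_depth):
    """Fill zeros in a sparse depth map from the nearest valid measurement."""
    valid = sparse_depth > 0
    # Index of the nearest valid pixel for every location.
    _, (iy, ix) = ndimage.distance_transform_edt(~valid, return_indices=True)
    return sparse_depth[iy, ix]

# Example: a 4x4 map with two measurements.
d = np.zeros((4, 4)); d[0, 0], d[3, 3] = 2.0, 8.0
print(nearest_neighbor_completion(d))
```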
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
Humans can infer the 3D geometry of a scene from a sketch rather than a realistic image, which indicates that spatial structure plays a fundamental role in understanding scene depth.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential features for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet generalizes well to unseen real-world data even though it is trained only on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Learning Joint 2D-3D Representations for Depth Completion
We design a simple yet effective neural network block that learns to extract joint 2D and 3D features.
Specifically, the block consists of two domain-specific sub-networks that apply 2D convolution to image pixels and continuous convolution to 3D points; a rough two-branch sketch follows this entry.
arXiv Detail & Related papers (2020-12-22T22:58:29Z)
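A rough PyTorch sketch of such a two-branch block: one branch runs ordinary 2D convolutions over image-aligned features, while a simplified point branch (a shared MLP over each pixel's 3D neighborhood, standing in for true continuous convolution) processes geometry; the results are fused with a 1x1 convolution. All shapes and names are our assumptions for illustration.

```python
# Two-branch 2D/3D block sketch: 2D convs on the grid, an MLP over 3D
# neighborhoods as a simplified stand-in for continuous convolution.
import torch
import torch.nn as nn

class Joint2D3DBlock(nn.Module):
    def __init__(self, channels=64, k=9):
        super().__init__()
        self.conv2d = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        # Point branch: MLP over (feature, relative 3D offset) of k neighbors.
        self.point_mlp = nn.Sequential(
            nn.Linear(channels + 3, channels), nn.ReLU(),
            nn.Linear(channels, channels))
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feat, neigh_feat, neigh_offsets):
        # feat:          (B, C, H, W) image-aligned features
        # neigh_feat:    (B, H*W, k, C) features of k nearest 3D neighbors
        # neigh_offsets: (B, H*W, k, 3) relative 3D positions of the neighbors
        b, c, h, w = feat.shape
        f2d = self.conv2d(feat)
        x = torch.cat([neigh_feat, neigh_offsets], dim=-1)
        f3d = self.point_mlp(x).max(dim=2).values       # aggregate neighbors
        f3d = f3d.permute(0, 2, 1).reshape(b, c, h, w)  # back to image grid
        return self.fuse(torch.cat([f2d, f3d], dim=1))
```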
- Depth by Poking: Learning to Estimate Depth from Self-Supervised Grasping
We train a neural network model to estimate depth from RGB-D images.
Our network predicts, for each pixel in an input image, the z position that a robot's end effector would reach if it attempted to grasp or poke at the corresponding position.
We show our approach achieves significantly lower root mean squared error than traditional structured light sensors.
arXiv Detail & Related papers (2020-06-16T03:34:26Z)
- VisualEchoes: Spatial Image Representation Learning through Echolocation
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation.
We propose a novel interaction-based representation learning framework that learns useful visual features via echolocation.
Our work opens a new path for representation learning for embodied agents, where supervision comes from interacting with the physical world.
arXiv Detail & Related papers (2020-05-04T16:16:58Z)
- Distilled Semantics for Comprehensive Scene Understanding from Videos
In this paper, we take an additional step toward holistic scene understanding with monocular cameras by learning depth and motion alongside semantics.
We address the three tasks jointly by a novel training protocol based on knowledge distillation and self-supervision.
We show that it yields state-of-the-art results for monocular depth estimation, optical flow and motion segmentation.
arXiv Detail & Related papers (2020-03-31T08:52:13Z)
- Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views; this reprojection step is sketched below.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
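The coarse alignment step in the last entry amounts to standard depth-based reprojection, sketched here with NumPy: back-project the reference pixel grid with its estimated depth, apply the relative pose, and project into the source view. The pinhole/no-distortion assumptions and the function name are ours, not the paper's exact implementation.

```python
# Depth-based reprojection for coarse multi-view alignment (textbook geometry).
import numpy as np

def reproject(depth, K, R, t):
    """Map reference pixels to source-view pixel coordinates.

    depth: (H, W) depth of the reference view
    K:     (3, 3) shared pinhole intrinsics
    R, t:  rotation (3, 3) and translation (3,) from reference to source
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)  # back-project
    pts_src = R @ pts + t[:, None]                       # change of view
    proj = K @ pts_src
    return (proj[:2] / proj[2:]).T.reshape(h, w, 2)      # (u', v') per pixel
```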