Embodiment: Self-Supervised Depth Estimation Based on Camera Models
- URL: http://arxiv.org/abs/2408.01565v2
- Date: Thu, 29 Aug 2024 01:32:17 GMT
- Title: Embodiment: Self-Supervised Depth Estimation Based on Camera Models
- Authors: Jinchang Zhang, Praveen Kumar Reddy, Xue-Iuan Wong, Yiannis Aloimonos, Guoyu Lu
- Abstract summary: Self-supervised methods hold great potential because they require no costly labels.
However, self-supervised learning still lags well behind supervised learning in 3D reconstruction and depth estimation performance.
By embedding the camera's physical properties into the model, we can calculate depth priors for ground regions and regions connected to the ground.
- Score: 17.931220115676258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Depth estimation is a critical topic for robotics and vision-related tasks. In monocular depth estimation, compared with supervised learning that requires expensive ground-truth labeling, self-supervised methods hold great potential because they incur no labeling cost. However, self-supervised learning still lags well behind supervised learning in 3D reconstruction and depth estimation performance. Meanwhile, scale ambiguity is also a major issue for unsupervised monocular depth estimation, which commonly still requires ground-truth scale from GPS, LiDAR, or existing maps for correction. In the era of deep learning, existing methods primarily rely on exploring image relationships to train unsupervised neural networks, while the physical properties of the camera itself, such as its intrinsics and extrinsics, are often overlooked. These physical properties are not just mathematical parameters; they are embodiments of the camera's interaction with the physical world. By embedding these physical properties into the deep learning model, we can calculate depth priors for ground regions and regions connected to the ground based on physical principles, providing free supervision signals without the need for additional sensors. This approach is not only easy to implement but also enhances the performance of all unsupervised methods by embedding the camera's physical properties into the model, thereby achieving an embodied understanding of the real world.
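To make the ground-prior idea concrete: for a pinhole camera mounted at a known height above a flat ground plane with zero pitch, every pixel below the horizon has a closed-form depth. The snippet below is a minimal sketch under exactly those assumptions (flat ground, zero pitch, known height and intrinsics); the KITTI-like numbers are placeholders, and this illustrates the geometric principle rather than the authors' implementation.

```python
import numpy as np

def ground_depth_prior(h, fy, cy, image_height, image_width):
    """Per-pixel depth prior (meters) for a flat ground plane seen by a
    pinhole camera at height h with zero pitch. A ray through pixel row v
    hits the ground at depth z = fy * h / (v - cy); rows at or above the
    horizon (v <= cy) never see the ground and get NaN."""
    v = np.arange(image_height, dtype=np.float64)
    dv = v - cy                           # pixel offset below the principal point
    z_row = np.full(image_height, np.nan)
    z_row[dv > 0] = fy * h / dv[dv > 0]   # depth along the optical axis
    # Under these assumptions depth depends only on the image row,
    # so tile the per-row values across all columns.
    return np.tile(z_row[:, None], (1, image_width))

# Placeholder KITTI-like intrinsics; camera ~1.65 m above the road.
prior = ground_depth_prior(h=1.65, fy=721.5, cy=172.9,
                           image_height=375, image_width=1242)
```

Such a prior supplies metric-scale supervision on ground pixels for free, which is how camera properties can both anchor scale and regularize an otherwise unsupervised depth network.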
Related papers
- Uncertainty and Self-Supervision in Single-View Depth [0.8158530638728501]
Single-view depth estimation is an ill-posed problem because there are multiple solutions that explain 3D geometry from a single view.
Deep neural networks have been shown to be effective at capturing depth from a single view, but the majority of current methodologies are deterministic in nature.
We address this problem by quantifying the uncertainty of supervised single-view depth with Bayesian deep neural networks (a generic uncertainty sketch follows this entry).
arXiv Detail & Related papers (2024-06-20T11:46:17Z)
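One common way to obtain the kind of uncertainty discussed above is Monte Carlo dropout: keep dropout active at inference time and treat the spread across repeated stochastic forward passes as predictive uncertainty. The sketch below illustrates that generic recipe, not necessarily the paper's exact method; `DepthNet` is a placeholder standing in for any dropout-bearing depth network.

```python
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    """Tiny placeholder depth regressor with dropout."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(p=0.2),
            nn.Conv2d(32, 1, 3, padding=1), nn.Softplus(),  # positive depths
        )

    def forward(self, x):
        return self.body(x)

@torch.no_grad()
def mc_dropout_depth(model, image, n_samples=20):
    """Run n_samples stochastic forward passes with dropout enabled;
    return the per-pixel mean depth and standard deviation."""
    model.train()  # train mode keeps dropout active at inference
    samples = torch.stack([model(image) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

model = DepthNet()
image = torch.rand(1, 3, 64, 64)  # dummy RGB input
mean_depth, uncertainty = mc_dropout_depth(model, image)
```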
- Motion Degeneracy in Self-supervised Learning of Elevation Angle Estimation for 2D Forward-Looking Sonar [4.683630397028384]
This study aims to realize stable self-supervised learning of elevation angle estimation without pretraining on synthetic images.
We first analyze the motion field of 2D forward-looking sonar, which is related to the main supervision signal.
arXiv Detail & Related papers (2023-07-30T08:06:11Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
Training relies on the multi-view consistency assumption, which is, however, violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study of this problem and observes that the current monocular 3D detection task can be simplified to an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors.
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
- Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion [51.19260542887099]
We show that self-supervision can be used to learn accurate depth and ego-motion estimation without prior knowledge of the camera model.
Inspired by the geometric model of Grossberg and Nayar, we introduce Neural Ray Surfaces (NRS), convolutional networks that represent pixel-wise projection rays.
We demonstrate the use of NRS for self-supervised learning of visual odometry and depth estimation from raw videos obtained using a wide variety of camera systems (a minimal unprojection sketch follows this entry).
arXiv Detail & Related papers (2020-08-15T02:29:13Z)
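Reading the entry above through the cited geometric model: instead of unprojecting pixels with fixed pinhole intrinsics, the network predicts a ray direction per pixel, and 3D points are recovered by scaling each (normalized) ray by the predicted depth. Below is a minimal sketch of that unprojection step under this reading, with random tensors standing in for network outputs; it is not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def unproject_with_ray_surface(depth, rays):
    """Lift pixels to 3D using per-pixel predicted rays.
    depth: (B, 1, H, W) positive values; rays: (B, 3, H, W) raw ray
    directions. Each ray is normalized to unit length and scaled by the
    pixel's depth, giving 3D points in camera coordinates."""
    rays = F.normalize(rays, dim=1)  # unit-length direction per pixel
    return depth * rays              # (B, 3, H, W) point cloud

# Random stand-ins for what the depth and ray-surface networks would emit.
depth = torch.rand(1, 1, 4, 4) + 0.1
rays = torch.randn(1, 3, 4, 4)
points = unproject_with_ray_surface(depth, rays)
```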
- Depth by Poking: Learning to Estimate Depth from Self-Supervised Grasping [6.382990675677317]
We train a neural network model to estimate depth from RGB-D images.
Our network predicts, for each pixel in an input image, the z position that a robot's end effector would reach if it attempted to grasp or poke at the corresponding position.
We show that our approach achieves significantly lower root mean squared error than traditional structured light sensors (a sparse-supervision loss sketch follows this entry).
arXiv Detail & Related papers (2020-06-16T03:34:26Z)
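Entries like the one above, and Learning Depth With Very Sparse Supervision further down, both train a dense depth map from labels at only a few pixels; the usual mechanism is a masked regression loss that penalizes predictions only where an interaction produced a label. This is a minimal sketch of that generic loss, with placeholder shapes, not either paper's exact formulation.

```python
import torch

def sparse_depth_loss(pred, target, mask):
    """Mean squared error restricted to labeled pixels.
    pred, target: (B, 1, H, W); mask: same shape, 1.0 where a grasp or
    poke yielded a depth label and 0.0 elsewhere."""
    squared_error = (pred - target) ** 2 * mask
    return squared_error.sum() / mask.sum().clamp(min=1.0)

pred = torch.rand(2, 1, 8, 8)
target = torch.rand(2, 1, 8, 8)
mask = (torch.rand(2, 1, 8, 8) < 0.02).float()  # ~2% of pixels labeled
loss = sparse_depth_loss(pred, target, mask)
```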
- Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data [0.0]
3D object detection based on monocular camera data is a key enabler for autonomous driving.
Recent deep learning methods show promising results in recovering depth information from single images.
In this paper, we evaluate the performance of a 3D object detection pipeline which is parameterizable with different depth estimation configurations.
arXiv Detail & Related papers (2020-05-15T09:05:17Z)
- Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-02T10:44:13Z)