GenDepth: Generalizing Monocular Depth Estimation for Arbitrary Camera
Parameters via Ground Plane Embedding
- URL: http://arxiv.org/abs/2312.06021v1
- Date: Sun, 10 Dec 2023 22:28:34 GMT
- Authors: Karlo Koledić, Luka Petrović, Ivan Petrović, Ivan Marković
- Abstract summary: GenDepth is a novel model capable of performing metric depth estimation for arbitrary vehicle-camera setups.
We propose a novel embedding of camera parameters as the ground plane depth and present a novel architecture that integrates these embeddings with adversarial domain alignment.
We validate GenDepth on several autonomous driving datasets, demonstrating its state-of-the-art generalization capability for different vehicle-camera systems.
- Score: 8.289857214449372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based monocular depth estimation leverages geometric priors present
in the training data to enable metric depth perception from a single image, a
traditionally ill-posed problem. However, these priors are often specific to a
particular domain, leading to limited generalization performance on unseen
data. Apart from the well-studied environmental domain gap, monocular depth
estimation is also sensitive to the domain gap induced by varying camera
parameters, an aspect that is often overlooked in current state-of-the-art
approaches. This issue is particularly evident in autonomous driving scenarios,
where datasets are typically collected with a single vehicle-camera setup,
leading to a bias in the training data due to a fixed perspective geometry. In
this paper, we challenge this trend and introduce GenDepth, a novel model
capable of performing metric depth estimation for arbitrary vehicle-camera
setups. To address the lack of data with sufficiently diverse camera
parameters, we first create a bespoke synthetic dataset collected with
different vehicle-camera systems. Then, we design GenDepth to simultaneously
optimize two objectives: (i) equivariance to the camera parameter variations on
synthetic data, (ii) transferring the learned equivariance to real-world
environmental features using a single real-world dataset with a fixed
vehicle-camera system. To achieve this, we propose a novel embedding of camera
parameters as the ground plane depth and present a novel architecture that
integrates these embeddings with adversarial domain alignment. We validate
GenDepth on several autonomous driving datasets, demonstrating its
state-of-the-art generalization capability for different vehicle-camera
systems.
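The core embedding follows from flat-ground geometry: with camera height h, pitch theta, vertical focal length fy, and principal point cy, the viewing ray of a pixel in image row v meets the ground plane at depth z = h / ((v - cy)/fy * cos(theta) + sin(theta)), valid for rows below the horizon. The sketch below illustrates this computation under a zero-roll, flat-ground assumption; the function and parameter names are ours, and the paper's actual encoding (normalization, horizon handling, and how the map is fed into the network) may differ.

```python
import numpy as np

def ground_plane_depth(height, width, fy, cy, cam_height, pitch, max_depth=80.0):
    """Per-pixel depth of the ground plane under a flat-world assumption.

    Camera frame: x right, y down, z forward; the camera sits cam_height
    meters above the ground and is pitched down by `pitch` radians.
    """
    v = np.arange(height, dtype=np.float64)
    # Downward (y) component of each row's viewing ray, expressed in a
    # gravity-aligned frame by undoing the camera pitch; rays are scaled
    # to unit z, so the plane-intersection scale equals the z-depth.
    ray_down = (v - cy) / fy * np.cos(pitch) + np.sin(pitch)
    depth_rows = np.full(height, max_depth)
    below_horizon = ray_down > 1e-6
    # Intersect each ray with the plane y = cam_height and clamp far values.
    depth_rows[below_horizon] = np.minimum(
        cam_height / ray_down[below_horizon], max_depth)
    # With zero roll, ground-plane depth is constant along each image row.
    return np.tile(depth_rows[:, None], (1, width))

# Illustrative KITTI-like setup (values are examples, not from the paper).
embedding = ground_plane_depth(height=375, width=1242, fy=721.5, cy=187.0,
                               cam_height=1.65, pitch=0.0)
```

Because the map is a deterministic function of the camera parameters, varying it across training samples gives the network an explicit signal to become equivariant to changes in perspective geometry.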
Related papers
- Homography Estimation in Complex Topological Scenes [6.023710971800605]
Surveillance videos and images are used for a broad set of applications, ranging from traffic analysis to crime detection.
Extrinsic camera calibration data is important for most analysis applications.
We present an automated camera-calibration process leveraging a dictionary-based approach that does not require prior knowledge of any camera settings.
arXiv Detail & Related papers (2023-08-02T11:31:43Z)
- Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via instance-level augmentation.
Our method, called DGMono3D, achieves remarkable performance on all evaluated datasets and surpasses the state-of-the-art unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation [34.8860186009308]
We propose a deep sensor fusion framework which estimates vehicle motion using both pose and uncertainty estimations from multiple on-board cameras.
We evaluate our approach on the publicly available, large-scale autonomous vehicle dataset nuScenes.
arXiv Detail & Related papers (2021-12-23T19:44:45Z)
- Self-Supervised Camera Self-Calibration from Video [34.35533943247917]
We propose a learning algorithm to regress per-sequence calibration parameters using an efficient family of general camera models.
Our procedure achieves self-calibration results with sub-pixel reprojection error, outperforming other learning-based methods.
arXiv Detail & Related papers (2021-12-06T19:42:05Z)
- Camera Calibration through Camera Projection Loss [4.36572039512405]
We propose a novel method to predict intrinsic (focal length and principal point offset) parameters using an image pair.
Unlike existing methods, we propose a new representation that incorporates the camera model equations as a neural network in a multi-task learning framework.
Our proposed approach achieves better performance with respect to both deep learning-based and traditional methods on 7 out of 10 parameters evaluated.
arXiv Detail & Related papers (2021-10-07T14:03:10Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- Wide-angle Image Rectification: A Survey [86.36118799330802]
Wide-angle images contain distortions that violate the assumptions underlying pinhole camera models.
Image rectification, which aims to correct these distortions, can solve these problems.
We present a detailed description and discussion of the camera models used in different approaches.
Next, we review both traditional geometry-based image rectification methods and deep learning-based methods.
arXiv Detail & Related papers (2020-10-30T17:28:40Z)
- Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion [51.19260542887099]
We show that self-supervision can be used to learn accurate depth and ego-motion estimation without prior knowledge of the camera model.
Inspired by the geometric model of Grossberg and Nayar, we introduce Neural Ray Surfaces (NRS), convolutional networks that represent pixel-wise projection rays.
We demonstrate the use of NRS for self-supervised learning of visual odometry and depth estimation from raw videos obtained using a wide variety of camera systems.
arXiv Detail & Related papers (2020-08-15T02:29:13Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
arXiv Detail & Related papers (2020-07-18T22:31:33Z)