FoVA-Depth: Field-of-View Agnostic Depth Estimation for Cross-Dataset
Generalization
- URL: http://arxiv.org/abs/2401.13786v1
- Date: Wed, 24 Jan 2024 20:07:59 GMT
- Title: FoVA-Depth: Field-of-View Agnostic Depth Estimation for Cross-Dataset
Generalization
- Authors: Daniel Lichy, Hang Su, Abhishek Badki, Jan Kautz, Orazio Gallo
- Abstract summary: We propose a method to train a stereo depth estimation model on the widely available pinhole data.
We show strong generalization ability of our approach on both indoor and outdoor datasets.
- Score: 57.98448472585241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wide field-of-view (FoV) cameras efficiently capture large portions of the
scene, which makes them attractive in multiple domains, such as automotive and
robotics. For such applications, estimating depth from multiple images is a
critical task, and therefore, a large amount of ground truth (GT) data is
available. Unfortunately, most of the GT data is for pinhole cameras, making it
impossible to properly train depth estimation models for large-FoV cameras. We
propose the first method to train a stereo depth estimation model on the widely
available pinhole data, and to generalize it to data captured with larger FoVs.
Our intuition is simple: We warp the training data to a canonical, large-FoV
representation and augment it to allow a single network to reason about diverse
types of distortions that otherwise would prevent generalization. We show
strong generalization ability of our approach on both indoor and outdoor
datasets, which was not possible with previous methods.
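To make the canonicalization idea concrete, below is a minimal sketch, not the authors' code: it warps a pinhole image onto an equirectangular (spherical longitude/latitude) canvas with NumPy and OpenCV. The choice of equirectangular output, the intrinsics, and the output resolution are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import cv2

def pinhole_to_equirect(img, fx, fy, cx, cy, out_h=512, out_w=1024):
    """Warp a pinhole image onto an equirectangular (longitude/latitude) canvas."""
    # Longitude in [-pi, pi] and latitude in [-pi/2, pi/2] over the output grid.
    lon = (np.arange(out_w) + 0.5) / out_w * 2.0 * np.pi - np.pi
    lat = (np.arange(out_h) + 0.5) / out_h * np.pi - np.pi / 2.0
    lon, lat = np.meshgrid(lon, lat)

    # Unit ray direction for every output pixel (x right, y down, z forward).
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Project the rays into the pinhole image; only rays with z > 0 are visible.
    u = fx * x / np.maximum(z, 1e-6) + cx
    v = fy * y / np.maximum(z, 1e-6) + cy
    valid = (z > 0) & (u >= 0) & (u < img.shape[1]) & (v >= 0) & (v < img.shape[0])

    # Sample the pinhole image; pixels outside its field of view stay black.
    warped = cv2.remap(img, u.astype(np.float32), v.astype(np.float32),
                       interpolation=cv2.INTER_LINEAR,
                       borderMode=cv2.BORDER_CONSTANT, borderValue=0)
    warped[~valid] = 0
    return warped

# Example with a synthetic 480x640 pinhole image (~90-degree horizontal FoV).
h, w = 480, 640
img = np.dstack([np.tile(np.linspace(0, 255, w), (h, 1))] * 3).astype(np.uint8)
f = 0.5 * w  # focal length giving roughly a 90-degree horizontal FoV
erp = pinhole_to_equirect(img, f, f, w / 2.0, h / 2.0)
```

In a full training pipeline the same mapping would presumably be applied to the ground-truth depth, with distortion augmentations layered on top so a single network sees the variety of warps it encounters at test time; this sketch covers only the image warp itself.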
Related papers
- GRAPE: Generalizable and Robust Multi-view Facial Capture [12.255610707737548]
Deep learning-based multi-view facial capture methods have shown impressive accuracy while being several orders of magnitude faster than a traditional mesh registration pipeline.
In this study, we aim to improve the generalization ability so that a trained model can be readily used for inference (i.e. capture new data) on a different camera array.
Experiments on the FaMoS and FaceScape datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-07-14T13:24:17Z)
- OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation [56.028185293563325]
This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation.
We first introduce OO3D-9D, a large-scale photorealistic dataset for this task.
We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models.
arXiv Detail & Related papers (2024-03-19T03:09:24Z)
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data [87.61900472933523]
This work presents Depth Anything, a highly practical solution for robust monocular depth estimation.
We scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data.
We evaluate its zero-shot capabilities extensively, including six public datasets and randomly captured photos.
arXiv Detail & Related papers (2024-01-19T18:59:52Z)
- Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View [11.958753088613637]
We first analyze the causes of the domain gap for the MV3D-Det task.
To acquire a robust depth prediction, we propose to decouple the depth estimation from intrinsic parameters of the camera.
We modify the focal length values to create multiple pseudo-domains and construct an adversarial training loss to encourage the feature representation to be more domain-agnostic.
arXiv Detail & Related papers (2023-03-03T02:59:13Z)
- Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data [19.63193201107591]
7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users.
We develop an approach using a weakly supervised method of fine tuning 3D object detectors for traffic observation cameras.
Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top performing monocular 3D object detectors on autonomous vehicle datasets.
arXiv Detail & Related papers (2021-10-21T08:26:48Z)
- SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras [30.480562747903186]
A 360° perception of scene geometry is essential for automated driving, notably for parking and urban driving scenarios.
We present novel camera-geometry adaptive multi-scale convolutions which utilize the camera parameters as a conditional input.
We evaluate our approach on the Fisheye WoodScape surround-view dataset, significantly improving over previous approaches.
arXiv Detail & Related papers (2021-04-09T15:20:20Z)
- Wide-angle Image Rectification: A Survey [86.36118799330802]
Wide-angle images contain distortions that violate the assumptions underlying pinhole camera models.
Image rectification, which aims to correct these distortions, can solve these problems.
We present a detailed description and discussion of the camera models used in different approaches.
Next, we review both traditional geometry-based image rectification methods and deep learning-based methods.
arXiv Detail & Related papers (2020-10-30T17:28:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.