Camera Pose Matters: Improving Depth Prediction by Mitigating Pose
Distribution Bias
- URL: http://arxiv.org/abs/2007.03887v2
- Date: Sun, 28 Mar 2021 05:51:23 GMT
- Authors: Yunhan Zhao, Shu Kong, Charless Fowlkes
- Abstract summary: We propose two novel techniques that exploit the camera pose during training and prediction.
First, we introduce a simple perspective-aware data augmentation that synthesizes new training examples with more diverse views.
Second, we propose a conditional model that exploits the per-image camera pose as prior knowledge by encoding it as a part of the input.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth predictors are typically trained on large-scale training sets
which are naturally biased w.r.t. the distribution of camera poses. As a result,
trained predictors fail to make reliable depth predictions for testing examples
captured under uncommon camera poses. To address this issue, we propose two
novel techniques that exploit the camera pose during training and prediction.
First, we introduce a simple perspective-aware data augmentation that
synthesizes new training examples with more diverse views by perturbing the
existing ones in a geometrically consistent manner. Second, we propose a
conditional model that exploits the per-image camera pose as prior knowledge by
encoding it as a part of the input. We show that jointly applying the two
methods improves depth prediction on images captured under uncommon and even
never-before-seen camera poses. We show that our methods improve performance
when applied to a range of different predictor architectures. Lastly, we show
that explicitly encoding the camera pose distribution improves the
generalization performance of a synthetically trained depth predictor when
evaluated on real images.
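The two techniques above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a pinhole camera with intrinsics K and perturbs the pose by a pure pitch rotation, for which the image warp reduces to the homography H = K R K^-1 and depth values rescale by the z-component of the rotated viewing ray. The function names and the pitch/roll channel encoding in `pose_conditioned_input` are illustrative assumptions; the paper's exact augmentation and conditioning schemes may differ.

```python
import numpy as np

def rot_x(pitch):
    # Rotation about the camera x-axis (pitch), in radians.
    c, s = np.cos(pitch), np.sin(pitch)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def perspective_aug(image, depth, K, pitch):
    """Warp an (image, depth) training pair to a virtual camera rotated
    by `pitch`, producing a geometrically consistent new training view.
    For a pure rotation the image warp is depth-independent."""
    h, w = depth.shape
    R = rot_x(pitch)
    Kinv = np.linalg.inv(K)
    Hinv = K @ R.T @ Kinv  # inverse homography: target pixel -> source pixel

    ys, xs = np.mgrid[0:h, 0:w]
    tgt = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = Hinv @ tgt
    src = src[:2] / src[2]
    sx = np.round(src[0]).astype(int)  # nearest-neighbour resampling;
    sy = np.round(src[1]).astype(int)  # a real pipeline would use bilinear
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    vmask = valid.reshape(h, w)

    new_img = np.zeros_like(image)
    new_depth = np.zeros_like(depth)
    new_img[vmask] = image[sy[valid], sx[valid]]
    # Depth of the rotated 3D point: z' = depth * (R @ Kinv @ pixel)_z
    rays = Kinv @ np.stack([sx, sy, np.ones_like(sx)])
    zscale = (R @ rays)[2]
    new_depth[vmask] = depth[sy[valid], sx[valid]] * zscale[valid]
    return new_img, new_depth

def pose_conditioned_input(image, pitch, roll):
    """Encode the per-image camera pose as extra constant input channels
    (one hypothetical encoding of 'pose as part of the input')."""
    h, w = image.shape[:2]
    pose = np.broadcast_to(np.array([pitch, roll]), (h, w, 2))
    return np.concatenate([image, pose], axis=-1)
```

Sampling a small random pitch per training example and feeding the warped pair through `perspective_aug` diversifies the pose distribution without new data collection, while `pose_conditioned_input` lets the predictor condition on the (known or estimated) pose at test time.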
Related papers
- Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation [47.68705641608316]
We propose a novel framework for estimating the relative pose between two cameras from point correspondences with associated monocular depths.
We derive efficient solvers for three cases: (1) two calibrated cameras, (2) two uncalibrated cameras with an unknown but shared focal length, and (3) two uncalibrated cameras with unknown and different focal lengths.
Compared to prior work, our solvers achieve state-of-the-art results on two large-scale, real-world datasets.
arXiv Detail & Related papers (2025-01-13T23:13:33Z)
- ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation [17.097170273209333]
Recovering camera poses from a set of images is a foundational task in 3D computer vision.
Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution.
We propose ADen to unify the two frameworks by employing a generator and a discriminator.
arXiv Detail & Related papers (2024-08-16T22:45:46Z)
- Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays.
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z)
- A Probabilistic Framework for Visual Localization in Ambiguous Scenes [64.13544430239267]
We propose a probabilistic framework that for a given image predicts the arbitrarily shaped posterior distribution of its camera pose.
We do this via a novel formulation of camera pose regression using variational inference, which allows sampling from the predicted distribution.
Our method outperforms existing methods on localization in ambiguous scenes.
arXiv Detail & Related papers (2023-01-05T14:46:54Z)
- Few-View Object Reconstruction with Unknown Categories and Camera Poses [80.0820650171476]
This work explores reconstructing general real-world objects from a few images without known camera poses or object categories.
The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation.
Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence.
arXiv Detail & Related papers (2022-12-08T18:59:02Z)
- Reassessing the Limitations of CNN Methods for Camera Pose Regression [27.86655424544118]
We propose a model that can directly regress the camera pose from images with significantly higher accuracy than existing methods of the same class.
We first analyse why regression methods are still behind the state-of-the-art, and we bridge the performance gap with our new approach.
arXiv Detail & Related papers (2021-08-16T17:55:26Z)
- Towards Accurate Human Pose Estimation in Videos of Crowded Scenes [134.60638597115872]
We focus on improving human pose estimation in videos of crowded scenes from the perspectives of exploiting temporal context and collecting new data.
For each frame, we propagate poses forward from previous frames and backward from subsequent frames to the current frame, leading to stable and accurate human pose estimation in videos.
In this way, our model achieves best performance on 7 out of 13 videos and 56.33 average w_AP on test dataset of HIE challenge.
arXiv Detail & Related papers (2020-10-16T13:19:11Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras report per-pixel brightness changes as a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
- Unsupervised Learning of Camera Pose with Compositional Re-estimation [10.251550038802343]
Given an input video sequence, our goal is to estimate the camera pose (i.e. the camera motion) between consecutive frames.
We propose an alternative approach that utilizes a compositional re-estimation process for camera pose estimation.
Our approach significantly improves the predicted camera motion both quantitatively and visually.
arXiv Detail & Related papers (2020-01-17T18:59:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.