Cameras as Rays: Pose Estimation via Ray Diffusion
- URL: http://arxiv.org/abs/2402.14817v3
- Date: Thu, 4 Apr 2024 16:27:06 GMT
- Title: Cameras as Rays: Pose Estimation via Ray Diffusion
- Authors: Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani,
- Abstract summary: Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays.
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
- Score: 54.098613859015856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views (<10). In contrast to existing approaches that pursue top-down prediction of global parametrizations of camera extrinsics, we propose a distributed representation of camera pose that treats a camera as a bundle of rays. This representation allows for a tight coupling with spatial image features improving pose precision. We observe that this representation is naturally suited for set-level transformers and develop a regression-based approach that maps image patches to corresponding rays. To capture the inherent uncertainties in sparse-view pose inference, we adapt this approach to learn a denoising diffusion model which allows us to sample plausible modes while improving performance. Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D while generalizing to unseen object categories and in-the-wild captures.
Related papers
- SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers [57.46911575980854]
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation.
Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions.
Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations.
arXiv Detail & Related papers (2024-04-19T04:51:18Z) - ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
arXiv Detail & Related papers (2024-03-21T17:52:08Z) - iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching [14.737266480464156]
We present a method named iComMa to address the 6D camera pose estimation problem in computer vision.
We propose an efficient method for accurate camera pose estimation by inverting 3D Gaussian Splatting (3DGS)
arXiv Detail & Related papers (2023-12-14T15:31:33Z) - Learning Robust Multi-Scale Representation for Neural Radiance Fields
from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z) - RelPose: Predicting Probabilistic Relative Rotation for Single Objects
in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z) - ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose.
(ImPosing) embeds images and camera poses into a common latent representation with 2 separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are not directly regressed but refined.
arXiv Detail & Related papers (2022-05-05T13:33:25Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust
Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement.
The problem is formulated as a non-linear least squares problem based on the estimated correspondence field.
The correspondence field estimation and pose refinement are conducted alternatively in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z) - Reassessing the Limitations of CNN Methods for Camera Pose Regression [27.86655424544118]
We propose a model that can directly regress the camera pose from images with significantly higher accuracy than existing methods of the same class.
We first analyse why regression methods are still behind the state-of-the-art, and we bridge the performance gap with our new approach.
arXiv Detail & Related papers (2021-08-16T17:55:26Z) - Wide-Baseline Relative Camera Pose Estimation with Directional Learning [46.21836501895394]
We introduce DirectionNet, which estimates discrete distributions over the 5D relative pose space using a novel parameterization to make the estimation problem tractable.
We evaluate our model on challenging synthetic and real pose estimation datasets constructed from Matterport3D and InteriorNet.
arXiv Detail & Related papers (2021-06-07T04:46:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.