RelPose: Predicting Probabilistic Relative Rotation for Single Objects
in the Wild
- URL: http://arxiv.org/abs/2208.05963v1
- Date: Thu, 11 Aug 2022 17:59:59 GMT
- Title: RelPose: Predicting Probabilistic Relative Rotation for Single Objects
in the Wild
- Authors: Jason Y. Zhang and Deva Ramanan and Shubham Tulsiani
- Abstract summary: We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
- Score: 73.1276968007689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We describe a data-driven method for inferring the camera viewpoints given
multiple images of an arbitrary object. This task is a core component of
classic geometric pipelines such as SfM and SLAM, and also serves as a vital
pre-processing requirement for contemporary neural approaches (e.g. NeRF) to
object reconstruction and view synthesis. In contrast to existing
correspondence-driven methods that do not perform well given sparse views, we
propose a top-down prediction based approach for estimating camera viewpoints.
Our key technical insight is the use of an energy-based formulation for
representing distributions over relative camera rotations, thus allowing us to
explicitly represent multiple camera modes arising from object symmetries or
views. Leveraging these relative predictions, we jointly estimate a consistent
set of camera rotations from multiple images. We show that our approach
outperforms state-of-the-art SfM and SLAM methods given sparse images on both
seen and unseen categories. Further, our probabilistic approach significantly
outperforms directly regressing relative poses, suggesting that modeling
multimodality is important for coherent joint reconstruction. We demonstrate
that our system can be a stepping stone toward in-the-wild reconstruction from
multi-view datasets. The project page with code and videos can be found at
https://jasonyzhang.com/relpose.
Related papers
- Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays.
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z) - Learning Robust Multi-Scale Representation for Neural Radiance Fields
from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z) - MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams.
Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph.
Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z) - PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle
Adjustment [21.98302129015761]
We propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework.
We show that our method PoseDiffusion significantly improves over the classic SfM pipelines.
It is observed that our method can generalize across datasets without further training.
arXiv Detail & Related papers (2023-06-27T17:59:07Z) - Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z) - Deep Keypoint-Based Camera Pose Estimation with Geometric Constraints [80.60538408386016]
Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry.
We propose an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection.
arXiv Detail & Related papers (2020-07-29T21:41:31Z) - Object-Centric Multi-View Aggregation [86.94544275235454]
We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid.
Key to our approach is an object-centric canonical 3D coordinate system into which views can be lifted, without explicit camera pose estimation.
We show that computing a symmetry-aware mapping from pixels to the canonical coordinate system allows us to better propagate information to unseen regions.
arXiv Detail & Related papers (2020-07-20T17:38:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.