Related papers: ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation

URL: http://arxiv.org/abs/2408.09042v1
Date: Fri, 16 Aug 2024 22:45:46 GMT
Title: ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation
Authors: Hao Tang, Weiyao Wang, Pierre Gleize, Matt Feiszli
Abstract summary: Recovering camera poses from a set of images is a foundational task in 3D computer vision. Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution. We propose ADen to unify the two frameworks by employing a generator and a discriminator.
Score: 17.097170273209333
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recovering camera poses from a set of images is a foundational task in 3D computer vision, which powers key applications such as 3D scene/object reconstructions. Classic methods often depend on feature correspondence, such as keypoints, which require the input images to have large overlap and small viewpoint changes. Such requirements present considerable challenges in scenarios with sparse views. Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution. However, each approach has its limitations. On one hand, directly regressing the camera poses can be ill-posed, since it assumes a single mode, which is not true under symmetry and leads to sub-optimal solutions. On the other hand, probabilistic approaches are capable of modeling the symmetry ambiguity, yet they sample the entire space of rotation uniformly by brute-force. This leads to an inevitable trade-off between high sample density, which improves model precision, and sample efficiency that determines the runtime. In this paper, we propose ADen to unify the two frameworks by employing a generator and a discriminator: the generator is trained to output multiple hypotheses of 6DoF camera pose to represent a distribution and handle multi-mode ambiguity, and the discriminator is trained to identify the hypothesis that best explains the data. This allows ADen to combine the best of both worlds, achieving substantially higher precision as well as lower runtime than previous methods in empirical evaluations.
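The hypothesis-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the hypotheses and scores below are hard-coded stand-ins for the outputs of a trained generator and discriminator, only the rotation part of the 6DoF pose is modeled, and the helper names (`rot_z`, `select_hypothesis`, `geodesic_angle_deg`) are hypothetical.

```python
import numpy as np

def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z-axis."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def geodesic_angle_deg(R_a, R_b):
    """Angle (degrees) of the relative rotation between two rotation matrices."""
    cos_theta = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

def select_hypothesis(hypotheses, scores):
    """Return the index and pose of the highest-scoring hypothesis."""
    idx = int(np.argmax(scores))
    return idx, hypotheses[idx]

# Assumed generator output: hypotheses clustered around two modes that are
# 180 degrees apart, as could happen for a symmetric object.
hypotheses = [rot_z(a) for a in (2.0, 10.0, 178.0, 185.0)]

# Assumed discriminator output: one score per hypothesis (higher is better).
scores = np.array([0.10, 0.05, 0.70, 0.15])

idx, best = select_hypothesis(hypotheses, scores)
error = geodesic_angle_deg(best, rot_z(180.0))  # error against an assumed ground truth
```

Because the hypotheses concentrate near the plausible modes instead of covering the whole rotation space on a uniform grid, a small number of samples can still land close to the correct pose, which is the precision versus runtime trade-off the abstract refers to.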

Related papers

Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion [9.025235713063509]
We tackle the harder problem of pose estimation for category-level objects from a single RGB image. We propose a novel solution that eliminates the need for specific object models or depth information. Our approach outperforms the current state-of-the-art on the REAL275 dataset by a significant margin.
arXiv Detail & Related papers (2024-12-16T03:39:33Z)
DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses. We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass. Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views. We propose a distributed representation of camera pose that treats a camera as a bundle of rays. Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z)
iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching [14.737266480464156]
We present a method named iComMa to address the 6D camera pose estimation problem in computer vision. We propose an efficient method for accurate camera pose estimation by inverting 3D Gaussian Splatting (3DGS).
arXiv Detail & Related papers (2023-12-14T15:31:33Z)
Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation [22.127170452402332]
This paper presents a novel Probabilistic Triangulation module that can be embedded in a calibrated 3D human pose estimation method. Our method achieves a trade-off between estimation accuracy and generalizability.
arXiv Detail & Related papers (2023-09-09T11:03:37Z)
Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot, Generalizable Approach using RGB Images [60.0898989456276]
We present a new framework named Cas6D for few-shot 6DoF pose estimation that is generalizable and uses only RGB images. To address the false positives of target object detection in the extreme few-shot setting, our framework utilizes a self-supervised pre-trained ViT to learn robust feature representations. Experimental results on the LINEMOD and GenMOP datasets demonstrate that Cas6D outperforms state-of-the-art methods by 9.2% and 3.8% accuracy (Proj-5) under the 32-shot setting.
arXiv Detail & Related papers (2023-06-13T07:45:42Z)
PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate, model-free, one-shot object pose estimator. We create a new training pipeline for object-to-image matching based on a three-view system. To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
A Probabilistic Framework for Visual Localization in Ambiguous Scenes [64.13544430239267]
We propose a probabilistic framework that for a given image predicts the arbitrarily shaped posterior distribution of its camera pose. We do this via a novel formulation of camera pose regression using variational inference, which allows sampling from the predicted distribution. Our method outperforms existing methods on localization in ambiguous scenes.
arXiv Detail & Related papers (2023-01-05T14:46:54Z)
DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models [5.908471365011943]
We propose DiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image. We show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
arXiv Detail & Related papers (2022-11-29T18:55:13Z)
Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation [74.76155168705975]
Deep Bingham Networks (DBN) can handle pose-related uncertainties and ambiguities arising in almost all real-life applications concerning 3D data. DBN extends state-of-the-art direct pose regression networks with a multi-hypotheses prediction head which can yield different distribution modes. We propose new training strategies so as to avoid mode or posterior collapse during training and to improve numerical stability.
arXiv Detail & Related papers (2020-12-20T19:20:26Z)
6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference [67.70859730448473]
We present a multimodal camera relocalization framework that captures ambiguities and uncertainties. We predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction. We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments.
arXiv Detail & Related papers (2020-04-09T20:55:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.