Reassessing the Limitations of CNN Methods for Camera Pose Regression
- URL: http://arxiv.org/abs/2108.07260v1
- Date: Mon, 16 Aug 2021 17:55:26 GMT
- Title: Reassessing the Limitations of CNN Methods for Camera Pose Regression
- Authors: Tony Ng, Adrian Lopez-Rodriguez, Vassileios Balntas, Krystian
Mikolajczyk
- Abstract summary: We propose a model that can directly regress the camera pose from images with significantly higher accuracy than existing methods of the same class.
We first analyse why regression methods are still behind the state-of-the-art, and we bridge the performance gap with our new approach.
- Score: 27.86655424544118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the problem of camera pose estimation in outdoor
and indoor scenarios. In comparison to the currently top-performing methods
that rely on 2D to 3D matching, we propose a model that can directly regress
the camera pose from images with significantly higher accuracy than existing
methods of the same class. We first analyse why regression methods are still
behind the state-of-the-art, and we bridge the performance gap with our new
approach. Specifically, we overcome the bias in the training data with a novel
training technique that generates poses, guided by a probability distribution
derived from the training set, for synthesising new training views. Lastly,
we evaluate our approach on two widely used benchmarks and show that it
achieves significantly improved performance compared to prior regression-based
methods, retrieval techniques as well as 3D pipelines with local feature
matching.
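To make the training idea described above concrete, the following is a minimal sketch of distribution-guided pose sampling for view synthesis: fit a simple distribution to the training-set camera poses, draw new poses from it, and render synthetic views at those poses. The Gaussian model, the 6-DoF pose vector, and the render_view callable are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fit_pose_distribution(train_poses):
    # train_poses: (N, 6) array of camera poses (x, y, z, yaw, pitch, roll).
    # A single Gaussian is an illustrative assumption; the paper's actual
    # probability model is not specified in the abstract.
    mean = train_poses.mean(axis=0)
    cov = np.cov(train_poses, rowvar=False)
    return mean, cov

def sample_guided_poses(mean, cov, n_samples, rng=None):
    # Draw new camera poses guided by the training-set distribution.
    rng = np.random.default_rng() if rng is None else rng
    return rng.multivariate_normal(mean, cov, size=n_samples)

def synthesise_training_views(train_poses, render_view, n_new=1000):
    # render_view is a hypothetical view-synthesis callable that returns an
    # image for a given 6-DoF pose (e.g. a novel-view renderer).
    mean, cov = fit_pose_distribution(train_poses)
    new_poses = sample_guided_poses(mean, cov, n_new)
    return [(render_view(pose), pose) for pose in new_poses]
```

In an absolute pose regression training loop, the resulting synthetic (image, pose) pairs would simply be appended to the real training data.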
Related papers
- FaVoR: Features via Voxel Rendering for Camera Relocalization [23.7893950095252]
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image.
We propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features.
By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking.
arXiv Detail & Related papers (2024-09-11T18:58:16Z) - Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays.
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z) - PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate, model-free, one-shot object pose estimator.
We create a new training pipeline for object-to-image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z) - DFNet: Enhance Absolute Pose Regression with Direct Feature Matching [16.96571417692014]
We introduce a camera relocalization pipeline that combines absolute pose regression (APR) and direct feature matching.
We show that our method achieves state-of-the-art accuracy, outperforming existing single-image APR methods by as much as 56% and reaching accuracy comparable to 3D structure-based methods.
arXiv Detail & Related papers (2022-04-01T16:39:16Z) - Distribution-Aware Single-Stage Models for Multi-Person 3D Pose
Estimation [29.430404703883084]
We present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem.
The proposed DAS model simultaneously localizes person positions and their corresponding body joints in the 3D camera space in a one-pass manner.
Comprehensive experiments on benchmarks CMU Panoptic and MuPoTS-3D demonstrate the superior efficiency of the proposed DAS model.
arXiv Detail & Related papers (2022-03-15T07:30:27Z) - Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints.
Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z) - Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network whose partial layers are iteratively reused to refine its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z) - Learning Eye-in-Hand Camera Calibration from a Single Image [7.262048441360133]
Eye-in-hand camera calibration is a fundamental and long-studied problem in robotics.
We present a study on using learning-based methods for solving this problem online from a single RGB image.
arXiv Detail & Related papers (2021-11-01T20:17:31Z) - Pose Guided Person Image Generation with Hidden p-Norm Regression [113.41144529452663]
We propose a novel approach to solve the pose guided person image generation task.
Our method estimates a pose-invariant feature matrix for each identity, and uses it to predict the target appearance conditioned on the target pose.
Our method yields competitive performance across all evaluated variant scenarios of this task.
arXiv Detail & Related papers (2021-02-19T17:03:54Z) - Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present a deep neural network methodology to reconstruct the 3D pose and shape of people from an input RGB image.
We rely on a recently introduced, expressive full-body statistical 3D human model, GHUM, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.