Back to the Feature: Learning Robust Camera Localization from Pixels to
Pose
- URL: http://arxiv.org/abs/2103.09213v1
- Date: Tue, 16 Mar 2021 17:40:12 GMT
- Title: Back to the Feature: Learning Robust Camera Localization from Pixels to
Pose
- Authors: Paul-Edouard Sarlin, Ajaykumar Unagar, Måns Larsson, Hugo Germain,
Carl Toft, Viktor Larsson, Marc Pollefeys, Vincent Lepetit, Lars
Hammarstrand, Fredrik Kahl, Torsten Sattler
- Abstract summary: We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
- Score: 114.89389528198738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera pose estimation in known scenes is a 3D geometry task recently tackled
by multiple learning algorithms. Many regress precise geometric quantities,
like poses or 3D points, from an input image. This either fails to generalize
to new viewpoints or ties the model parameters to a specific scene. In this
paper, we go Back to the Feature: we argue that deep networks should focus on
learning robust and invariant visual features, while the geometric estimation
should be left to principled algorithms. We introduce PixLoc, a scene-agnostic
neural network that estimates an accurate 6-DoF pose from an image and a 3D
model. Our approach is based on the direct alignment of multiscale deep
features, casting camera localization as metric learning. PixLoc learns strong
data priors by end-to-end training from pixels to pose and exhibits exceptional
generalization to new scenes by separating model parameters and scene geometry.
The system can localize in large environments given coarse pose priors but also
improve the accuracy of sparse feature matching by jointly refining keypoints
and poses with little overhead. The code will be publicly available at
https://github.com/cvg/pixloc.
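The core idea of the abstract, aligning features by treating pose estimation as an optimization problem rather than direct regression, can be illustrated with a minimal sketch. The toy below reduces the problem to recovering a 2D translation by Gauss-Newton on a featuremetric residual; the synthetic feature field and all names are illustrative assumptions, not PixLoc's actual code (which optimizes a full 6-DoF pose over multiscale CNN feature pyramids with a damped solver).

```python
import numpy as np

def feature(p):
    """Smooth synthetic 'deep feature' field, R^2 -> R^2 (stand-in for a CNN feature map)."""
    x, y = p[..., 0], p[..., 1]
    return np.stack([np.sin(0.5 * x) + 0.1 * y, np.cos(0.3 * y) + 0.1 * x], axis=-1)

def feature_jac(p):
    """Analytic Jacobian of the feature field w.r.t. the 2D point."""
    x, y = p[..., 0], p[..., 1]
    J = np.zeros(p.shape[:-1] + (2, 2))
    J[..., 0, 0] = 0.5 * np.cos(0.5 * x)
    J[..., 0, 1] = 0.1
    J[..., 1, 0] = 0.1
    J[..., 1, 1] = -0.3 * np.sin(0.3 * y)
    return J

def align(points, target_feats, t_init, iters=50):
    """Gauss-Newton on the featuremetric residual r(t) = f(points + t) - target."""
    t = t_init.astype(float).copy()
    for _ in range(iters):
        r = feature(points + t) - target_feats              # (N, 2) residuals
        J = feature_jac(points + t)                         # (N, 2, 2) Jacobians dr/dt
        H = np.einsum('nij,nik->jk', J, J) + 1e-9 * np.eye(2)  # J^T J (damped)
        g = np.einsum('nij,ni->j', J, r)                    # J^T r
        t -= np.linalg.solve(H, g)                          # Gauss-Newton step
    return t

pts = np.random.default_rng(0).uniform(-2, 2, size=(50, 2))
t_true = np.array([0.3, -0.2])
target = feature(pts + t_true)       # features observed at the true alignment
t_est = align(pts, target, t_init=np.zeros(2))
```

Because the objective is differentiable in the pose parameters, the same structure lets gradients flow from the alignment error back into the feature extractor, which is what "end-to-end training from pixels to pose" refers to.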
Related papers
- GeoCalib: Learning Single-image Calibration with Geometric Optimization [89.84142934465685]
From a single image, visual cues can help deduce intrinsic and extrinsic camera parameters like the focal length and the gravity direction.
Current approaches to this problem are based either on classical geometry with lines and vanishing points or on deep neural networks trained end-to-end.
We introduce GeoCalib, a deep neural network that leverages universal rules of 3D geometry through an optimization process.
arXiv Detail & Related papers (2024-09-10T17:59:55Z)
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- LEAP: Liberate Sparse-view 3D Modeling from Camera Poses [28.571234973474077]
We present LEAP, a pose-free approach for sparse-view 3D modeling.
LEAP discards pose-based operations and learns geometric knowledge from data.
We show LEAP significantly outperforms prior methods when they employ predicted poses from state-of-the-art pose estimators.
arXiv Detail & Related papers (2023-10-02T17:59:37Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Object-Based Visual Camera Pose Estimation From Ellipsoidal Model and 3D-Aware Ellipse Prediction [2.016317500787292]
We propose a method for initial camera pose estimation from just a single image.
It exploits the ability of deep learning techniques to reliably detect objects regardless of viewing conditions.
Experiments prove that the accuracy of the computed pose significantly increases thanks to our method.
arXiv Detail & Related papers (2022-03-09T10:00:52Z)
- Pixel-Perfect Structure-from-Motion with Featuremetric Refinement [96.73365545609191]
We refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views.
This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors.
Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale.
arXiv Detail & Related papers (2021-08-18T17:58:55Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image.
We train on an image collection without any ground-truth 3D shape, multi-view, camera-viewpoint, or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv Detail & Related papers (2020-07-21T17:58:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.