Back to the Feature: Learning Robust Camera Localization from Pixels to
Pose
- URL: http://arxiv.org/abs/2103.09213v1
- Date: Tue, 16 Mar 2021 17:40:12 GMT
- Title: Back to the Feature: Learning Robust Camera Localization from Pixels to
Pose
- Authors: Paul-Edouard Sarlin, Ajaykumar Unagar, Måns Larsson, Hugo Germain,
Carl Toft, Viktor Larsson, Marc Pollefeys, Vincent Lepetit, Lars
Hammarstrand, Fredrik Kahl, Torsten Sattler
- Abstract summary: We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
- Score: 114.89389528198738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera pose estimation in known scenes is a 3D geometry task recently tackled
by multiple learning algorithms. Many regress precise geometric quantities,
like poses or 3D points, from an input image. This either fails to generalize
to new viewpoints or ties the model parameters to a specific scene. In this
paper, we go Back to the Feature: we argue that deep networks should focus on
learning robust and invariant visual features, while the geometric estimation
should be left to principled algorithms. We introduce PixLoc, a scene-agnostic
neural network that estimates an accurate 6-DoF pose from an image and a 3D
model. Our approach is based on the direct alignment of multiscale deep
features, casting camera localization as metric learning. PixLoc learns strong
data priors by end-to-end training from pixels to pose and exhibits exceptional
generalization to new scenes by separating model parameters and scene geometry.
The system can localize in large environments given coarse pose priors but also
improve the accuracy of sparse feature matching by jointly refining keypoints
and poses with little overhead. The code will be publicly available at
https://github.com/cvg/pixloc.
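The core idea of the abstract, aligning features by treating pose estimation as an optimization problem rather than direct regression, can be illustrated with a minimal sketch. The toy below reduces the problem to recovering a 2D translation by Gauss-Newton on a featuremetric residual; the synthetic feature field and all names are illustrative assumptions, not PixLoc's actual code (which optimizes a full 6-DoF pose over multiscale CNN feature pyramids with a damped solver).

```python
import numpy as np

def feature(p):
    """Smooth synthetic 'deep feature' field, R^2 -> R^2 (stand-in for a CNN feature map)."""
    x, y = p[..., 0], p[..., 1]
    return np.stack([np.sin(0.5 * x) + 0.1 * y, np.cos(0.3 * y) + 0.1 * x], axis=-1)

def feature_jac(p):
    """Analytic Jacobian of the feature field w.r.t. the 2D point."""
    x, y = p[..., 0], p[..., 1]
    J = np.zeros(p.shape[:-1] + (2, 2))
    J[..., 0, 0] = 0.5 * np.cos(0.5 * x)
    J[..., 0, 1] = 0.1
    J[..., 1, 0] = 0.1
    J[..., 1, 1] = -0.3 * np.sin(0.3 * y)
    return J

def align(points, target_feats, t_init, iters=50):
    """Gauss-Newton on the featuremetric residual r(t) = f(points + t) - target."""
    t = t_init.astype(float).copy()
    for _ in range(iters):
        r = feature(points + t) - target_feats              # (N, 2) residuals
        J = feature_jac(points + t)                         # (N, 2, 2) Jacobians dr/dt
        H = np.einsum('nij,nik->jk', J, J) + 1e-9 * np.eye(2)  # J^T J (damped)
        g = np.einsum('nij,ni->j', J, r)                    # J^T r
        t -= np.linalg.solve(H, g)                          # Gauss-Newton step
    return t

pts = np.random.default_rng(0).uniform(-2, 2, size=(50, 2))
t_true = np.array([0.3, -0.2])
target = feature(pts + t_true)       # features observed at the true alignment
t_est = align(pts, target, t_init=np.zeros(2))
```

Because the objective is differentiable in the pose parameters, the same structure lets gradients flow from the alignment error back into the feature extractor, which is what "end-to-end training from pixels to pose" refers to.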
Related papers
- GeoCalib: Learning Single-image Calibration with Geometric Optimization [89.84142934465685]
From a single image, visual cues can help deduce intrinsic and extrinsic camera parameters like the focal length and the gravity direction.
Current approaches to this problem are based either on classical geometry with lines and vanishing points or on deep neural networks trained end-to-end.
We introduce GeoCalib, a deep neural network that leverages universal rules of 3D geometry through an optimization process.
arXiv Detail & Related papers (2024-09-10T17:59:55Z)
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- LEAP: Liberate Sparse-view 3D Modeling from Camera Poses [28.571234973474077]
We present LEAP, a pose-free approach for sparse-view 3D modeling.
LEAP discards pose-based operations and learns geometric knowledge from data.
We show LEAP significantly outperforms prior methods when they employ predicted poses from state-of-the-art pose estimators.
arXiv Detail & Related papers (2023-10-02T17:59:37Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Object-Based Visual Camera Pose Estimation From Ellipsoidal Model and 3D-Aware Ellipse Prediction [2.016317500787292]
We propose a method for initial camera pose estimation from just a single image.
It exploits the ability of deep learning techniques to reliably detect objects regardless of viewing conditions.
Experiments prove that the accuracy of the computed pose significantly increases thanks to our method.
arXiv Detail & Related papers (2022-03-09T10:00:52Z)
- Pixel-Perfect Structure-from-Motion with Featuremetric Refinement [96.73365545609191]
We refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views.
This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors.
Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale.
arXiv Detail & Related papers (2021-08-18T17:58:55Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image.
We train on an image collection without any ground-truth 3D shape, multi-view, camera-viewpoint, or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv Detail & Related papers (2020-07-21T17:58:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.