Implicit Learning of Scene Geometry from Poses for Global Localization
- URL: http://arxiv.org/abs/2312.02029v1
- Date: Mon, 4 Dec 2023 16:51:23 GMT
- Title: Implicit Learning of Scene Geometry from Poses for Global Localization
- Authors: Mohammad Altillawi, Shile Li, Sai Manoj Prakhya, Ziyuan Liu, and Joan
Serrat
- Abstract summary: Global visual localization estimates the absolute pose of a camera using a single image, in a previously mapped area.
Many existing approaches directly learn and regress 6 DoF pose from an input image.
We propose to utilize these minimal available labels to learn the underlying 3D geometry of the scene.
- Score: 7.077874294016776
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Global visual localization estimates the absolute pose of a camera using a
single image, in a previously mapped area. Obtaining the pose from a single
image enables many robotics and augmented/virtual reality applications.
Inspired by the latest advances in deep learning, many existing approaches directly
learn and regress the 6 DoF pose from an input image. However, these methods do not
fully utilize the underlying scene geometry for pose regression. The challenge
in monocular relocalization is the minimal availability of supervised training
data, which consists only of the corresponding 6 DoF poses of the images. In this paper,
we propose to utilize these minimal available labels (i.e., poses) to learn the
underlying 3D geometry of the scene and use the geometry to estimate the 6 DoF
camera pose. We present a learning method that uses these pose labels and rigid
alignment to learn two 3D geometric representations (X, Y, Z coordinates) of the
scene, one in the camera coordinate frame and the other in the
global coordinate frame. Given a single image, it estimates these two 3D scene
representations, which are then aligned to estimate a pose that matches the
pose label. This formulation allows for the active inclusion of additional
learning constraints to minimize 3D alignment errors between the two 3D scene
representations, and 2D re-projection errors between the 3D global scene
representation and 2D image pixels, resulting in improved localization
accuracy. During inference, our model estimates the 3D scene geometry in camera
and global frames and aligns them rigidly to obtain the pose in real time. We
evaluate our work on three common visual localization datasets, conduct
ablation studies, and show that our method exceeds the pose accuracy of
state-of-the-art regression methods on all datasets.
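The rigid alignment step described in the abstract can be realized with the closed-form Kabsch/Umeyama solution. Below is a minimal, hedged sketch in Python/NumPy of how a 6 DoF pose could be recovered from the two predicted 3D representations; the function name and the per-point correspondence assumption are illustrative, not the authors' implementation.

```python
import numpy as np

def rigid_align(p_cam: np.ndarray, p_world: np.ndarray):
    """Kabsch alignment: find R, t with R @ p_cam[i] + t ~= p_world[i].

    p_cam:   (N, 3) scene coordinates predicted in the camera frame.
    p_world: (N, 3) the same points predicted in the global frame.
    Returns (R, t), i.e. a camera-to-world 6 DoF pose.
    """
    mu_c = p_cam.mean(axis=0)
    mu_w = p_world.mean(axis=0)
    # Cross-covariance of the two centered point sets.
    H = (p_cam - mu_c).T @ (p_world - mu_w)
    U, _, Vt = np.linalg.svd(H)
    # Fix a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_w - R @ mu_c
    return R, t
```

Because the solution is closed-form (a single SVD of a 3x3 matrix), this step is cheap enough to run both inside a training loop and at inference time, which is consistent with the real-time claim in the abstract.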
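The additional learning constraints mentioned in the abstract can likewise be written down compactly. The sketch below, under the same assumptions (per-point correspondences, known intrinsics K, pose labels given as a camera-to-world R_gt, t_gt), shows one plausible form of the 3D alignment loss and the 2D re-projection loss; the exact norms and loss weights are assumptions, not taken from the paper.

```python
import numpy as np

def auxiliary_losses(p_cam, p_world, R_gt, t_gt, K, pixels):
    """p_cam, p_world: (N, 3) predicted coordinates in the two frames.
    R_gt, t_gt: camera-to-world pose label; K: (3, 3) intrinsics.
    pixels: (N, 2) pixel locations at which the points were predicted.
    """
    # 3D alignment error: the camera-frame prediction, mapped through the
    # pose label, should coincide with the global-frame prediction.
    aligned = p_cam @ R_gt.T + t_gt
    loss_3d = np.mean(np.linalg.norm(aligned - p_world, axis=1))

    # 2D re-projection error: the global-frame prediction, transformed back
    # into the camera and projected with K, should land on its source pixel.
    p_in_cam = (p_world - t_gt) @ R_gt        # apply the inverse pose
    proj = p_in_cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]           # perspective division (z > 0)
    loss_2d = np.mean(np.linalg.norm(uv - pixels, axis=1))
    return loss_3d, loss_2d
```

Minimizing both terms alongside the pose loss ties the two learned representations to each other and to the image, which is the mechanism the abstract credits for the improved localization accuracy.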
Related papers
- Combining Absolute and Semi-Generalized Relative Poses for Visual Localization [39.2464667533733]
State-of-the-art localization approaches use 2D-3D matches between pixels in a query image and 3D points in the scene for pose estimation.
In contrast, structure-less methods rely on 2D-2D matches and do not require any 3D scene model.
We show that combining both strategies improves localization performance in multiple practically relevant scenarios.
arXiv Detail & Related papers (2024-09-21T23:55:42Z) - FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses
via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
A key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
arXiv Detail & Related papers (2023-05-31T20:58:46Z) - SGAligner : 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z) - Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy.
We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z) - VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual
Data [69.64723752430244]
We introduce VirtualPose, a two-stage learning framework that exploits the hidden "free lunch" specific to 3D human pose estimation.
The first stage transforms images to abstract geometry representations (AGR), and then the second maps them to 3D poses.
It addresses the generalization issue from two aspects: (1) the first stage can be trained on diverse 2D datasets to reduce the risk of over-fitting to limited appearance; (2) the second stage can be trained on diverse AGR synthesized from a large number of virtual cameras and poses.
arXiv Detail & Related papers (2022-07-20T14:47:28Z) - Sparse Pose Trajectory Completion [87.31270669154452]
We propose a method to learn pose trajectory completion, even using a dataset where objects appear only in sparsely sampled views.
This is achieved with a cross-modal pose trajectory transfer mechanism.
Our method is evaluated on the Pix3D and ShapeNet datasets.
arXiv Detail & Related papers (2021-05-01T00:07:21Z) - Back to the Feature: Learning Robust Camera Localization from Pixels to
Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z) - SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation [46.85865451812981]
We propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm.
Such a single-shot bottom-up scheme allows the system to better learn and reason about the inter-person depth relationship, improving both 3D and 2D pose estimation.
arXiv Detail & Related papers (2020-08-26T09:56:07Z) - Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A
Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs.
It operates by first detecting 2D poses from the two signals and then lifting them to 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)