Reference Pose Generation for Long-term Visual Localization via Learned
Features and View Synthesis
- URL: http://arxiv.org/abs/2005.05179v4
- Date: Wed, 30 Dec 2020 14:29:28 GMT
- Title: Reference Pose Generation for Long-term Visual Localization via Learned
Features and View Synthesis
- Authors: Zichao Zhang, Torsten Sattler, Davide Scaramuzza
- Abstract summary: We propose a semi-automated approach to generate reference poses based on feature matching between renderings of a 3D model and real images via learned features.
We significantly improve the nighttime reference poses of the popular Aachen Day-Night dataset, showing that state-of-the-art visual localization methods perform better (up to $47\%$) than predicted by the original reference poses.
- Score: 88.80710311624101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Localization is one of the key enabling technologies for autonomous
driving and augmented reality. High quality datasets with accurate 6
Degree-of-Freedom (DoF) reference poses are the foundation for benchmarking and
improving existing methods. Traditionally, reference poses have been obtained
via Structure-from-Motion (SfM). However, SfM itself relies on local features
which are prone to fail when images are taken under different conditions,
e.g., day/night changes. At the same time, manually annotating feature
correspondences is not scalable and potentially inaccurate. In this work, we
propose a semi-automated approach to generate reference poses based on feature
matching between renderings of a 3D model and real images via learned features.
Given an initial pose estimate, our approach iteratively refines the pose based
on feature matches against a rendering of the model from the current pose
estimate. We significantly improve the nighttime reference poses of the popular
Aachen Day-Night dataset, showing that state-of-the-art visual localization
methods perform better (up to $47\%$) than predicted by the original reference
poses. We extend the dataset with new nighttime test images, provide
uncertainty estimates for our new reference poses, and introduce a new
evaluation criterion. We will make our reference poses and our framework
publicly available upon publication.
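To make the iterative refinement concrete, below is a minimal sketch of the render-match-refine loop the abstract describes: render the 3D model from the current pose estimate, match features between the real image and the rendering, lift the matched rendering keypoints to 3D via the rendered depth, and re-solve the pose with PnP+RANSAC. The `render_view` and `match_features` callables are hypothetical stand-ins for a model renderer and a learned feature matcher; this is not the authors' released framework.

```python
import numpy as np
import cv2  # OpenCV, used here for PnP + RANSAC

def refine_pose(real_image, model_3d, pose_init, K,
                render_view, match_features,
                num_iters=10, tol=1e-5):
    """Sketch of the render-match-refine loop from the abstract.

    pose_init / returned pose: 4x4 world-to-camera transform.
    K: 3x3 camera intrinsics.
    render_view(model_3d, pose, K) -> (rendering, depth) and
    match_features(img_a, img_b) -> (pts_a, pts_b) are caller-supplied
    placeholders for a renderer and a learned feature matcher.
    """
    pose = pose_init.copy()
    for _ in range(num_iters):
        # 1. Render the model from the current pose estimate.
        rendering, depth = render_view(model_3d, pose, K)

        # 2. Match features between the real image and the rendering.
        pts_real, pts_render = match_features(real_image, rendering)
        if len(pts_real) < 4:
            break  # too few matches for PnP

        # 3. Lift matched rendering keypoints to 3D world points using
        #    the rendered depth and the pose used for rendering.
        z = depth[pts_render[:, 1].astype(int), pts_render[:, 0].astype(int)]
        rays = np.linalg.inv(K) @ np.vstack([pts_render.T, np.ones(len(z))])
        cam_pts = rays * z  # 3xN points in the camera frame
        world_pts = (np.linalg.inv(pose)
                     @ np.vstack([cam_pts, np.ones(len(z))]))[:3].T

        # 4. Re-estimate the pose from 2D-3D matches with PnP + RANSAC.
        ok, rvec, tvec, _ = cv2.solvePnPRansac(
            world_pts.astype(np.float64), pts_real.astype(np.float64),
            K.astype(np.float64), None)
        if not ok:
            break
        new_pose = np.eye(4)
        new_pose[:3, :3] = cv2.Rodrigues(rvec)[0]
        new_pose[:3, 3] = tvec.ravel()

        # 5. Stop once the pose update becomes negligible.
        if np.linalg.norm(new_pose - pose) < tol:
            return new_pose
        pose = new_pose
    return pose
```

The fixed-tolerance convergence test on the 4x4 matrices is a simplification for this sketch; the paper itself reports uncertainty estimates for its resulting reference poses.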
Related papers
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks $1^{st}$ on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- FoundPose: Unseen Object Pose Estimation with Foundation Features [11.32559845631345]
FoundPose is a model-based method for 6D pose estimation of unseen objects from a single RGB image.
The method can quickly onboard new objects using their 3D models without requiring any object- or task-specific training.
arXiv Detail & Related papers (2023-11-30T18:52:29Z)
- PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction [77.89935657608926]
We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images.
PF-LRM reconstructs the object and simultaneously estimates the relative camera poses in 1.3 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-11-20T18:57:55Z)
- Denoising Diffusion for 3D Hand Pose Estimation from Images [38.20064386142944]
This paper addresses the problem of 3D hand pose estimation from monocular images or sequences.
We present a novel end-to-end framework for 3D hand regression that employs diffusion models, which have shown an excellent ability to capture the distribution of data for generative purposes.
The proposed model provides state-of-the-art performance when lifting a 2D single-hand image to 3D.
arXiv Detail & Related papers (2023-08-18T12:57:22Z)
- DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation [16.32910684198013]
We present DiffPose, a novel diffusion architecture that formulates video-based human pose estimation as a conditional heatmap generation problem.
We show two unique characteristics of DiffPose on the pose estimation task: (i) the ability to combine multiple sets of pose estimates to improve prediction accuracy, particularly for challenging joints, and (ii) the ability to adjust the number of iterative steps for feature refinement without retraining the model.
arXiv Detail & Related papers (2023-07-31T14:00:23Z)
- TempCLR: Reconstructing Hands via Time-Coherent Contrastive Learning [30.823358555054856]
We introduce TempCLR, a new time-coherent contrastive learning approach for the structured regression task of 3D hand reconstruction.
Our framework considers temporal consistency in its augmentation scheme, and accounts for the differences of hand poses along the temporal direction.
Our approach improves the performance of fully-supervised hand reconstruction methods by 15.9% and 7.6% in PA-V2V on the HO-3D and FreiHAND datasets respectively.
arXiv Detail & Related papers (2022-09-01T14:19:05Z)
- What's in your hands? 3D Reconstruction of Generic Objects in Hands [49.12461675219253]
Our work aims to reconstruct hand-held objects given a single RGB image.
In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates.
arXiv Detail & Related papers (2022-04-14T17:59:02Z)
- Novel Object Viewpoint Estimation through Reconstruction Alignment [45.16865218423492]
We learn a reconstruct-and-align approach to estimate the viewpoint of a novel object.
In particular, we propose learning two networks: the first maps images to a 3D geometry-aware feature bottleneck and is trained via an image-to-image translation loss.
At test time, our model finds the relative transformation that best aligns the bottleneck features of our test image to a reference image.
arXiv Detail & Related papers (2020-06-05T17:58:14Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach improves pose estimation accuracy; a rough sketch of such a photometric-consistency objective follows this list.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
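On the last entry above: as a rough illustration of what "photometric consistency across time" means as a supervision signal, here is a minimal, generic sketch of a photometric loss between consecutive frames. The names (frame_t, frame_t1, flow) and the nearest-neighbour warp are illustrative assumptions; the paper's actual model infers hand and object poses end-to-end, which this sketch does not reproduce.

```python
import numpy as np

def photometric_loss(frame_t, frame_t1, flow):
    """Generic photometric-consistency loss between consecutive frames.

    frame_t, frame_t1: HxWx3 float images.
    flow: HxWx2 displacement field sending pixels of frame_t into
    frame_t1 (e.g., induced by the currently estimated hand/object
    motion). Illustrative formulation, not the paper's implementation.
    """
    h, w = frame_t.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour warp: where each pixel of frame_t lands in frame_t1.
    xw = np.clip(np.round(xs + flow[..., 0]), 0, w - 1).astype(int)
    yw = np.clip(np.round(ys + flow[..., 1]), 0, h - 1).astype(int)
    warped = frame_t1[yw, xw]
    # The loss is small when the estimated motion is photometrically
    # consistent, so unannotated frames still provide a training signal.
    return float(np.abs(frame_t - warped).mean())
```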
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.