FocusTune: Tuning Visual Localization through Focus-Guided Sampling
- URL: http://arxiv.org/abs/2311.02872v1
- Date: Mon, 6 Nov 2023 04:58:47 GMT
- Title: FocusTune: Tuning Visual Localization through Focus-Guided Sampling
- Authors: Son Tung Nguyen, Alejandro Fontan, Michael Milford, Tobias Fischer
- Abstract summary: FocusTune is a focus-guided sampling technique to improve the performance of visual localization algorithms.
We demonstrate that FocusTune both improves or matches state-of-the-art performance whilst keeping ACE's appealing low storage and compute requirements.
This combination of high performance and low compute and storage requirements is particularly promising for applications in areas like mobile robotics and augmented reality.
- Score: 61.79440120153917
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose FocusTune, a focus-guided sampling technique to improve the
performance of visual localization algorithms. FocusTune directs a scene
coordinate regression model towards regions critical for 3D point triangulation
by exploiting key geometric constraints. Specifically, rather than uniformly
sampling points across the image for training the scene coordinate regression
model, we instead re-project 3D scene coordinates onto the 2D image plane and
sample within a local neighborhood of the re-projected points. While our
proposed sampling strategy is generally applicable, we showcase FocusTune by
integrating it with the recently introduced Accelerated Coordinate Encoding
(ACE) model. Our results demonstrate that FocusTune both improves or matches
state-of-the-art performance whilst keeping ACE's appealing low storage and
compute requirements, for example reducing translation error from 25 to 19 and
17 to 15 cm for single and ensemble models, respectively, on the Cambridge
Landmarks dataset. This combination of high performance and low compute and
storage requirements is particularly promising for applications in areas like
mobile robotics and augmented reality. We made our code available at
\url{https://github.com/sontung/focus-tune}.
Related papers
- SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality [50.179377002092416]
We propose an efficient visual localization method capable of high-quality rendering with fewer parameters.
Our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.
arXiv Detail & Related papers (2024-09-21T08:46:16Z) - GLACE: Global Local Accelerated Coordinate Encoding [66.87005863868181]
Scene coordinate regression methods are effective in small-scale scenes but face significant challenges in large-scale scenes.
We propose GLACE, which integrates pre-trained global and local encodings and enables SCR to scale to large scenes with only a single small-sized network.
Our method achieves state-of-the-art results on large-scale scenes with a low-map-size model.
arXiv Detail & Related papers (2024-06-06T17:59:50Z) - Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression [1.2974519529978974]
This paper introduces a pipeline for keypoint descriptor synthesis using Neural Radiance Field (NeRF)
generating novel poses and feeding them into a trained NeRF model to create new views, our approach enhances the KSCR's capabilities in data-scarce environments.
The proposed system could significantly improve localization accuracy by up to 50% and cost only a fraction of time for data synthesis.
arXiv Detail & Related papers (2024-03-15T13:40:37Z) - Learning to Produce Semi-dense Correspondences for Visual Localization [11.415451542216559]
This study addresses the challenge of performing visual localization in demanding conditions such as night-time scenarios, adverse weather, and seasonal changes.
We propose a novel method that extracts reliable semi-dense 2D-3D matching points based on dense keypoint matches.
The network utilizes both geometric and visual cues to effectively infer 3D coordinates for unobserved keypoints from the observed ones.
arXiv Detail & Related papers (2024-02-13T10:40:10Z) - DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic
Voxelization [0.0]
We propose a novel two-stage framework for the efficient 3D point cloud object detection.
We parse the raw point cloud data directly in the 3D space yet achieve impressive efficiency and accuracy.
We highlight our KITTI 3D object detection dataset with 75 FPS and on Open dataset with 25 FPS inference speed with satisfactory accuracy.
arXiv Detail & Related papers (2021-07-27T10:07:39Z) - Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z) - Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement [54.29252286561449]
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, are sampled.
In the second stage, for each guided point, different visual feature is extracted by the localization.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z) - Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking [34.40019455462043]
We propose a joint spatial-temporal optimization-based stereo 3D object tracking method.
From the network, we detect corresponding 2D bounding boxes on adjacent images and regress an initial 3D bounding box.
Dense object cues that associating to the object centroid are then predicted using a region-based network.
arXiv Detail & Related papers (2020-04-20T13:59:46Z) - Multi-View Optimization of Local Feature Geometry [70.18863787469805]
We address the problem of refining the geometry of local image features from multiple views without known scene or camera geometry.
Our proposed method naturally complements the traditional feature extraction and matching paradigm.
We show that our method consistently improves the triangulation and camera localization performance for both hand-crafted and learned local features.
arXiv Detail & Related papers (2020-03-18T17:22:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.