Enhancing Scene Coordinate Regression with Efficient Keypoint Detection and Sequential Information
- URL: http://arxiv.org/abs/2412.06488v2
- Date: Tue, 13 May 2025 05:08:50 GMT
- Title: Enhancing Scene Coordinate Regression with Efficient Keypoint Detection and Sequential Information
- Authors: Kuan Xu, Zeyu Jiang, Haozhi Cao, Shenghai Yuan, Chen Wang, Lihua Xie
- Abstract summary: We propose an efficient and accurate Scene Coordinate Regression (SCR) system. Compared to existing SCR methods, we propose a unified architecture for both scene encoding and salient keypoint detection. Comprehensive experiments conducted across indoor and outdoor datasets demonstrate that the proposed system outperforms state-of-the-art (SOTA) SCR methods.
- Score: 26.934946734751442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene Coordinate Regression (SCR) is a visual localization technique that utilizes deep neural networks (DNN) to directly regress 2D-3D correspondences for camera pose estimation. However, current SCR methods often face challenges in handling repetitive textures and meaningless areas due to their reliance on implicit triangulation. In this paper, we propose an efficient and accurate SCR system. Compared to existing SCR methods, we propose a unified architecture for both scene encoding and salient keypoint detection, allowing our system to prioritize the encoding of informative regions. This design significantly improves computational efficiency. Additionally, we introduce a mechanism that utilizes sequential information during both mapping and relocalization. The proposed method enhances the implicit triangulation, especially in environments with repetitive textures. Comprehensive experiments conducted across indoor and outdoor datasets demonstrate that the proposed system outperforms state-of-the-art (SOTA) SCR methods. Our single-frame relocalization mode improves the recall rate of our baseline by 6.4% and increases the running speed from 56Hz to 90Hz. Furthermore, our sequence-based mode increases the recall rate by 11% while maintaining the original efficiency.
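The speed figures in the abstract can be restated as per-frame processing time. The following is a minimal arithmetic sketch, not part of the paper: it assumes the quoted rates (56Hz and 90Hz) are frames processed per second one after another, so that time per frame is simply the reciprocal of the rate. The helper name `frame_time_ms` is hypothetical.

```python
def frame_time_ms(rate_hz: float) -> float:
    """Per-frame processing time in milliseconds, assuming sequential frames."""
    return 1000.0 / rate_hz


# Rates quoted in the abstract for the baseline and the single-frame mode.
baseline_hz = 56.0
proposed_hz = 90.0

# Relative speedup and absolute time saved per frame.
speedup = proposed_hz / baseline_hz
saved_ms = frame_time_ms(baseline_hz) - frame_time_ms(proposed_hz)

print(f"baseline: {frame_time_ms(baseline_hz):.1f} ms/frame")
print(f"proposed: {frame_time_ms(proposed_hz):.1f} ms/frame")
print(f"speedup:  {speedup:.2f}x, saving {saved_ms:.1f} ms per frame")
```

Under that assumption, the single-frame mode takes roughly 11 ms per frame instead of roughly 18 ms, a speedup of about 1.6x.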
Related papers
- QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization [69.50126552763157]
Surface reconstruction is fundamental to computer vision and graphics, enabling applications in 3D modeling, mixed reality, robotics, and more. Existing approaches based on rendering obtain promising results, but optimize on a per-scene basis, resulting in a slow optimization that can struggle to model textureless regions. We introduce QuickSplat, which learns data-driven priors to generate dense initializations for 2D Gaussian splatting optimization of large-scale indoor scenes.
arXiv Detail & Related papers (2025-05-08T18:43:26Z)
- DERD-Net: Learning Depth from Event-based Ray Densities [11.309936820480111]
Event cameras offer a promising avenue for multi-view stereo depth estimation and SLAM.
We propose a scalable, flexible and adaptable framework for pixel-wise depth estimation with event cameras in both monocular and stereo setups.
arXiv Detail & Related papers (2025-04-22T12:58:05Z)
- FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction [50.534213038479926]
FreeSplat++ is an alternative approach to large-scale indoor whole-scene reconstruction.
Our method with depth-regularized per-scene fine-tuning demonstrates substantial improvements in reconstruction accuracy and a notable reduction in training time.
arXiv Detail & Related papers (2025-03-29T06:22:08Z)
- PoI: A Filter to Extract Pixel of Interest from Novel View Synthesis for Scene Coordinate Regression [28.39136566857838]
Novel View Synthesis (NVS) techniques can augment camera pose estimation by extending and diversifying training data. Images generated by these methods are often plagued by spatial artifacts such as blurring and ghosting. We propose a dual-criteria filtering mechanism that dynamically identifies and discards suboptimal pixels during training.
arXiv Detail & Related papers (2025-02-07T11:24:23Z)
- R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization [66.87005863868181]
We introduce a covisibility graph-based global encoding learning and data augmentation strategy.
We revisit the network architecture and local feature extraction module.
Our method achieves state-of-the-art on challenging large-scale datasets without relying on network ensembles or 3D supervision.
arXiv Detail & Related papers (2025-01-02T18:59:08Z)
- Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval [16.995114000869833]
We propose CMPAGL, a cross-modal pre-aligned method leveraging global and local information.
Our Gswin transformer block combines local window self-attention and global-local window cross-attention to capture multi-scale features.
Experiments on four datasets, including RSICD and RSITMD, validate CMPAGL's effectiveness.
arXiv Detail & Related papers (2024-11-22T03:28:55Z)
- GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization [1.4466437171584356]
We propose a two-stage procedure that integrates dense and robust keypoint descriptors from the lightweight XFeat feature extractor into 3DGS.
In the second stage, the initial pose estimate is refined by minimizing the rendering-based photometric warp loss.
Benchmarking on widely used indoor and outdoor datasets demonstrates improvements over recent neural rendering-based localization methods.
arXiv Detail & Related papers (2024-09-24T23:18:32Z)
- HGSLoc: 3DGS-based Heuristic Camera Pose Refinement [13.393035855468428]
Visual localization refers to the process of determining camera poses and orientation within a known scene representation.
In this paper, we propose HGSLoc, which integrates 3D reconstruction with a refinement strategy to achieve higher pose estimation accuracy.
Our method demonstrates a faster rendering speed and higher localization accuracy compared to NeRF-based neural rendering approaches.
arXiv Detail & Related papers (2024-09-17T06:48:48Z)
- VHS: High-Resolution Iterative Stereo Matching with Visual Hull Priors [3.523208537466128]
We present a stereo-matching method for depth estimation from high-resolution images using visual hulls as priors.
Our method uses object masks extracted from supplementary views of the scene to guide the disparity estimation, effectively reducing the search space for matches.
This approach is specifically tailored to stereo rigs in volumetric capture systems, where an accurate depth plays a key role in the downstream reconstruction task.
arXiv Detail & Related papers (2024-06-04T17:59:57Z)
- Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression [1.2974519529978974]
This paper introduces a pipeline for keypoint descriptor synthesis using Neural Radiance Field (NeRF).
By generating novel poses and feeding them into a trained NeRF model to create new views, our approach enhances the KSCR's capabilities in data-scarce environments.
The proposed system could improve localization accuracy by up to 50% while requiring only a fraction of the time for data synthesis.
arXiv Detail & Related papers (2024-03-15T13:40:37Z)
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting [51.96353586773191]
We introduce GS-SLAM that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping system.
Our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering.
Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets.
arXiv Detail & Related papers (2023-11-20T12:08:23Z)
- Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, many of the rendered data are polluted by artifacts or only contain minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z)
- Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.