Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression
- URL: http://arxiv.org/abs/2403.10297v2
- Date: Wed, 20 Mar 2024 02:34:27 GMT
- Title: Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression
- Authors: Huy-Hoang Bui, Bach-Thuan Bui, Dinh-Tuan Tran, Joo-Ho Lee,
- Abstract summary: This paper introduces a pipeline for keypoint descriptor synthesis using Neural Radiance Field (NeRF)
generating novel poses and feeding them into a trained NeRF model to create new views, our approach enhances the KSCR's capabilities in data-scarce environments.
The proposed system could significantly improve localization accuracy by up to 50% and cost only a fraction of time for data synthesis.
- Score: 1.2974519529978974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classical structural-based visual localization methods offer high accuracy but face trade-offs in terms of storage, speed, and privacy. A recent innovation, keypoint scene coordinate regression (KSCR) named D2S addresses these issues by leveraging graph attention networks to enhance keypoint relationships and predict their 3D coordinates using a simple multilayer perceptron (MLP). Camera pose is then determined via PnP+RANSAC, using established 2D-3D correspondences. While KSCR achieves competitive results, rivaling state-of-the-art image-retrieval methods like HLoc across multiple benchmarks, its performance is hindered when data samples are limited due to the deep learning model's reliance on extensive data. This paper proposes a solution to this challenge by introducing a pipeline for keypoint descriptor synthesis using Neural Radiance Field (NeRF). By generating novel poses and feeding them into a trained NeRF model to create new views, our approach enhances the KSCR's generalization capabilities in data-scarce environments. The proposed system could significantly improve localization accuracy by up to 50% and cost only a fraction of time for data synthesis. Furthermore, its modular design allows for the integration of multiple NeRFs, offering a versatile and efficient solution for visual localization. The implementation is publicly available at: https://github.com/ais-lab/DescriptorSynthesis4Feat2Map.
Related papers
- GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting.
We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space.
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z) - GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving [9.023864430027333]
multimodal place recognition has gained increasing attention due to their ability to overcome weaknesses of uni sensor systems.
We propose a 3D Gaussian-based multimodal place recognition neural network dubbed GSPR.
arXiv Detail & Related papers (2024-10-01T00:43:45Z) - SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization [16.460851701725392]
We present a novel approach that optimize radiance fields with scene graphs to mitigate the influence of outlier poses.
Our method incorporates an adaptive inlier-outlier confidence estimation scheme based on scene graphs.
We also introduce an effective intersection-over-union (IoU) loss to optimize the camera pose and surface geometry.
arXiv Detail & Related papers (2024-07-17T15:50:17Z) - PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields [54.8553158441296]
We propose a novel visual localization framework, ie, PNeRFLoc, based on a unified point-based representation.
On the one hand, PNeRFLoc supports the initial pose estimation by matching 2D and 3D feature points.
On the other hand, it also enables pose refinement with novel view synthesis using rendering-based optimization.
arXiv Detail & Related papers (2023-12-17T08:30:00Z) - FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting [58.41056963451056]
We propose a few-shot view synthesis framework based on 3D Gaussian Splatting.
This framework enables real-time and photo-realistic view synthesis with as few as three training views.
FSGS achieves state-of-the-art performance in both accuracy and rendering efficiency across diverse datasets.
arXiv Detail & Related papers (2023-12-01T09:30:02Z) - Leveraging Neural Radiance Fields for Uncertainty-Aware Visual
Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, many of the rendered data are polluted by artifacts or only contain minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - PeRFception: Perception using Radiance Fields [72.99583614735545]
We create the first large-scale implicit representation datasets for perception tasks, called the PeRFception.
It shows a significant memory compression rate (96.4%) from the original dataset, while containing both 2D and 3D information in a unified form.
We construct the classification and segmentation models that directly take as input this implicit format and also propose a novel augmentation technique to avoid overfitting on backgrounds of images.
arXiv Detail & Related papers (2022-08-24T13:32:46Z) - EV-VGCNN: A Voxel Graph CNN for Event-based Object Classification [18.154951807178943]
Event cameras report sparse intensity changes and hold noticeable advantages of low power consumption, high dynamic range, and high response speed for visual perception and understanding on portable devices.
Event-based learning methods have recently achieved massive success on object recognition by integrating events into dense frame-based representations to apply traditional 2D learning algorithms.
These approaches introduce much redundant information during the sparse-to-dense conversion and necessitate models with heavy-weight and large capacities, limiting the potential of event cameras on real-life applications.
arXiv Detail & Related papers (2021-06-01T04:07:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.