NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera
Localization
- URL: http://arxiv.org/abs/2211.11177v2
- Date: Sun, 26 Mar 2023 06:22:15 GMT
- Title: NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera
Localization
- Authors: Shitao Tang, Sicong Tang, Andrea Tagliasacchi, Ping Tan, and Yasutaka Furukawa
- Abstract summary: NeuMap is an end-to-end neural mapping method for camera localization.
It encodes a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels.
- Score: 60.73541222862195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an end-to-end neural mapping method for camera
localization, dubbed NeuMap, encoding a whole scene into a grid of latent
codes, with which a Transformer-based auto-decoder regresses 3D coordinates of
query pixels. State-of-the-art feature matching methods require each scene to
be stored as a 3D point cloud with per-point features, consuming several
gigabytes of storage per scene. While compression is possible, performance
drops significantly at high compression rates. Conversely, coordinate
regression methods achieve high compression by storing scene information in a
neural network but suffer from reduced robustness. NeuMap combines the
advantages of both approaches by utilizing 1) learnable latent codes for
efficient scene representation and 2) a scene-agnostic Transformer-based
auto-decoder to infer coordinates for query pixels. This scene-agnostic network
design learns robust matching priors from large-scale data and enables rapid
optimization of codes for new scenes while keeping the network weights fixed.
Extensive evaluations on five benchmarks show that NeuMap significantly
outperforms other coordinate regression methods and achieves comparable
performance to feature matching methods while requiring a much smaller scene
representation size. For example, NeuMap achieves 39.1% accuracy in the Aachen
night benchmark with only 6MB of data, whereas alternative methods require
100MB or several gigabytes and fail completely under high compression settings.
The code is available at https://github.com/Tangshitao/NeuMap
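As a rough illustration of the decoding step described in the abstract, the sketch below shows how features of query pixels could attend to a scene's latent codes through cross-attention to regress per-pixel 3D coordinates. All sizes, variable names, and the single-layer attention are illustrative assumptions, not the paper's architecture: the actual NeuMap decoder is a multi-layer Transformer trained end-to-end on large-scale data, and at adaptation time only the per-scene codes are optimized while the decoder weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32    # feature / latent dimension (hypothetical)
K = 64    # number of latent codes in the scene grid (hypothetical)
N = 100   # number of query pixels

# Per-scene learnable latent codes: the compact "map" (~K*D floats per scene).
# At test time on a new scene, only this array would be optimized.
scene_codes = rng.normal(size=(K, D)).astype(np.float32)

# Scene-agnostic decoder weights, shared across all scenes and kept frozen.
W_q = rng.normal(size=(D, D)).astype(np.float32) / np.sqrt(D)
W_k = rng.normal(size=(D, D)).astype(np.float32) / np.sqrt(D)
W_v = rng.normal(size=(D, D)).astype(np.float32) / np.sqrt(D)
W_out = rng.normal(size=(D, 3)).astype(np.float32) / np.sqrt(D)  # -> xyz

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_coordinates(pixel_feats, codes):
    """One cross-attention layer: query pixels attend to the scene codes,
    then a linear head regresses a 3D scene coordinate per pixel."""
    q = pixel_feats @ W_q                   # (N, D) queries from pixels
    k = codes @ W_k                         # (K, D) keys from scene codes
    v = codes @ W_v                         # (K, D) values from scene codes
    attn = softmax(q @ k.T / np.sqrt(D))    # (N, K) attention over codes
    fused = attn @ v                        # (N, D) code-conditioned features
    return fused @ W_out                    # (N, 3) predicted 3D coordinates

pixel_feats = rng.normal(size=(N, D)).astype(np.float32)  # e.g. CNN features
coords = decode_coordinates(pixel_feats, scene_codes)
print(coords.shape)  # (100, 3)
```

The point of the sketch is the storage argument: the scene-specific state is just `scene_codes` (K×D floats), while everything else is shared across scenes, which is how a scene can be represented in a few megabytes rather than a gigabyte-scale point cloud with per-point features.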
Related papers
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
- SACReg: Scene-Agnostic Coordinate Regression for Visual Localization [16.866303169903237]
We propose a generalized SCR model trained once and then deployed in new test scenes, regardless of their scale, without any finetuning.
Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations.
We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
arXiv Detail & Related papers (2023-07-21T16:56:36Z)
- CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying [52.91778151771145]
In this paper, we break these limitations for the first time, thanks to recent developments in continuous implicit representation.
Experiments show that the proposed method achieves real-time performance on 2048×2048 images using a single GTX 2080 Ti GPU.
arXiv Detail & Related papers (2023-03-15T11:13:51Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, 34.07 → 34.57 (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z)
- VS-Net: Voting with Segmentation for Visual Localization [72.8165619061249]
We propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks.
Our proposed VS-Net is extensively tested on multiple public benchmarks and can outperform state-of-the-art visual localization methods.
arXiv Detail & Related papers (2021-05-23T08:44:11Z)
- Learning Camera Localization via Dense Scene Matching [45.0957383562443]
Camera localization aims to estimate 6 DoF camera poses from RGB images.
Recent learning-based approaches encode scene structures into a specific convolutional neural network (CNN).
We present a new method for camera localization using dense scene matching (DSM).
arXiv Detail & Related papers (2021-03-31T03:47:42Z)
- Efficient Scene Compression for Visual-based Localization [5.575448433529451]
Estimating the pose of a camera with respect to a 3D reconstruction or scene representation is a crucial step for many mixed reality and robotics applications.
This work introduces a novel approach that compresses a scene representation by means of a constrained quadratic program (QP).
Our experiments on publicly available datasets show that our approach compresses a scene representation quickly while delivering accurate pose estimates.
arXiv Detail & Related papers (2020-11-27T18:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.