NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera
Localization
- URL: http://arxiv.org/abs/2211.11177v2
- Date: Sun, 26 Mar 2023 06:22:15 GMT
- Title: NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera
Localization
- Authors: Shitao Tang, Sicong Tang, Andrea Tagliasacchi, Ping Tan, and Yasutaka Furukawa
- Abstract summary: NeuMap is an end-to-end neural mapping method for camera localization.
It encodes a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels.
- Score: 60.73541222862195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an end-to-end neural mapping method for camera
localization, dubbed NeuMap, encoding a whole scene into a grid of latent
codes, with which a Transformer-based auto-decoder regresses 3D coordinates of
query pixels. State-of-the-art feature matching methods require each scene to
be stored as a 3D point cloud with per-point features, consuming several
gigabytes of storage per scene. While compression is possible, performance
drops significantly at high compression rates. Conversely, coordinate
regression methods achieve high compression by storing scene information in a
neural network but suffer from reduced robustness. NeuMap combines the
advantages of both approaches by utilizing 1) learnable latent codes for
efficient scene representation and 2) a scene-agnostic Transformer-based
auto-decoder to infer coordinates for query pixels. This scene-agnostic network
design learns robust matching priors from large-scale data and enables rapid
optimization of codes for new scenes while keeping the network weights fixed.
Extensive evaluations on five benchmarks show that NeuMap significantly
outperforms other coordinate regression methods and achieves comparable
performance to feature matching methods while requiring a much smaller scene
representation size. For example, NeuMap achieves 39.1% accuracy in the Aachen
night benchmark with only 6MB of data, whereas alternative methods require
100MB or several gigabytes and fail completely under high compression settings.
The code is available at https://github.com/Tangshitao/NeuMap
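As a rough illustration of the decoding step described in the abstract, the sketch below shows how features of query pixels could attend to a scene's latent codes through cross-attention to regress per-pixel 3D coordinates. All sizes, variable names, and the single-layer attention are illustrative assumptions, not the paper's architecture: the actual NeuMap decoder is a multi-layer Transformer trained end-to-end on large-scale data, and at adaptation time only the per-scene codes are optimized while the decoder weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32    # feature / latent dimension (hypothetical)
K = 64    # number of latent codes in the scene grid (hypothetical)
N = 100   # number of query pixels

# Per-scene learnable latent codes: the compact "map" (~K*D floats per scene).
# At test time on a new scene, only this array would be optimized.
scene_codes = rng.normal(size=(K, D)).astype(np.float32)

# Scene-agnostic decoder weights, shared across all scenes and kept frozen.
W_q = rng.normal(size=(D, D)).astype(np.float32) / np.sqrt(D)
W_k = rng.normal(size=(D, D)).astype(np.float32) / np.sqrt(D)
W_v = rng.normal(size=(D, D)).astype(np.float32) / np.sqrt(D)
W_out = rng.normal(size=(D, 3)).astype(np.float32) / np.sqrt(D)  # -> xyz

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_coordinates(pixel_feats, codes):
    """One cross-attention layer: query pixels attend to the scene codes,
    then a linear head regresses a 3D scene coordinate per pixel."""
    q = pixel_feats @ W_q                   # (N, D) queries from pixels
    k = codes @ W_k                         # (K, D) keys from scene codes
    v = codes @ W_v                         # (K, D) values from scene codes
    attn = softmax(q @ k.T / np.sqrt(D))    # (N, K) attention over codes
    fused = attn @ v                        # (N, D) code-conditioned features
    return fused @ W_out                    # (N, 3) predicted 3D coordinates

pixel_feats = rng.normal(size=(N, D)).astype(np.float32)  # e.g. CNN features
coords = decode_coordinates(pixel_feats, scene_codes)
print(coords.shape)  # (100, 3)
```

The point of the sketch is the storage argument: the scene-specific state is just `scene_codes` (K×D floats), while everything else is shared across scenes, which is how a scene can be represented in a few megabytes rather than a gigabyte-scale point cloud with per-point features.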
Related papers
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
- SACReg: Scene-Agnostic Coordinate Regression for Visual Localization [16.866303169903237]
We propose a generalized SCR model trained once and then deployed in new test scenes, regardless of their scale, without any finetuning.
Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations.
We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
arXiv Detail & Related papers (2023-07-21T16:56:36Z)
- CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying [52.91778151771145]
In this paper, we break these limitations for the first time, thanks to recent developments in continuous implicit representation.
Experiments show that the proposed method achieves real-time performance on 2048×2048 images using a single GTX 2080 Ti GPU.
arXiv Detail & Related papers (2023-03-15T11:13:51Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, 34.07 → 34.57 (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z)
- VS-Net: Voting with Segmentation for Visual Localization [72.8165619061249]
We propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks.
Our proposed VS-Net is extensively tested on multiple public benchmarks and can outperform state-of-the-art visual localization methods.
arXiv Detail & Related papers (2021-05-23T08:44:11Z)
- Learning Camera Localization via Dense Scene Matching [45.0957383562443]
Camera localization aims to estimate 6 DoF camera poses from RGB images.
Recent learning-based approaches encode scene structures into a specific convolutional neural network (CNN).
We present a new method for camera localization using dense scene matching (DSM).
arXiv Detail & Related papers (2021-03-31T03:47:42Z)
- Efficient Scene Compression for Visual-based Localization [5.575448433529451]
Estimating the pose of a camera with respect to a 3D reconstruction or scene representation is a crucial step for many mixed reality and robotics applications.
This work introduces a novel approach that compresses a scene representation by means of a constrained quadratic program (QP).
Our experiments on publicly available datasets show that our approach compresses a scene representation quickly while delivering accurate pose estimates.
arXiv Detail & Related papers (2020-11-27T18:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.