VS-Net: Voting with Segmentation for Visual Localization
- URL: http://arxiv.org/abs/2105.10886v1
- Date: Sun, 23 May 2021 08:44:11 GMT
- Title: VS-Net: Voting with Segmentation for Visual Localization
- Authors: Zhaoyang Huang, Han Zhou, Yijin Li, Bangbang Yang, Yan Xu, Xiaowei
Zhou, Hujun Bao, Guofeng Zhang, Hongsheng Li
- Abstract summary: We propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks.
Our proposed VS-Net is extensively tested on multiple public benchmarks and can outperform state-of-the-art visual localization methods.
- Score: 72.8165619061249
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual localization is of great importance in robotics and computer vision.
Recently, scene coordinate regression based methods have shown good performance
in visual localization in small static scenes. However, these methods still
estimate camera poses from many inferior scene coordinates. To address this problem, we
propose a novel visual localization framework that establishes 2D-to-3D
correspondences between the query image and the 3D map with a series of
learnable scene-specific landmarks. In the landmark generation stage, the 3D
surfaces of the target scene are over-segmented into mosaic patches whose
centers are regarded as the scene-specific landmarks. To robustly and
accurately recover the scene-specific landmarks, we propose the Voting with
Segmentation Network (VS-Net) to segment the pixels into different landmark
patches with a segmentation branch and estimate the landmark locations within
each patch with a landmark location voting branch. Since the number of
landmarks in a scene may reach up to 5000, training a segmentation network with
such a large number of classes is costly in both computation and memory with the
commonly used cross-entropy loss. We propose a novel prototype-based triplet
loss with hard negative mining, which is able to train semantic segmentation
networks with a large number of labels efficiently. Our proposed VS-Net is
extensively tested on multiple public benchmarks and can outperform
state-of-the-art visual localization methods. Code and models are available at
\href{https://github.com/zju3dv/VS-Net}{https://github.com/zju3dv/VS-Net}.
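The two components described in the abstract, averaging per-pixel location votes within each segmented landmark patch and the prototype-based triplet loss with hard negative mining, can be sketched as follows. This is a minimal numpy illustration of the general techniques; the function names, tensor shapes, and the squared-Euclidean distance are illustrative assumptions, not the authors' implementation (which is available in the linked repository).

```python
import numpy as np

def aggregate_landmark_votes(seg, votes, n_landmarks):
    """Average the 2D location votes of all pixels assigned to each landmark patch.

    seg:   (H, W) int array of per-pixel landmark ids in [0, n_landmarks)
    votes: (H, W, 2) float array of per-pixel predicted landmark (x, y) locations
    Returns an (n_landmarks, 2) array of mean locations (NaN for unvoted landmarks).
    """
    flat_seg = seg.reshape(-1)
    flat_votes = votes.reshape(-1, 2)
    sums = np.zeros((n_landmarks, 2))
    counts = np.zeros(n_landmarks)
    np.add.at(sums, flat_seg, flat_votes)   # accumulate votes per landmark id
    np.add.at(counts, flat_seg, 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts[:, None]

def prototype_triplet_loss(embeddings, labels, prototypes, margin=0.5):
    """Prototype-based triplet loss with hard negative mining (illustrative).

    embeddings: (N, D) pixel embeddings
    labels:     (N,) ground-truth landmark class ids
    prototypes: (K, D) one prototype vector per landmark class

    For each pixel, the positive is its own class prototype and the hard
    negative is the *closest* prototype of any other class, so only K
    prototype distances are computed per pixel instead of pairwise pixel
    comparisons over thousands of classes.
    """
    # squared distances from every embedding to every prototype: (N, K)
    d = ((embeddings[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    idx = np.arange(len(labels))
    d_pos = d[idx, labels]
    d_masked = d.copy()
    d_masked[idx, labels] = np.inf        # exclude the positive class
    d_neg = d_masked.min(axis=1)          # hardest (nearest) negative prototype
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```

A recovered landmark location is simply the mean of its patch's votes here; the loss pushes each pixel embedding toward its class prototype and at least `margin` closer to it than to the nearest competing prototype.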
Related papers
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
- SACReg: Scene-Agnostic Coordinate Regression for Visual Localization [16.866303169903237]
We propose a generalized SCR model that is trained once and then applied to new test scenes, regardless of their scale, without any finetuning.
Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations.
We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
arXiv Detail & Related papers (2023-07-21T16:56:36Z)
- HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer [23.920690073252636]
We present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image.
The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments.
arXiv Detail & Related papers (2023-05-05T15:00:14Z)
- SGAligner: 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Visual Localization via Few-Shot Scene Region Classification [84.34083435501094]
Visual (re)localization addresses the problem of estimating the 6-DoF camera pose of a query image captured in a known scene.
Recent advances in structure-based localization solve this problem by memorizing the mapping from image pixels to scene coordinates.
We propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images.
arXiv Detail & Related papers (2022-08-14T22:39:02Z)
- Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
We focus on scene graphs, a data structure that organizes the entities of a scene in a graph.
We propose a learned method that regresses a scene graph from the point cloud of a scene.
We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
arXiv Detail & Related papers (2020-04-08T12:25:25Z)
- Depth Based Semantic Scene Completion with Position Importance Aware Loss [52.06051681324545]
PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features from multi-stages using fine-grained depth information.
It is beneficial for recovering key details like the boundaries of objects and the corners of the scene.
arXiv Detail & Related papers (2020-01-29T07:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.