BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images
- URL: http://arxiv.org/abs/2302.14325v3
- Date: Tue, 15 Aug 2023 03:44:00 GMT
- Title: BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images
- Authors: Lun Luo, Shuhang Zheng, Yixuan Li, Yongzhi Fan, Beinan Yu, Siyuan Cao, Huiliang Shen
- Abstract summary: We explore the potential of a different representation in place recognition, i.e. bird's eye view (BEV) images.
A simple VGGNet trained on BEV images achieves performance comparable to state-of-the-art place recognition methods in scenes with slight viewpoint changes.
We develop a method to estimate the position of the query cloud, extending the use of place recognition.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Place recognition is a key module for long-term SLAM systems. Current
LiDAR-based place recognition methods usually use representations of point
clouds such as unordered points or range images. These methods achieve high
retrieval recall rates, but their performance may degrade under viewpoint
variation or scene changes. In this work, we explore the potential of a
different representation in place recognition, i.e. bird's eye view (BEV)
images. We observe that the structural contents of BEV images are less
influenced by rotations and translations of point clouds. We validate that,
without any delicate design, a simple VGGNet trained on BEV images achieves
performance comparable to state-of-the-art place recognition methods in
scenes with slight viewpoint changes. For more robust place recognition, we
design a rotation-invariant network called BEVPlace. We use group convolution
to extract rotation-equivariant local features from the images and NetVLAD for
global feature aggregation. In addition, we observe that the distance between
BEV features is correlated with the geometric distance between point clouds.
Based on this observation, we develop a method to estimate the position of the
query cloud, extending the use of place recognition. The experiments conducted on
large-scale public datasets show that our method 1) achieves state-of-the-art
performance in terms of recall rates, 2) is robust to view changes, 3) shows
strong generalization ability, and 4) can estimate the positions of query point
clouds. Source codes are publicly available at
https://github.com/zjuluolun/BEVPlace.
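To make the abstract's pipeline concrete, here is a minimal NumPy sketch of three of its ideas: projecting a point cloud onto a BEV density image, pooling a shared convolution over the 4-fold rotation group to obtain rotation-invariant local features, and estimating the query position from feature distances. All function names, grid parameters, and the inverse-distance weighting below are illustrative assumptions rather than the paper's implementation (the authors' code is at the repository above), and NetVLAD aggregation is omitted.

```python
# Illustrative sketch only: names, grid parameters, and the weighting scheme
# are assumptions for exposition, not the authors' implementation.
import numpy as np

def bev_density_image(points, grid_size=0.4, side=100.0, max_count=10):
    """Project an (N, 3) point cloud onto a bird's-eye-view density image.

    Each pixel counts the points falling into its ground-plane grid cell;
    counts are clipped and normalized, which makes the image robust to
    varying point density.
    """
    half = side / 2.0
    # Keep points inside a square window centered on the sensor.
    mask = (np.abs(points[:, 0]) < half) & (np.abs(points[:, 1]) < half)
    xy = points[mask, :2]
    # Map metric x/y coordinates to integer pixel indices.
    cols = ((xy[:, 0] + half) / grid_size).astype(int)
    rows = ((xy[:, 1] + half) / grid_size).astype(int)
    n = int(side / grid_size)
    img = np.zeros((n, n), dtype=np.float32)
    np.add.at(img, (rows, cols), 1.0)              # per-cell point counts
    return np.clip(img, 0, max_count) / max_count  # normalized density

def c4_invariant_features(img, conv):
    """Max-pool a shared convolution over the 4-fold rotation group (C4).

    Rotating the input, convolving, and rotating the response back aligns
    the feature maps, so the per-pixel maximum is invariant to 90-degree
    rotations -- a toy stand-in for the group convolutions in BEVPlace.
    """
    responses = [np.rot90(conv(np.rot90(img, k)), -k) for k in range(4)]
    return np.stack(responses).max(axis=0)

def estimate_position(query_feat, db_feats, db_positions, k=3):
    """Weighted position estimate from the k nearest database features.

    Relies on the paper's observation that feature distance correlates with
    geometric distance; inverse-distance weighting is one plausible mapping,
    not necessarily the one the authors use.
    """
    dists = np.linalg.norm(db_feats - query_feat, axis=1)  # (M,) distances
    idx = np.argsort(dists)[:k]                            # top-k matches
    w = 1.0 / (dists[idx] + 1e-8)
    return (w[:, None] * db_positions[idx]).sum(axis=0) / w.sum()
```

Pooling over rotations after aligning the responses is what makes the toy features invariant to 90-degree rotations; the inverse-distance weighting leans on the abstract's observation that feature distance tracks the geometric distance between clouds.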
Related papers
- VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization [108.68014173017583]
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car.
We propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in the tokenized discrete space.
Thanks to the obtained BEV tokens, accompanied by a codebook embedding encapsulating the semantics for different BEV elements in the ground-truth maps, we are able to directly align the sparse backbone image features with the obtained BEV tokens.
arXiv Detail & Related papers (2024-11-03T16:09:47Z)
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- BEV$^2$PR: BEV-Enhanced Visual Place Recognition with Structural Cues [44.96177875644304]
We propose a new image-based visual place recognition (VPR) framework by exploiting the structural cues in bird's-eye view (BEV) from a single camera.
The BEV$^2$PR framework generates a composite descriptor with both visual cues and spatial awareness based on a single camera.
arXiv Detail & Related papers (2024-03-11T10:46:43Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Leveraging BEV Representation for 360-degree Visual Place Recognition [14.497501941931759]
This paper investigates the advantages of using Bird's Eye View representation in 360-degree visual place recognition (VPR).
We propose a novel network architecture that utilizes the BEV representation in feature extraction, feature aggregation, and vision-LiDAR fusion.
The proposed BEV-based method is evaluated in ablation and comparative studies on two datasets.
arXiv Detail & Related papers (2023-05-23T08:29:42Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- I2P-Rec: Recognizing Images on Large-scale Point Cloud Maps through Bird's Eye View Projections [18.7557037030769]
Place recognition is an important technique for autonomous cars to achieve full autonomy.
We propose the I2P-Rec method to solve the problem by transforming the cross-modal data into the same modality.
With only a small set of training data, I2P-Rec achieves Top-1% recall rates above 80% and 90% when localizing monocular and stereo images, respectively, on point cloud maps.
arXiv Detail & Related papers (2023-03-02T07:56:04Z)
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations grow more complex, integrating multi-source information from different sensors and representing features in a unified view are of vital importance.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in BEV grid; (c) how to formulate the pipeline to incorporate features from different sources; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
arXiv Detail & Related papers (2022-09-12T15:29:13Z)
- Robust Place Recognition using an Imaging Lidar [45.37172889338924]
We propose a methodology for robust, real-time place recognition using an imaging lidar.
Our method is truly rotation-invariant and can tackle reverse revisiting and upside-down revisiting.
arXiv Detail & Related papers (2021-03-03T01:08:31Z)