CVTNet: A Cross-View Transformer Network for Place Recognition Using LiDAR Data
- URL: http://arxiv.org/abs/2302.01665v2
- Date: Fri, 6 Oct 2023 06:26:34 GMT
- Title: CVTNet: A Cross-View Transformer Network for Place Recognition Using LiDAR Data
- Authors: Junyi Ma, Guangming Xiong, Jingyi Xu, Xieyuanli Chen
- Abstract summary: We propose a cross-view transformer-based network, dubbed CVTNet, to fuse the range image views (RIVs) and bird's eye views (BEVs) generated from the LiDAR data.
We evaluate our approach on three datasets collected with different sensor setups and environmental conditions.
- Score: 15.144590078316252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR-based place recognition (LPR) is one of the most crucial components of
autonomous vehicles to identify previously visited places in GPS-denied
environments. Most existing LPR methods use mundane representations of the
input point cloud without considering different views, which may not fully
exploit the information from LiDAR sensors. In this paper, we propose a
cross-view transformer-based network, dubbed CVTNet, to fuse the range image
views (RIVs) and bird's eye views (BEVs) generated from the LiDAR data. It
extracts correlations within the views themselves using intra-transformers and
between the two different views using inter-transformers. Based on that, our
proposed CVTNet generates a yaw-angle-invariant global descriptor for each
laser scan end-to-end online and retrieves previously seen places by descriptor
matching between the current query scan and the pre-built database. We evaluate
our approach on three datasets collected with different sensor setups and
environmental conditions. The experimental results show that our method
outperforms the state-of-the-art LPR methods with strong robustness to
viewpoint changes and long-time spans. Furthermore, our approach has a good
real-time performance that can run faster than the typical LiDAR frame rate.
The implementation of our method is released as open source at:
https://github.com/BIT-MJY/CVTNet.
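To make the pipeline in the abstract more concrete, below is a minimal sketch of how a single LiDAR scan might be turned into the two views the network consumes: a range image view (RIV) from a spherical projection and a polar BEV that shares the same azimuth axis, so a yaw rotation of the scan shifts the columns of both views. All image sizes, fields of view, and the polar BEV choice are illustrative assumptions, not CVTNet's exact preprocessing (see the linked repository for that).

```python
import numpy as np

def range_image_view(points, h=32, w=900, fov_up=3.0, fov_down=-25.0, max_range=50.0):
    """Spherical projection of an (N, 3) point cloud into a range image.
    Field of view and image size are illustrative assumptions."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                                   # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(depth, 1e-6), -1.0, 1.0))

    fov_up_rad, fov_down_rad = np.radians(fov_up), np.radians(fov_down)
    u = ((yaw + np.pi) / (2.0 * np.pi) * w).astype(np.int64) % w          # column from azimuth
    v = ((fov_up_rad - pitch) / (fov_up_rad - fov_down_rad) * h).astype(np.int64)
    v = np.clip(v, 0, h - 1)                                 # row from elevation

    riv = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(-depth)                               # write far points first so the closest return wins
    riv[v[order], u[order]] = np.clip(depth[order] / max_range, 0.0, 1.0)
    return riv

def polar_bev_view(points, rings=32, w=900, max_range=50.0):
    """Polar BEV whose columns share the azimuth axis with the range image,
    which is the intuition behind a yaw-angle-invariant descriptor."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2)
    u = ((np.arctan2(y, x) + np.pi) / (2.0 * np.pi) * w).astype(np.int64) % w
    v = np.clip((r / max_range * rings).astype(np.int64), 0, rings - 1)

    bev = np.full((rings, w), -np.inf, dtype=np.float32)
    np.maximum.at(bev, (v, u), z.astype(np.float32))         # keep max height per cell
    bev[np.isinf(bev)] = 0.0                                 # empty cells -> 0
    return bev
```

Retrieval against the pre-built database is then a nearest-neighbour search of the query descriptor over the stored descriptors. A brute-force cosine-similarity version, standing in for the indexed search (e.g. FAISS) a real LPR system would use, could look like:

```python
import numpy as np

class DescriptorDatabase:
    """Toy brute-force retrieval over L2-normalized global descriptors."""

    def __init__(self):
        self._descs, self._ids = [], []

    def add(self, scan_id, desc):
        self._descs.append(desc / (np.linalg.norm(desc) + 1e-12))  # normalize once at insert time
        self._ids.append(scan_id)

    def query(self, desc, top_k=5):
        q = desc / (np.linalg.norm(desc) + 1e-12)
        sims = np.stack(self._descs) @ q                           # cosine similarities
        best = np.argsort(-sims)[:top_k]
        return [(self._ids[i], float(sims[i])) for i in best]
```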
Related papers
- Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition [10.086473917830112]
Image-to-point cloud cross-modal Visual Place Recognition (VPR) is a challenging task where the query is an RGB image, and the database samples are LiDAR point clouds.
We propose an innovative initial retrieval + re-rank method that effectively combines information from range (or RGB) images and Bird's Eye View (BEV) images.
arXiv Detail & Related papers (2025-02-17T12:29:26Z)
- RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
We propose a Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection.
RaCFormer achieves 64.9% mAP and 70.2% NDS on nuScenes, even outperforming several LiDAR-based detectors.
arXiv Detail & Related papers (2024-12-17T09:47:48Z)
- SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection [5.36022165180739]
We present SpaRC, a novel Sparse fusion transformer for 3D perception that integrates multi-view image semantics with Radar and Camera point features.
Empirical evaluations on the nuScenes and TruckScenes benchmarks demonstrate that SpaRC significantly outperforms existing dense BEV-based and sparse query-based detectors.
arXiv Detail & Related papers (2024-11-29T17:17:38Z)
- OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer [63.141027246418]
We propose Open-Vocabulary Light-Weighted Detection Transformer (OVLW-DETR), a deployment friendly open-vocabulary detector with strong performance and low latency.
We provide an end-to-end training recipe that transfers knowledge from a vision-language model (VLM) to the object detector via simple alignment.
Experimental results demonstrate that the proposed approach outperforms existing real-time open-vocabulary detectors on the standard zero-shot LVIS benchmark.
arXiv Detail & Related papers (2024-07-15T12:15:27Z)
- UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z)
- PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer [75.2251801053839]
We present a novel Point-Voxel Transformer for single-stage 3D detection (PVT-SSD).
We propose a Point-Voxel Transformer (PVT) module that obtains long-range context from voxels at low computational cost.
The experiments on several autonomous driving benchmarks verify the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2023-05-11T07:37:15Z)
- SeqOT: A Spatial-Temporal Transformer Network for Place Recognition Using Sequential LiDAR Data [9.32516766412743]
We propose a transformer-based network named SeqOT to exploit the temporal and spatial information provided by sequential range images.
We evaluate our approach on four datasets collected with different types of LiDAR sensors in different environments.
Our method operates online faster than the frame rate of the sensor.
arXiv Detail & Related papers (2022-09-16T14:08:11Z)
- Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z)
- Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification to preserve information about small objects of interest during domain adaptation.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework on the KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z)
- Multi-View Fusion of Sensor Data for Improved Perception and Prediction in Autonomous Driving [11.312620949473938]
We present an end-to-end method for object detection and trajectory prediction utilizing multi-view representations of LiDAR and camera images.
Our model builds on a state-of-the-art Bird's-Eye View (BEV) network that fuses voxelized features from a sequence of historical LiDAR data.
We extend this model with additional LiDAR Range-View (RV) features that use the raw LiDAR information in its native, non-quantized representation.
arXiv Detail & Related papers (2020-08-27T03:32:25Z)