LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for
Place Recognition
- URL: http://arxiv.org/abs/2311.03198v2
- Date: Sat, 30 Dec 2023 06:39:46 GMT
- Title: LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for
Place Recognition
- Authors: Zijie Zhou, Jingyi Xu, Guangming Xiong, Junyi Ma
- Abstract summary: We present a novel neural network named LCPR for robust multimodal place recognition.
Our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance.
- Score: 11.206532393178385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Place recognition is one of the most crucial modules for autonomous vehicles
to identify places that were previously visited in GPS-invalid environments.
Sensor fusion is considered an effective method to overcome the weaknesses of
individual sensors. In recent years, multimodal place recognition fusing
information from multiple sensors has gathered increasing attention. However,
most existing multimodal place recognition methods only use limited
field-of-view camera images, which leads to an imbalance between features from
different modalities and limits the effectiveness of sensor fusion. In this
paper, we present a novel neural network named LCPR for robust multimodal place
recognition, which fuses LiDAR point clouds with multi-view RGB images to
generate discriminative and yaw-rotation invariant representations of the
environment. A multi-scale attention-based fusion module is proposed to fully
exploit the panoramic views from different modalities of the environment and
their correlations. We evaluate our method on the nuScenes dataset, and the
experimental results show that our method can effectively utilize multi-view
camera and LiDAR data to improve the place recognition performance while
maintaining strong robustness to viewpoint changes. Our open-source code and
pre-trained models are available at https://github.com/ZhouZijie77/LCPR.
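
As a rough illustration of why a yaw-rotation invariant descriptor helps place retrieval, the sketch below treats a panoramic feature map as a channels-by-azimuth grid, models a yaw rotation as a circular shift along the azimuth axis, and pools over that axis so the shift has no effect on the descriptor. This is not the LCPR network: the feature maps are random numbers, the pooling is a simple stand-in for the learned fusion and aggregation, and the names `yaw_rotate`, `global_descriptor`, and `retrieve` are hypothetical helpers introduced only for this example.

```python
import math
import random


def yaw_rotate(feature_map, shift):
    """Simulate a yaw rotation as a circular shift along the azimuth axis."""
    shift %= len(feature_map[0])
    return [row[-shift:] + row[:-shift] for row in feature_map]


def global_descriptor(feature_map):
    """Pool each channel over azimuth (max and mean), then L2-normalize.

    Max and mean do not depend on the order of the azimuth bins, so the
    descriptor is unchanged by any circular shift (assumed yaw model).
    """
    pooled = []
    for row in feature_map:
        pooled.append(max(row))
        pooled.append(math.fsum(row) / len(row))
    norm = math.sqrt(math.fsum(v * v for v in pooled))
    return [v / norm for v in pooled]


def retrieve(query, database):
    """Return the index of the database descriptor with highest cosine similarity."""
    scores = [math.fsum(q * d for q, d in zip(query, desc)) for desc in database]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: 5 "places", each a 4-channel map with 36 azimuth bins (hypothetical sizes).
random.seed(0)
places = [[[random.gauss(0.0, 1.0) for _ in range(36)] for _ in range(4)]
          for _ in range(5)]
database = [global_descriptor(p) for p in places]

# Revisit place 2 with a different heading (shifted by 10 azimuth bins).
query = global_descriptor(yaw_rotate(places[2], 10))
match = retrieve(query, database)
print(match)
```

Because the pooled values ignore the order of azimuth bins, the revisited place yields the same descriptor regardless of heading, and a nearest-neighbour search over descriptors recovers it. A learned network would replace the random maps and the hand-written pooling, but the retrieval step would look similar.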
Related papers
- Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving [3.770103075126785]
We introduce a novel approach to multi-modal sensor fusion, focusing on developing a graph-based state representation.
We present a Sensor-Agnostic Graph-Aware Kalman Filter, the first online state estimation technique designed to fuse multi-modal graphs.
We validate the effectiveness of our proposed framework through extensive experiments conducted on both synthetic and real-world driving datasets.
arXiv Detail & Related papers (2024-11-06T06:58:17Z)
- GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving [9.023864430027333]
Multimodal place recognition methods have gained increasing attention due to their ability to overcome the weaknesses of unimodal sensor systems.
We propose a 3D Gaussian-based multimodal place recognition neural network dubbed GSPR.
arXiv Detail & Related papers (2024-10-01T00:43:45Z)
- Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble [15.173314907900842]
Existing 3D object detection methods rely heavily on the LiDAR sensor.
We propose MEFormer to address the LiDAR over-reliance problem.
Our MEFormer achieves state-of-the-art performance of 73.9% NDS and 71.5% mAP in the nuScenes validation set.
arXiv Detail & Related papers (2024-07-27T03:21:44Z)
- MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics [41.94295877935867]
We study the impact of leveraging a multi-camera setup and integrating diverse data sources for multimodal place recognition.
Our proposed method named MSSPlace utilizes images from multiple cameras, LiDAR point clouds, semantic segmentation masks, and text annotations to generate comprehensive place descriptors.
arXiv Detail & Related papers (2024-07-22T14:24:56Z)
- Log-Likelihood Score Level Fusion for Improved Cross-Sensor Smartphone Periocular Recognition [52.15994166413364]
We employ fusion of several comparators to improve periocular performance when images from different smartphones are compared.
We use a probabilistic fusion framework based on linear logistic regression, in which fused scores tend to be log-likelihood ratios.
Our framework also provides an elegant and simple solution to handle signals from different devices, since same-sensor and cross-sensor score distributions are aligned and mapped to a common probabilistic domain.
arXiv Detail & Related papers (2023-11-02T13:43:44Z)
- Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been determined to enhance the accuracy by fusing data from multiple modalities.
We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage.
By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-10-21T00:56:11Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in the common space, either by iterative optimization or by deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds [155.388487263872]
We propose a new infrared small-dim target detection method with the transformer.
We adopt the self-attention mechanism of the transformer to learn the interaction information of image features in a larger range.
We also design a feature enhancement module to learn more features of small-dim targets.
arXiv Detail & Related papers (2021-09-29T12:23:41Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which matters for many applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.