ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition
- URL: http://arxiv.org/abs/2403.18762v1
- Date: Wed, 27 Mar 2024 17:01:10 GMT
- Title: ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition
- Authors: Weidong Xie, Lun Luo, Nanfei Ye, Yi Ren, Shaoyi Du, Minhang Wang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen
- Abstract summary: We introduce a fast and lightweight framework to encode images and point clouds into place-distinctive descriptors.
We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images.
We also design a non-negative factorization-based encoder to extract mutually consistent semantic features between point clouds and images.
- Score: 16.799067323119644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition that retrieves images from a point-cloud database remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation for modality conversion, which is usually computationally intensive and needs expensive labeled data for depth supervision. In this work, we introduce a fast and lightweight framework to encode images and point clouds into place-distinctive descriptors. We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images. This module eliminates the necessity for depth estimation and helps subsequent modules achieve real-time performance. We further design a non-negative factorization-based encoder to extract mutually consistent semantic features between point clouds and images. This encoder yields more distinctive global descriptors for retrieval. Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time. Additional evaluation on the HAOMO dataset covering a 17 km trajectory further shows their practical generalization capabilities. We have released the implementation of our methods as open source at: https://github.com/haomo-ai/ModaLink.git.
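To make the FoV transformation idea in the abstract more concrete, the sketch below projects LiDAR points into a camera's image plane to form a sparse depth image. It is an illustration only, not the authors' implementation: the function name `project_to_fov`, the pinhole intrinsic matrix `K`, and the assumption that the points are already expressed in the camera frame are all hypothetical choices made for this example.

```python
import numpy as np

def project_to_fov(points_cam, K, height, width):
    """Project (N, 3) points given in the camera frame into a sparse depth image.

    Hypothetical helper for illustration; assumes a pinhole camera with
    intrinsics K (3x3) and the z axis pointing forward.
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    # Start with "infinitely far" so the nearest point per pixel can win.
    depth = np.full((height, width), np.inf)

    # Discard points behind (or on) the image plane.
    z = points_cam[:, 2]
    front = z > 1e-6
    pts, z = points_cam[front], z[front]

    # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy.
    uvw = pts @ np.asarray(K, dtype=np.float64).T
    u = np.floor(uvw[:, 0] / z).astype(int)
    v = np.floor(uvw[:, 1] / z).astype(int)

    # Keep only pixels that fall inside the camera's field of view.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Unbuffered minimum: when several points hit one pixel, keep the closest.
    np.minimum.at(depth, (v, u), z)

    # Pixels that received no point are marked with 0.
    depth[np.isinf(depth)] = 0.0
    return depth
```

The output has the same pixel layout as the camera image, so a 2D encoder could, in principle, process both inputs in the same way. How the actual module handles sparsity, calibration, and the FoV crop is described in the paper and the released code.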
Related papers
- FASTC: A Fast Attentional Framework for Semantic Traversability Classification Using Point Cloud [7.711666704468952]
We address the problem of traversability assessment using point clouds.
We propose a pillar feature extraction module that utilizes PointNet to capture features from point clouds organized in vertical volume.
We then propose a new temporal attention module to fuse multi-frame information, which can properly handle the varying density problem of LiDAR point clouds.
arXiv Detail & Related papers (2024-06-24T12:01:55Z)
- I2P-Rec: Recognizing Images on Large-scale Point Cloud Maps through Bird's Eye View Projections [18.7557037030769]
Place recognition is an important technique for autonomous cars to achieve full autonomy.
We propose the I2P-Rec method to solve the problem by transforming the cross-modal data into the same modality.
With only a small set of training data, I2P-Rec achieves Top-1% recall rates over 80% and 90% when localizing monocular and stereo images, respectively, on point cloud maps.
arXiv Detail & Related papers (2023-03-02T07:56:04Z)
- Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis [43.13887916301742]
This paper introduces a simple but effective point cloud cross-modality training (PointCMT) strategy to boost point cloud analysis.
To effectively acquire auxiliary knowledge from view images, we develop a teacher-student framework and formulate the cross modal learning as a knowledge distillation problem.
We verify significant gains on various datasets with appealing backbones, i.e., PointNet++ and PointMLP equipped with PointCMT.
arXiv Detail & Related papers (2022-10-09T09:35:22Z)
- Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network [70.53093934205057]
The 3D object detection task using lidar or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
arXiv Detail & Related papers (2022-07-12T12:35:34Z)
- VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images [90.60881721134656]
We propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT).
Experiments on the KITTI Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values.
arXiv Detail & Related papers (2022-06-06T14:02:06Z)
- SeqNetVLAD vs PointNetVLAD: Image Sequence vs 3D Point Clouds for Day-Night Place Recognition [31.714928102950594]
Place Recognition is a crucial capability for mobile robot localization and navigation.
Recent VPR methods based on "sequential representations" have shown promising results.
We compare a 3D point cloud based method with image sequence based methods.
arXiv Detail & Related papers (2021-06-22T02:05:32Z)
- Robust Place Recognition using an Imaging Lidar [45.37172889338924]
We propose a methodology for robust, real-time place recognition using an imaging lidar.
Our method is truly rotation-invariant and can tackle reverse revisiting and upside-down revisiting.
arXiv Detail & Related papers (2021-03-03T01:08:31Z)
- Segment as Points for Efficient Online Multi-Object Tracking and Segmentation [66.03023110058464]
We propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to an unordered 2D point cloud representation.
Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images.
The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2020-07-03T08:29:35Z)
- Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
We introduce a comprehensive benchmark for local features and robust estimation algorithms.
Our pipeline's modular structure allows easy integration, configuration, and combination of different methods.
We show that with proper settings, classical solutions may still outperform the perceived state of the art.
arXiv Detail & Related papers (2020-03-03T15:20:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.