CORAL: Colored structural representation for bi-modal place recognition
- URL: http://arxiv.org/abs/2011.10934v2
- Date: Mon, 19 Jul 2021 11:56:04 GMT
- Title: CORAL: Colored structural representation for bi-modal place recognition
- Authors: Yiyuan Pan, Xuecheng Xu, Weijie Li, Yunxiang Cui, Yue Wang, Rong Xiong
- Abstract summary: We propose a bi-modal place recognition method, which can extract a compound global descriptor from the two modalities, vision and LiDAR.
Specifically, we first build an elevation image from the 3D points as a structural representation.
Then, we derive the correspondences between 3D points and image pixels, which are used to merge the pixel-wise visual features into the elevation map grids.
- Score: 12.357478978433814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Place recognition is indispensable for a drift-free localization system. Because the environment varies, place recognition using a single modality has limitations. In this paper, we propose a bi-modal place recognition method that extracts a compound global descriptor from two modalities, vision and LiDAR. Specifically, we first build an elevation image from the 3D points as a structural representation. Then, we derive the correspondences between 3D points and image pixels, which are used to merge the pixel-wise visual features into the elevation map grids. In this way, we fuse the structural features and visual features in a consistent bird's-eye-view frame, yielding a semantic representation named CORAL; the full network is called CORAL-VLAD. Comparisons on the Oxford RobotCar dataset show that CORAL-VLAD outperforms other state-of-the-art methods. We also demonstrate that our network generalizes to other scenes and sensor configurations on cross-city datasets.
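To make the pipeline concrete, here is a minimal sketch in Python/NumPy of the two steps the abstract describes: rasterizing a point cloud into a bird's-eye-view elevation image, and projecting 3D points into the camera image to obtain the point-to-pixel correspondences used for fusing visual features into the grid. All function names, grid parameters, and the max-height statistic are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_elevation_image(points, grid_res=0.25, grid_size=128):
    """Rasterize a LiDAR scan (N, 3) into a bird's-eye-view elevation
    image: each grid cell keeps the maximum height of its points."""
    half = grid_size * grid_res / 2.0
    ix = ((points[:, 0] + half) / grid_res).astype(int)
    iy = ((points[:, 1] + half) / grid_res).astype(int)
    keep = (ix >= 0) & (ix < grid_size) & (iy >= 0) & (iy < grid_size)
    elev = np.full((grid_size, grid_size), -np.inf)
    np.maximum.at(elev, (iy[keep], ix[keep]), points[keep, 2])
    elev[np.isinf(elev)] = 0.0  # cells that received no points
    return elev

def project_to_pixels(points, K, T_cam_lidar):
    """Map 3D LiDAR points to image pixels via extrinsics T_cam_lidar
    (4x4) and intrinsics K (3x3); returns (N, 2) pixel coordinates and
    a mask of points in front of the camera."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_cam_lidar @ homo.T).T[:, :3]
    in_front = cam[:, 2] > 0.1
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.maximum(uv[:, 2:3], 1e-6)
    return uv, in_front
```

Each occupied grid cell can then be assigned the image feature at the pixel its points project to, giving the fused bird's-eye-view representation that a NetVLAD-style head aggregates into the global descriptor.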
Related papers
- NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction [99.52487968452198]
NOVA3R is an effective approach for non-pixel-aligned 3D reconstruction from a set of unposed images in a feed-forward manner.
It produces physically plausible geometry with fewer duplicated structures in overlapping regions.
It outperforms state-of-the-art methods in terms of reconstruction accuracy and completeness.
arXiv Detail & Related papers (2026-03-04T15:36:25Z)
- Top2Ground: A Height-Aware Dual Conditioning Diffusion Model for Robust Aerial-to-Ground View Generation [14.377332218510743]
Top2Ground is a novel diffusion-based method that directly generates ground-view images from aerial input images.
We condition the denoising process on a joint representation of VAE-encoded spatial features.
Top2Ground can robustly handle both wide and narrow fields of view, highlighting its strong generalization capabilities.
arXiv Detail & Related papers (2025-11-11T13:53:07Z)
- Cross-Modal Geometric Hierarchy Fusion: An Implicit-Submap Driven Framework for Resilient 3D Place Recognition [4.196626042312499]
We propose a novel framework that redefines 3D place recognition through density-agnostic geometric reasoning.
Specifically, we introduce an implicit 3D representation based on elastic points, which is insensitive to the density of the original scene point cloud.
With the aid of these two types of information, we obtain descriptors that fuse geometric information from the bird's-eye-view and 3D-segment perspectives, as sketched below.
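As a toy illustration of that last step, fusing two per-scene descriptors can be as simple as concatenation followed by re-normalization; the function below is a hypothetical sketch, not the paper's actual fusion scheme.

```python
import numpy as np

def fuse_descriptors(d_bev, d_segment):
    """Concatenate a bird's-eye-view descriptor and a 3D-segment
    descriptor, then L2-normalize so retrieval distances weigh
    both sources."""
    fused = np.concatenate([d_bev, d_segment])
    return fused / (np.linalg.norm(fused) + 1e-12)
```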
arXiv Detail & Related papers (2025-06-17T07:04:07Z)
- Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching [2.400446821380503]
We introduce an efficient framework to learn descriptors for both RGB images and point clouds.
It takes visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy.
A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision.
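One plausible reading of such a visible-points overlap strategy is to project the shared 3D points into both views and count how many are visible in each. The sketch below is written under assumptions; `visible_mask`, the pinhole model, and the normalization are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def visible_mask(points, K, T, width, height):
    """Mask of 3D points that land inside an image of size width x height
    after applying the 4x4 camera pose T and 3x3 intrinsics K."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (T @ homo.T).T[:, :3]
    vis = cam[:, 2] > 0.1  # in front of the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.maximum(uv[:, 2:3], 1e-6)
    vis &= (uv[:, 0] >= 0) & (uv[:, 0] < width)
    vis &= (uv[:, 1] >= 0) & (uv[:, 1] < height)
    return vis

def overlap_score(points, K, T_image, T_view, width, height):
    """Fraction of points visible from both the RGB image pose and the
    point-cloud view pose; a proxy similarity for multi-view supervision."""
    a = visible_mask(points, K, T_image, width, height)
    b = visible_mask(points, K, T_view, width, height)
    return (a & b).sum() / max(min(a.sum(), b.sum()), 1)
```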
arXiv Detail & Related papers (2024-10-08T18:31:41Z)
- Context and Geometry Aware Voxel Transformer for Semantic Scene Completion [7.147020285382786]
Vision-based Semantic Scene Completion (SSC) has gained much attention due to its widespread applications in various 3D perception tasks.
Existing sparse-to-dense approaches typically employ shared context-independent queries across various input images.
We introduce a neural network named CGFormer to achieve semantic scene completion.
arXiv Detail & Related papers (2024-05-22T14:16:30Z)
- VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition [40.603362112697255]
We propose a novel Voxel-Cross-Pixel (VXP) approach to perform accurate image-LiDAR global place recognition.
VXP is trained in two stages: it first explicitly exploits local feature correspondences and then enforces similarity of the global descriptors.
Our method surpasses the state-of-the-art cross-modal retrieval by a large margin.
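A common way to "enforce similarity of global descriptors" across modalities is a triplet-style metric loss; the snippet below is a generic sketch of that idea, not VXP's actual training objective.

```python
import numpy as np

def cross_modal_triplet_loss(img_desc, lidar_pos, lidar_neg, margin=0.5):
    """Pull an image descriptor toward the LiDAR descriptor of the same
    place and push it away from a different place by at least `margin`."""
    d_pos = np.linalg.norm(img_desc - lidar_pos)
    d_neg = np.linalg.norm(img_desc - lidar_neg)
    return max(0.0, d_pos - d_neg + margin)
```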
arXiv Detail & Related papers (2024-03-21T17:49:26Z)
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
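Given dense 2D-3D correspondences, a rigid pose can be recovered with a standard RANSAC PnP solver; the sketch below uses OpenCV as a stand-in for this pose-estimation step (the paper's own solver may differ).

```python
import numpy as np
import cv2

def pose_from_correspondences(pts3d, pts2d, K):
    """Estimate camera rotation R and translation t from matched
    3D points (N, 3) and pixels (N, 2) with RANSAC PnP."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float32), pts2d.astype(np.float32),
        K.astype(np.float64), None)
    if not ok:
        raise RuntimeError("PnP did not converge")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    return R, tvec.ravel(), inliers
```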
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
- Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching [58.10418136917358]
Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotics.
Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks.
We learn a structured cross-modality matching solver to represent 3D features via a different latent pixel space.
arXiv Detail & Related papers (2023-12-07T05:46:10Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
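The volume-rendering step mentioned above follows the standard NeRF compositing rule; here is a minimal NumPy sketch in which the MLP's outputs (densities and colors per ray sample) are taken as given.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite one ray: sigmas (S,) are densities, colors (S, 3) are
    per-sample RGB values, deltas (S,) are distances between samples."""
    alphas = 1.0 - np.exp(-sigmas * deltas)  # per-sample opacity
    # Transmittance: probability the ray reaches each sample unblocked.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas)))[:-1]
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)  # rendered RGB
```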
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching [78.18641868402901]
This work establishes fine-grained correspondences between 2D images and 3D point clouds.
An ultra-wide reception mechanism, combined with a novel loss function, is designed to mitigate the intrinsic information variations between pixel and point local regions.
arXiv Detail & Related papers (2021-03-01T14:59:40Z)
- LCD -- Line Clustering and Description for Place Recognition [29.053923938306323]
We introduce a novel learning-based approach to place recognition, using RGB-D cameras and line clusters as visual and geometric features.
We present a neural network architecture based on the attention mechanism for frame-wise line clustering.
A similar neural network is used for the description of these clusters with a compact embedding of 128 floating point numbers.
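With each cluster summarized by a compact 128-float embedding, place recognition reduces to nearest-neighbor search in descriptor space; the retrieval sketch below is illustrative (the threshold and cosine metric are assumptions, not the paper's settings).

```python
import numpy as np

def retrieve_place(query, database, threshold=0.8):
    """Return the index of the database embedding most similar to the
    128-D query embedding (cosine similarity), or None if no match
    clears the threshold."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None
```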
arXiv Detail & Related papers (2020-10-21T09:52:47Z)
- Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification [71.96618723152487]
We introduce the Attention Pyramid Convolutional Neural Network (AP-CNN).
AP-CNN learns both high-level semantic and low-level detailed feature representation.
It can be trained end-to-end, without the need of additional bounding box/part annotations.
arXiv Detail & Related papers (2020-02-09T12:33:23Z)