Cross-Modal Geometric Hierarchy Fusion: An Implicit-Submap Driven Framework for Resilient 3D Place Recognition
- URL: http://arxiv.org/abs/2506.14243v3
- Date: Wed, 27 Aug 2025 16:09:33 GMT
- Title: Cross-Modal Geometric Hierarchy Fusion: An Implicit-Submap Driven Framework for Resilient 3D Place Recognition
- Authors: Xiaohui Jiang, Haijiang Zhu, Chade Li, Fulin Tang, Ning An
- Abstract summary: We propose a novel framework that redefines 3D place recognition through density-agnostic geometric reasoning. Specifically, we introduce an implicit 3D representation based on elastic points, which is immune to interference from the original scene's point cloud density. With the aid of the occupancy-grid and normal-vector information derived from this representation, we obtain descriptors that fuse geometric information from both bird's-eye-view and 3D-segment perspectives.
- Score: 9.411542547451193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR-based place recognition serves as a crucial enabler for long-term autonomy in robotics and autonomous driving systems. Yet, prevailing methodologies relying on handcrafted feature extraction face dual challenges: (1) inconsistent point cloud density, induced by ego-motion dynamics and environmental disturbances during repeated traversals, leads to descriptor instability, and (2) representation fragility stems from reliance on single-level geometric abstractions that lack discriminative power in structurally complex scenarios. To address these limitations, we propose a novel framework that redefines 3D place recognition through density-agnostic geometric reasoning. Specifically, we introduce an implicit 3D representation based on elastic points, which is immune to interference from the original scene's point cloud density and yields a uniformly distributed set of points. Subsequently, we derive the occupancy grid and normal vector information of the scene from this implicit representation. Finally, with the aid of these two types of information, we obtain descriptors that fuse geometric information from both bird's-eye view (capturing macro-level spatial layouts) and 3D segment (encoding micro-scale surface geometries) perspectives. We conducted extensive experiments on numerous datasets (KITTI, KITTI-360, MulRan, NCLT) across diverse environments. The experimental results demonstrate that our method achieves state-of-the-art performance. Moreover, our approach strikes an optimal balance between accuracy, runtime, and memory usage for historical maps, showcasing excellent resilience and scalability. Our code will be open-sourced in the future.
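The bird's-eye-view side of the pipeline described in the abstract can be illustrated with a toy sketch. This is not the paper's elastic-point method: the voxel size, grid resolution, and max-height occupancy encoding below are all illustrative assumptions.

```python
import numpy as np

def occupancy_bev_descriptor(points, voxel=0.5, grid=8):
    """Toy BEV descriptor: quantise points into a coarse 2D occupancy
    grid and flatten the per-cell max height into a vector.
    Illustrative only -- the paper's implicit elastic-point map and
    normal-vector fusion are not reproduced here."""
    # Shift into a non-negative frame so voxel indices are valid.
    pts = points - points.min(axis=0)
    ij = np.floor(pts[:, :2] / voxel).astype(int)
    ij = np.clip(ij, 0, grid - 1)
    bev = np.zeros((grid, grid))
    for (i, j), z in zip(ij, pts[:, 2]):
        bev[i, j] = max(bev[i, j], z)  # max-height occupancy per cell
    d = bev.ravel()
    n = np.linalg.norm(d)
    return d / n if n > 0 else d  # unit norm, so dot product = cosine
```

Two scans of the same place, even captured at different densities, tend to produce similar unit-norm descriptors, which can then be matched by dot product.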
Related papers
- SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs [21.891285551179365]
We introduce Spherical Coordinate-based Positional Embedding (SoPE). Our method maps point-cloud token indices into a 3D spherical coordinate space, enabling unified modeling of spatial locations and directional angles. This formulation preserves the inherent geometric structure of point-cloud data, enhances spatial awareness, and yields more consistent and expressive geometric representations for multimodal learning.
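The spherical-coordinate mapping that SoPE builds on can be sketched generically. The conversion below is the standard Cartesian-to-spherical formula paired with a plain sinusoidal embedding; SoPE's actual embedding scheme is not reproduced, and `dim` is an assumed hyperparameter.

```python
import numpy as np

def spherical_coords(xyz):
    """Cartesian (x, y, z) -> (r, theta, phi).
    theta is the polar angle from +z; phi is the azimuth in the x-y
    plane. A generic conversion, not SoPE's exact formulation."""
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    # Guard r = 0 so the origin maps to theta = 0 instead of NaN.
    theta = np.arccos(np.divide(z, r, out=np.zeros_like(r), where=r > 0))
    phi = np.arctan2(y, x)
    return np.stack([r, theta, phi], axis=-1)

def sinusoidal_pe(vals, dim=8):
    """Plain sin/cos positional embedding applied per coordinate
    (an assumption -- a stand-in for SoPE's embedding scheme)."""
    freqs = 2.0 ** np.arange(dim // 2)
    ang = vals[..., None] * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
```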
arXiv Detail & Related papers (2026-02-26T07:42:15Z) - GS-CLIP: Zero-shot 3D Anomaly Detection by Geometry-Aware Prompt and Synergistic View Representation Learning [11.364765496753074]
3D Anomaly Detection is an emerging task that aims to detect anomalies in a target dataset without any target training data. Current methods adapt CLIP by projecting 3D point clouds into 2D representations, but they face challenges. We propose the Geometry-Aware Prompt and Synergistic View Representation Learning framework, which enables the model to identify geometric anomalies through a two-stage learning process.
arXiv Detail & Related papers (2026-02-22T14:30:41Z) - Robust Mesh Saliency GT Acquisition in VR via View Cone Sampling and Geometric Smoothing [59.12032628787018]
3D mesh saliency ground truth is essential for human-centric visual modeling in virtual reality (VR). Current VR eye-tracking pipelines rely on single-ray sampling and Euclidean smoothing, triggering texture attention and signal leakage across gaps. This paper proposes a robust framework to address these limitations.
arXiv Detail & Related papers (2026-01-06T05:20:12Z) - 3dSAGER: Geospatial Entity Resolution over 3D Objects (Technical Report) [7.378893412842889]
3dSAGER is an end-to-end pipeline for geospatial entity resolution over 3D objects. We present a novel, spatial-reference-independent featurization mechanism that captures intricate geometric characteristics of matching pairs. We also propose a new lightweight and interpretable blocking method, BKAFI, that leverages a trained model to efficiently generate high-recall candidate sets.
arXiv Detail & Related papers (2025-11-09T09:35:45Z) - A Novel Convolution and Attention Mechanism-based Model for 6D Object Pose Estimation [49.1574468325115]
Estimating 6D object poses from RGB images is challenging because the lack of depth information requires inferring a three-dimensional structure from 2D projections. Traditional methods often rely on deep learning with grid-based data structures but struggle to capture complex dependencies among extracted features. We introduce a graph-based representation derived directly from images, where temporal features of each pixel serve as nodes, and relationships between them are defined through node connectivity and spatial interactions.
arXiv Detail & Related papers (2024-12-31T18:47:54Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image [94.56927147492738]
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes from single images.
We show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage.
We propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions.
arXiv Detail & Related papers (2024-03-18T17:50:41Z) - Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud
Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z) - Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics [7.507853813361308]
We introduce a multi-modal technique that leverages 3D point clouds and 2D colour images obtained from LiDAR and camera sensors for surface normal estimation. We present a novel transformer-based neural network architecture that proficiently fuses visual semantic and 3D geometric information. It has been verified that the proposed model can learn from a simulated 3D environment that mimics a traffic scene.
arXiv Detail & Related papers (2022-11-19T03:55:09Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z) - Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos [0.5276232626689566]
We present DLNet for pixel-wise depth estimation, which simultaneously extracts global and local features.
A three-dimensional geometry smoothness loss is proposed to predict a geometrically natural depth map.
In experiments on the KITTI and Make3D benchmarks, the proposed DLNet achieves performance competitive to those of the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-07T10:53:27Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation [81.02742110604161]
State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
Our method achieves the 1st place in the leaderboard of Semantic KITTI and outperforms existing methods on nuScenes with a noticeable margin, about 4%.
arXiv Detail & Related papers (2020-11-19T18:53:11Z)
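The cylindrical partition at the heart of the two Cylinder3D-style entries above can be sketched as a voxel-index assignment over (radius, azimuth, height). The bin counts and coordinate ranges below are illustrative assumptions, not the networks' actual configuration.

```python
import numpy as np

def cylindrical_voxel_index(points, rho_bins=480, phi_bins=360, z_bins=32,
                            rho_max=50.0, z_min=-4.0, z_max=2.0):
    """Assign each LiDAR point (x, y, z) to a (rho, phi, z) cylinder
    cell. Unlike Cartesian voxels, cells stay balanced as point
    density thins out with range. Illustrative parameters only."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x**2 + y**2)      # radial distance from sensor
    phi = np.arctan2(y, x)          # azimuth in [-pi, pi)
    i = np.clip((rho / rho_max * rho_bins).astype(int), 0, rho_bins - 1)
    j = np.clip(((phi + np.pi) / (2 * np.pi) * phi_bins).astype(int),
                0, phi_bins - 1)
    k = np.clip(((z - z_min) / (z_max - z_min) * z_bins).astype(int),
                0, z_bins - 1)
    return np.stack([i, j, k], axis=1)
```

Features would then be scattered into the resulting sparse grid and processed by the asymmetrical 3D convolutions the papers describe.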