HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
- URL: http://arxiv.org/abs/2503.08140v2
- Date: Fri, 21 Mar 2025 07:00:11 GMT
- Title: HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
- Authors: Ethan Griffiths, Maryam Haghighat, Simon Denman, Clinton Fookes, Milad Ramezani
- Abstract summary: We present HOTFormerLoc, a novel and versatile Hierarchical Octree-based TransFormer for large-scale 3D place recognition. We propose an octree-based multi-scale attention mechanism that captures spatial and semantic features across granularities. We introduce CS-Wild-Places, a novel 3D cross-source dataset featuring point cloud data from aerial and ground lidar scans captured in dense forests.
- Score: 30.77381516091565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present HOTFormerLoc, a novel and versatile Hierarchical Octree-based TransFormer for large-scale 3D place recognition in both ground-to-ground and ground-to-aerial scenarios across urban and forest environments. We propose an octree-based multi-scale attention mechanism that captures spatial and semantic features across granularities. To address the variable density of point distributions from spinning lidar, we present cylindrical octree attention windows to reflect the underlying distribution during attention. We introduce relay tokens to enable efficient global-local interactions and multi-scale representation learning at reduced computational cost. Our pyramid attentional pooling then synthesises a robust global descriptor for end-to-end place recognition in challenging environments. In addition, we introduce CS-Wild-Places, a novel 3D cross-source dataset featuring point cloud data from aerial and ground lidar scans captured in dense forests. Point clouds in CS-Wild-Places contain representational gaps and distinctive attributes such as varying point densities and noise patterns, making it a challenging benchmark for cross-view localisation in the wild. HOTFormerLoc achieves a top-1 average recall improvement of 5.5% to 11.5% on the CS-Wild-Places benchmark. Furthermore, it consistently outperforms SOTA 3D place recognition methods, with an average performance gain of 4.9% on well-established urban and forest datasets. The code and the CS-Wild-Places benchmark are available at https://csiro-robotics.github.io/HOTFormerLoc.
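The cylindrical windowing idea above lends itself to a short illustration. The following Python sketch is our own (not the authors' released code): it maps spinning-lidar points to cylindrical coordinates and buckets them into octree-style attention windows. The `octree_window_keys` helper and its uniform quantisation are illustrative assumptions.

```python
import numpy as np

def to_cylindrical(points: np.ndarray) -> np.ndarray:
    """Map (N, 3) Cartesian points to (rho, theta, z)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x**2 + y**2)   # radial distance from the sensor axis
    theta = np.arctan2(y, x)     # azimuth angle in [-pi, pi]
    return np.stack([rho, theta, z], axis=1)

def octree_window_keys(cyl: np.ndarray, depth: int) -> np.ndarray:
    """Quantise normalised cylindrical coordinates into 2^depth bins per
    axis and pack the bin indices into one integer key per point."""
    mins = cyl.min(axis=0)
    spans = cyl.max(axis=0) - mins + 1e-9
    cells = np.clip((cyl - mins) / spans * 2**depth, 0, 2**depth - 1).astype(np.int64)
    return (cells[:, 0] << (2 * depth)) | (cells[:, 1] << depth) | cells[:, 2]

rng = np.random.default_rng(0)
pts = rng.normal(size=(4096, 3)) * np.array([20.0, 20.0, 2.0])  # synthetic scan
keys = octree_window_keys(to_cylindrical(pts), depth=4)
print("distinct attention windows:", len(np.unique(keys)))
```

Points sharing a key fall in the same window, so attention cost depends on window occupancy rather than full cloud size; cylindrical axes keep near-sensor windows from being overloaded by dense returns.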
Related papers
- ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images [38.727720300337296]
We propose a robust LiDAR-based place recognition method for natural forests, ForestLPR. Cross-sectional images of the forest's geometry at different heights contain the information needed to recognize revisiting a place. Our approach utilizes a visual transformer as the shared backbone to produce sets of local descriptors.
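As a rough sketch of the BEV density idea (ours, not the ForestLPR code; the grid resolution and height bands are arbitrary assumptions), one can slice the cloud into height bands and histogram each slice over the ground plane:

```python
import numpy as np

def bev_density_images(points, height_bands, grid=128, extent=40.0):
    """Return one (grid, grid) density image per height band.
    points: (N, 3) array; height_bands: iterable of (z_min, z_max)."""
    edges = np.linspace(-extent, extent, grid + 1)
    images = []
    for z_min, z_max in height_bands:
        band = points[(points[:, 2] >= z_min) & (points[:, 2] < z_max)]
        img, _, _ = np.histogram2d(band[:, 0], band[:, 1], bins=(edges, edges))
        images.append(img)
    return np.stack(images)  # (num_bands, grid, grid)

rng = np.random.default_rng(1)
cloud = rng.uniform([-40, -40, 0], [40, 40, 20], size=(100_000, 3))
print(bev_density_images(cloud, [(0, 5), (5, 10), (10, 20)]).shape)  # (3, 128, 128)
```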
arXiv Detail & Related papers (2025-03-06T14:24:22Z) - GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving [9.023864430027333]
We propose a 3D Gaussian Splatting-based multimodal place recognition network dubbed GSPR. It explicitly combines multi-view RGB images and LiDAR point clouds into a spatio-temporally unified scene representation with Multimodal Gaussian Splatting. Our method can effectively leverage the complementary strengths of both multi-view cameras and LiDAR, achieving SOTA place recognition performance while maintaining solid generalization ability.
arXiv Detail & Related papers (2024-10-01T00:43:45Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
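The multi-pooling idea can be pictured as pooling point features at several neighbourhood sizes and concatenating max- and average-pooled summaries. The sketch below is a deliberately simplified stand-in (contiguous splits instead of true spatial grouping), not PVAFN's implementation:

```python
import numpy as np

def multi_pool(features: np.ndarray, group_sizes=(16, 64, 256)) -> np.ndarray:
    """Pool a (N, C) feature set at several neighbourhood scales and
    concatenate max- and average-pooled summaries per scale."""
    n = features.shape[0]
    pooled = []
    for g in group_sizes:
        # Partition points into contiguous groups of roughly size g
        # (illustrative; a real detector groups by spatial proximity).
        groups = np.array_split(features, max(n // g, 1))
        pooled.append(np.stack([grp.max(axis=0) for grp in groups]).max(axis=0))
        pooled.append(np.stack([grp.mean(axis=0) for grp in groups]).mean(axis=0))
    return np.concatenate(pooled)  # (len(group_sizes) * 2 * C,)

feats = np.random.default_rng(2).normal(size=(1024, 32))
print(multi_pool(feats).shape)  # (192,)
```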
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
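One common way to realise this kind of relational distillation, sketched below under our own assumptions (not the RPD code), is to match the pairwise similarity structure of student and teacher embeddings:

```python
import numpy as np

def relational_distillation_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """Match the pairwise cosine-similarity structure of student features
    (e.g. from a 3D network) to that of a frozen teacher (e.g. a 2D
    transformer). Shapes: (B, C_s) and (B, C_t); C_s may differ from C_t."""
    def relation(f):
        f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-9)
        return f @ f.T                      # (B, B) similarity matrix
    diff = relation(student) - relation(teacher)
    return float(np.mean(diff**2))

rng = np.random.default_rng(3)
print(relational_distillation_loss(rng.normal(size=(8, 64)),
                                   rng.normal(size=(8, 384))))
```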
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in Large-scale Natural Environments [33.25040383298019]
WildScenes is a bi-modal benchmark dataset consisting of high-resolution 2D images and dense 3D LiDAR point clouds.
The data is trajectory-centric with accurate localization and globally aligned point clouds.
Our 3D semantic labels are obtained via an efficient, automated process that transfers the human-annotated 2D labels from multiple views into 3D point cloud sequences.
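The 2D-to-3D transfer step can be pictured as projecting each lidar point into a labelled camera view and inheriting that pixel's class. The sketch below uses a simplified pinhole model with hypothetical inputs; it is our illustration, not the WildScenes pipeline:

```python
import numpy as np

def transfer_labels(points, label_img, K, T_cam_from_world):
    """Label each 3D point with the class of the pixel it projects to.
    points: (N, 3) world coordinates; label_img: (H, W) integer labels;
    K: (3, 3) camera intrinsics; T_cam_from_world: (4, 4) extrinsics."""
    h, w = label_img.shape
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_cam_from_world @ homo.T).T[:, :3]      # points in camera frame
    z = np.clip(cam[:, 2], 1e-6, None)              # avoid divide-by-zero
    u = (K[0, 0] * cam[:, 0] / z + K[0, 2]).astype(np.int64)
    v = (K[1, 1] * cam[:, 1] / z + K[1, 2]).astype(np.int64)
    valid = (cam[:, 2] > 0.1) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels = np.full(len(points), -1, dtype=np.int64)  # -1 = unlabelled
    labels[valid] = label_img[v[valid], u[valid]]
    return labels

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pts = np.random.default_rng(4).uniform(-5, 5, size=(1000, 3)) + [0, 0, 10]
labels = transfer_labels(pts, np.zeros((480, 640), dtype=np.int64), K, np.eye(4))
print((labels >= 0).sum(), "points labelled")
```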
arXiv Detail & Related papers (2023-12-23T22:27:40Z) - CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition [45.16530801796705]
CrossLoc3D is a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting.
We present CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of point cloud data from both aerial and ground LiDAR scans.
arXiv Detail & Related papers (2023-03-31T02:50:52Z) - CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
In this work, we incorporate transformers into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with previous segmentation methods.
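The sampling-and-grouping step feeding such cross-attention is commonly built from farthest point sampling plus a ball query; the NumPy sketch below shows that generic pattern (an illustration, not the CloudAttention code):

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Greedy FPS: iteratively pick the point farthest from those chosen."""
    chosen = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        idx = int(dist.argmax())
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.array(chosen)

def group_by_radius(points, centre_idx, radius=1.0, max_pts=32):
    """Ball query: gather up to max_pts neighbours of each sampled centre."""
    groups = []
    for c in centre_idx:
        d = np.linalg.norm(points - points[c], axis=1)
        groups.append(np.flatnonzero(d < radius)[:max_pts])
    return groups

pts = np.random.default_rng(5).uniform(size=(2048, 3)) * 10
centres = farthest_point_sampling(pts, 64)
print(len(group_by_radius(pts, centres)), "groups")
```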
arXiv Detail & Related papers (2022-07-31T21:39:15Z) - SphereVLAD++: Attention-based and Signal-enhanced Viewpoint Invariant Descriptor [6.326554177747699]
We develop SphereVLAD++, an attention-enhanced viewpoint invariant place recognition method.
We show that SphereVLAD++ outperforms all related state-of-the-art 3D place recognition methods under small or even totally reversed viewpoint differences.
arXiv Detail & Related papers (2022-07-06T20:32:43Z) - Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
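The stratified key sampling behind this long-range capability can be sketched in a few lines: each query window attends to all nearby points plus a sparse subsample of distant ones. The snippet below is our simplified rendering (the radius and stride are arbitrary), not the paper's implementation:

```python
import numpy as np

def stratified_keys(points, query_centre, near_radius=5.0, far_stride=8):
    """Keep all points within near_radius of the query window centre (dense),
    and subsample the rest with a fixed stride (sparse), so attention sees
    long-range context at modest cost."""
    d = np.linalg.norm(points - query_centre, axis=1)
    near = np.flatnonzero(d <= near_radius)
    far = np.flatnonzero(d > near_radius)[::far_stride]
    return np.concatenate([near, far])

pts = np.random.default_rng(6).uniform(size=(10_000, 3)) * 50
keys = stratified_keys(pts, query_centre=np.array([25.0, 25.0, 25.0]))
print("dense+sparse keys:", len(keys), "of", len(pts))
```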
arXiv Detail & Related papers (2022-03-28T05:35:16Z) - Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds.
The key to our approach is to use random point sampling instead of more complex point selection approaches.
Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
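The speed comes from replacing farthest point sampling with plain random sampling, an O(N) decimation step; a minimal sketch (ours, not the RandLA-Net code):

```python
import numpy as np

def random_downsample(points, feats, ratio=0.25, rng=None):
    """O(N) random point sampling, the decimation step RandLA-Net favours
    over farthest point sampling, whose cost grows much faster with N."""
    rng = rng or np.random.default_rng()
    keep = rng.choice(len(points), size=int(len(points) * ratio), replace=False)
    return points[keep], feats[keep]

pts = np.random.default_rng(7).uniform(size=(1_000_000, 3))
feats = np.zeros((len(pts), 8), dtype=np.float32)
sub_pts, sub_feats = random_downsample(pts, feats)
print(sub_pts.shape)  # (250000, 3)
```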
arXiv Detail & Related papers (2021-07-06T05:08:34Z) - DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization [56.15308829924527]
We propose a Siamese network that jointly learns 3D local feature detection and description directly from raw 3D points.
For detecting 3D keypoints, we predict the discriminativeness of the local descriptors in an unsupervised manner.
Experiments on various benchmarks demonstrate that our method achieves competitive results for both global point cloud retrieval and local point cloud registration.
arXiv Detail & Related papers (2020-07-17T20:21:22Z) - A Nearest Neighbor Network to Extract Digital Terrain Models from 3D Point Clouds [1.6249267147413524]
We present an algorithm that operates on 3D-point clouds and estimates the underlying DTM for the scene using an end-to-end approach.
Our model learns neighborhood information and seamlessly integrates this with point-wise and block-wise global features.
arXiv Detail & Related papers (2020-05-21T15:54:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.