Cross-modal feature fusion for robust point cloud registration with ambiguous geometry
- URL: http://arxiv.org/abs/2505.13088v1
- Date: Mon, 19 May 2025 13:22:46 GMT
- Title: Cross-modal feature fusion for robust point cloud registration with ambiguous geometry
- Authors: Zhaoyi Wang, Shengyu Huang, Jemil Avers Butt, Yuanzhou Cai, Matej Varga, Andreas Wieser
- Abstract summary: We propose a novel Cross-modal Feature Fusion method for point cloud registration. It incorporates a two-stage fusion of 3D point cloud features and 2D image features. It achieves state-of-the-art registration performance across all benchmarks.
- Score: 6.742883954812066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point cloud registration has seen significant advancements with the application of deep learning techniques. However, existing approaches often overlook the potential of integrating radiometric information from RGB images. This limitation reduces their effectiveness in aligning point cloud pairs, especially in regions where geometric data alone is insufficient. When used effectively, radiometric information can enhance the registration process by providing context that is missing from purely geometric data. In this paper, we propose CoFF, a novel Cross-modal Feature Fusion method that utilizes both point cloud geometry and RGB images for pairwise point cloud registration. Assuming that the co-registration between point clouds and RGB images is available, CoFF explicitly addresses the challenges where geometric information alone is ambiguous, such as in regions with symmetric similarity or planar structures, through a two-stage fusion of 3D point cloud features and 2D image features. It incorporates a cross-modal feature fusion module that assigns pixel-wise image features to the 3D input point clouds to enhance the learned 3D point features, and integrates patch-wise image features with superpoint features to improve the quality of coarse matching. This is followed by a coarse-to-fine matching module that accurately establishes correspondences using the fused features. We extensively evaluate CoFF on four common datasets: 3DMatch, 3DLoMatch, IndoorLRS, and the recently released ScanNet++. In addition, we assess CoFF on subsets of these datasets containing geometrically ambiguous cases. Our experimental results demonstrate that CoFF achieves state-of-the-art registration performance across all benchmarks, including remarkable registration recalls of 95.9% and 81.6% on the widely used 3DMatch and 3DLoMatch datasets, respectively...(Truncated to fit arXiv abstract length)
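To make the two-stage fusion concrete, here is a minimal PyTorch sketch: the first stage fuses pixel-wise image features with per-point geometric features, and the second fuses patch-wise image features with the superpoint features used for coarse matching. All module names, shapes, and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TwoStageFusion(nn.Module):
    """Illustrative two-stage cross-modal fusion (not the CoFF code).

    Stage 1: pixel-wise image features enhance per-point geometric features.
    Stage 2: patch-wise image features enhance the superpoint features
    used for coarse matching.
    """
    def __init__(self, d_geo=64, d_img=64, d_sp=256):
        super().__init__()
        self.point_fuse = nn.Linear(d_geo + d_img, d_geo)  # stage 1
        self.super_fuse = nn.Linear(d_sp + d_img, d_sp)    # stage 2

    def forward(self, f_geo, f_pix, f_super, f_patch):
        # f_geo:   (N, d_geo) per-point geometric features
        # f_pix:   (N, d_img) pixel-wise image features assigned per point
        # f_super: (M, d_sp)  superpoint features at the coarse level
        # f_patch: (M, d_img) patch-wise image features per superpoint
        f_point = self.point_fuse(torch.cat([f_geo, f_pix], dim=-1))
        f_coarse = self.super_fuse(torch.cat([f_super, f_patch], dim=-1))
        return f_point, f_coarse
```

A coarse-to-fine matcher would then match superpoints with `f_coarse` and refine correspondences within matched patches using `f_point`.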
Related papers
- Fully-Geometric Cross-Attention for Point Cloud Registration [51.865371511201765]
Point cloud registration approaches often fail when the overlap between point clouds is low due to noisy point correspondences. This work introduces a novel cross-attention mechanism tailored for Transformer-based architectures that tackles this problem. We integrate the Gromov-Wasserstein distance into the cross-attention formulation to jointly compute distances between points across different point clouds. At the point level, we also devise a self-attention mechanism that aggregates the local geometric structure information into point features for fine matching.
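As a rough illustration of how a Gromov-Wasserstein coupling can supply cross-cloud attention weights, the sketch below compares intra-cloud distance matrices with the POT library and row-normalizes the resulting plan. This is a hedged sketch of the general technique, not the paper's exact formulation.

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

def gw_cross_weights(src_pts, tgt_pts):
    """Soft source-to-target assignment from a Gromov-Wasserstein coupling.

    GW compares relational structure (intra-cloud distances), so the two
    clouds need no shared embedding space.
    """
    C1 = ot.dist(src_pts, src_pts)  # (N, N) intra-cloud distances
    C2 = ot.dist(tgt_pts, tgt_pts)  # (M, M)
    C1 /= C1.max()
    C2 /= C2.max()
    p, q = ot.unif(len(src_pts)), ot.unif(len(tgt_pts))
    T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss')
    # Row-normalize so each source point carries a distribution over targets.
    return T / T.sum(axis=1, keepdims=True)

# Cross-attention-style aggregation of target features per source point:
# attended = gw_cross_weights(src_xyz, tgt_xyz) @ tgt_features
```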
arXiv Detail & Related papers (2025-02-12T10:44:36Z)
- PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging [0.0]
Point clouds are a very efficient way to represent volumetric data in medical imaging. Despite their benefits, point clouds remain underexplored in medical imaging compared to volumetric 3D CNNs and vision transformers. This work presents a hybrid approach that combines point-wise operations with intermediate differentiable rasterisation and dense localised CNNs.
arXiv Detail & Related papers (2024-12-23T08:43:39Z)
- ZeroReg: Zero-Shot Point Cloud Registration with Foundation Models [77.84408427496025]
State-of-the-art 3D point cloud registration methods rely on labeled 3D datasets for training. We introduce ZeroReg, a zero-shot registration approach that utilizes 2D foundation models to predict 3D correspondences.
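Two generic building blocks behind this kind of zero-shot pipeline are (i) lifting 2D foundation-model matches to 3D using depth and intrinsics, and (ii) solving for the rigid transform in closed form. The NumPy sketch below shows both under the assumption that per-frame depth maps and calibration are available; it is illustrative, not ZeroReg's code.

```python
import numpy as np

def backproject(uv, depth, K):
    """Lift matched pixels (u, v) to 3D camera coordinates using the
    depth map and camera intrinsics K."""
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def kabsch(src, tgt):
    """Closed-form rigid transform with tgt ~ R @ src + t (Kabsch)."""
    src_c, tgt_c = src - src.mean(0), tgt - tgt.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ tgt_c)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ S @ U.T
    t = tgt.mean(0) - R @ src.mean(0)
    return R, t
```

In practice the transform is estimated robustly (e.g., inside a RANSAC loop), because 2D matches lifted to 3D contain outliers.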
arXiv Detail & Related papers (2023-12-05T11:33:16Z)
- GQE-Net: A Graph-based Quality Enhancement Network for Point Cloud Color Attribute [51.4803148196217]
We propose a graph-based quality enhancement network (GQE-Net) to reduce color distortion in point clouds.
GQE-Net uses geometry information as an auxiliary input and graph convolution blocks to extract local features efficiently.
Experimental results show that our method achieves state-of-the-art performance.
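One plausible form of such a geometry-guided refinement is sketched below: a kNN graph built from point positions (the auxiliary geometry input) drives a residual update of the color attributes. Layer sizes and names are assumptions, not the GQE-Net reference implementation.

```python
import torch
import torch.nn as nn

class GeometryGuidedGraphConv(nn.Module):
    """Sketch: refine point colors using edges of a geometric kNN graph."""
    def __init__(self, k=16, d=32):
        super().__init__()
        self.k = k
        # Edge input: relative offset (3) + neighbor color (3) + center color (3)
        self.mlp = nn.Sequential(nn.Linear(9, d), nn.ReLU(), nn.Linear(d, 3))

    def forward(self, xyz, rgb):
        # xyz: (N, 3) geometry (auxiliary input); rgb: (N, 3) attributes
        idx = torch.cdist(xyz, xyz).topk(self.k, largest=False).indices  # (N, k)
        rel_xyz = xyz[idx] - xyz[:, None, :]          # (N, k, 3) offsets
        nbr_rgb = rgb[idx]                            # (N, k, 3)
        ctr_rgb = rgb[:, None, :].expand_as(nbr_rgb)  # (N, k, 3)
        edge = torch.cat([rel_xyz, nbr_rgb, ctr_rgb], dim=-1)
        return rgb + self.mlp(edge).mean(dim=1)       # residual color update
```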
arXiv Detail & Related papers (2023-03-24T02:33:45Z)
- PCR-CG: Point Cloud Registration via Deep Explicit Color and Geometry [28.653015760036602]
We introduce a novel 3D point cloud registration module explicitly embedding the color signals into the geometry representation.
Our key contribution is a 2D-3D cross-modality learning algorithm that embeds the deep features learned from color signals to the geometry representation.
Our study reveals significant advantages of correlating explicit deep color features with the point cloud in the registration task.
arXiv Detail & Related papers (2023-02-28T08:50:17Z)
- Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection [16.198358858773258]
Multi-modal 3D object detection has been an active research topic in autonomous driving.
It is non-trivial to explore the cross-modal feature fusion between sparse 3D points and dense 2D pixels.
Recent approaches either fuse the image features with the point cloud features that are projected onto the 2D image plane or combine the sparse point cloud with dense image pixels.
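The first route, fusing image features at projected point locations, can be sketched as a perspective projection followed by bilinear sampling of a dense 2D feature map. The function below assumes points already expressed in the camera frame and known intrinsics; it is a generic illustration, not any specific detector's code.

```python
import torch
import torch.nn.functional as F

def sample_image_feats(points_cam, feat_map, K, H, W):
    """Project 3D points onto the image plane and bilinearly sample
    per-point image features.
    points_cam: (N, 3); feat_map: (C, Hf, Wf); K: (3, 3); H, W: image size."""
    uvw = (K @ points_cam.T).T                       # homogeneous pixel coords
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)    # perspective divide
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1)
    grid = grid.view(1, 1, -1, 2)                    # (1, 1, N, 2)
    feats = F.grid_sample(feat_map[None], grid, align_corners=True)
    return feats.view(feat_map.shape[0], -1).T       # (N, C) per-point features
```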
arXiv Detail & Related papers (2022-10-18T06:15:56Z)
- FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection [19.419030878019974]
Unstructured 3D point clouds are filled into the 2D plane, and 3D point cloud features are extracted faster using projection-aware convolution layers.
The corresponding indexes between the different sensor signals are established in advance during data preprocessing.
Two new plug-and-play fusion modules, LiCamFuse and BiLiCamFuse, are proposed.
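A common way to realize the "filled into the 2D plane" step is a spherical projection into a range image, which also yields the kind of precomputed 2D-to-3D index map mentioned above. The sketch below is a standard construction under assumed LiDAR field-of-view parameters, not FFPA-Net's exact preprocessing.

```python
import numpy as np

def to_range_image(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Fill an unstructured LiDAR cloud into a 2D range image so that
    ordinary 2D convolutions can run over it.
    Returns the (H, W) range image and an (H, W) point-index map."""
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-6))
    u = ((0.5 * (yaw / np.pi + 1.0)) * W).astype(int) % W
    v = np.clip(((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H).astype(int),
                0, H - 1)
    img = np.zeros((H, W), dtype=np.float32)
    idx = np.full((H, W), -1, dtype=np.int64)  # 2D -> 3D correspondence indexes
    img[v, u] = r
    idx[v, u] = np.arange(len(points))
    return img, idx
```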
arXiv Detail & Related papers (2022-09-15T16:13:19Z)
- Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation [38.64501645574878]
Point cloud registration aims at estimating the geometric transformation between two point cloud scans.
Recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence.
We propose a new Geometry-Aware Visual Feature Extractor (GAVE) that employs multi-scale local linear transformation.
arXiv Detail & Related papers (2022-08-31T14:36:09Z)
- CorrI2P: Deep Image-to-Point Cloud Registration via Dense Correspondence [51.91791056908387]
We propose the first feature-based dense correspondence framework for addressing the image-to-point cloud registration problem, dubbed CorrI2P.
Specifically, given a pair of a 2D image and a 3D point cloud, we first transform them into high-dimensional feature spaces and feed the features into a symmetric overlapping region detector to determine the region where the image and point cloud overlap.
arXiv Detail & Related papers (2022-07-12T11:49:31Z)
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
- Similarity-Aware Fusion Network for 3D Semantic Segmentation [87.51314162700315]
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We employ a late fusion strategy where we first learn the geometric and contextual similarities between the input and back-projected (from 2D pixels) point clouds.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across various data integrity levels.
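A minimal sketch of such adaptive late fusion: a learned gate weighs back-projected 2D features against 3D features per point. The module and its names are hypothetical, intended only to illustrate similarity-aware weighting rather than reproduce SAFNet.

```python
import torch
import torch.nn as nn

class SimilarityAwareFusion(nn.Module):
    """Sketch: gate between 2D (back-projected) and 3D per-point features."""
    def __init__(self, d=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(),
                                  nn.Linear(d, 1), nn.Sigmoid())

    def forward(self, f2d, f3d):
        # f2d: (N, d) features back-projected from 2D pixels
        # f3d: (N, d) features from the 3D point cloud branch
        w = self.gate(torch.cat([f2d, f3d], dim=-1))  # (N, 1) weight in [0, 1]
        return w * f2d + (1 - w) * f3d                # similarity-weighted mix
```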
arXiv Detail & Related papers (2021-07-04T09:28:18Z) - Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth
Estimation [81.08111209632501]
We propose a geometry-aware stereo-LiDAR fusion network for long-range depth estimation.
We exploit sparse and accurate point clouds as a cue for guiding correspondences of stereo images in a unified 3D volume space.
Our network achieves state-of-the-art performance on the KITTI and the Virtual-KITTI datasets.
arXiv Detail & Related papers (2021-03-24T03:24:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.