CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration
- URL: http://arxiv.org/abs/2309.14660v5
- Date: Thu, 12 Sep 2024 09:43:01 GMT
- Title: CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration
- Authors: Shuhao Kang, Youqi Liao, Jianping Li, Fuxun Liang, Yuhao Li, Xianghong Zou, Fangning Li, Xieyuanli Chen, Zhen Dong, Bisheng Yang,
- Abstract summary: CoFiI2P is a novel I2P registration network that extracts correspondences in a coarse-to-fine manner.
In the coarse matching phase, a novel I2P transformer module is employed to capture both homogeneous and heterogeneous global information.
In the fine matching module, point/pixel pairs are established with the guidance of super-point/super-pixel correspondences.
- Score: 9.57539651520755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-to-point cloud (I2P) registration is a fundamental task for robots and autonomous vehicles to achieve cross-modality data fusion and localization. Current I2P registration methods primarily focus on estimating correspondences at the point or pixel level, often neglecting global alignment. As a result, I2P matching can easily converge to a local optimum if it lacks high-level guidance from global constraints. To improve the success rate and general robustness, this paper introduces CoFiI2P, a novel I2P registration network that extracts correspondences in a coarse-to-fine manner. First, the image and point cloud data are processed through a two-stream encoder-decoder network for hierarchical feature extraction. Second, a coarse-to-fine matching module is designed to leverage these features and establish robust feature correspondences. Specifically, In the coarse matching phase, a novel I2P transformer module is employed to capture both homogeneous and heterogeneous global information from the image and point cloud data. This enables the estimation of coarse super-point/super-pixel matching pairs with discriminative descriptors. In the fine matching module, point/pixel pairs are established with the guidance of super-point/super-pixel correspondences. Finally, based on matching pairs, the transform matrix is estimated with the EPnP-RANSAC algorithm. Experiments conducted on the KITTI Odometry dataset demonstrate that CoFiI2P achieves impressive results, with a relative rotation error (RRE) of 1.14 degrees and a relative translation error (RTE) of 0.29 meters, while maintaining real-time speed.Additional experiments on the Nuscenes datasets confirm our method's generalizability. The project page is available at \url{https://whu-usi3dv.github.io/CoFiI2P}.
Related papers
- Multiway Point Cloud Mosaicking with Diffusion and Global Optimization [74.3802812773891]
We introduce a novel framework for multiway point cloud mosaicking (named Wednesday)
At the core of our approach is ODIN, a learned pairwise registration algorithm that identifies overlaps and refines attention scores.
Tested on four diverse, large-scale datasets, our method state-of-the-art pairwise and rotation registration results by a large margin on all benchmarks.
arXiv Detail & Related papers (2024-03-30T17:29:13Z) - Quantity-Aware Coarse-to-Fine Correspondence for Image-to-Point Cloud
Registration [4.954184310509112]
Image-to-point cloud registration aims to determine the relative camera pose between an RGB image and a reference point cloud.
Matching individual points with pixels can be inherently ambiguous due to modality gaps.
We propose a framework to capture quantity-aware correspondences between local point sets and pixel patches.
arXiv Detail & Related papers (2023-07-14T03:55:54Z) - I2P-Rec: Recognizing Images on Large-scale Point Cloud Maps through
Bird's Eye View Projections [18.7557037030769]
Place recognition is an important technique for autonomous cars to achieve full autonomy.
We propose the I2P-Rec method to solve the problem by transforming the cross-modal data into the same modality.
With only a small set of training data, I2P-Rec achieves recall rates at Top-1% over 80% and 90%, when localizing monocular and stereo images on point cloud maps.
arXiv Detail & Related papers (2023-03-02T07:56:04Z) - Efficient Context Integration through Factorized Pyramidal Learning for
Ultra-Lightweight Semantic Segmentation [1.0499611180329804]
We propose a novel Factorized Pyramidal Learning (FPL) module to aggregate rich contextual information in an efficient manner.
We decompose the spatial pyramid into two stages which enables a simple and efficient feature fusion within the module to solve the notorious checkerboard effect.
Based on the FPL module and FIR unit, we propose an ultra-lightweight real-time network, called FPLNet, which achieves state-of-the-art accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-02-23T05:34:51Z) - Transformer-based Context Condensation for Boosting Feature Pyramids in
Object Detection [77.50110439560152]
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF)
We propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results.
In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency.
arXiv Detail & Related papers (2022-07-14T01:45:03Z) - Robust Partial-to-Partial Point Cloud Registration in a Full Range [12.86951061306046]
We propose Graph Matching Consensus Network (GMCNet), which estimates pose-invariant correspondences for fullrange 1 Partial-to-Partial point cloud Registration (PPR)
GMCNet encodes point descriptors for each point cloud individually without using crosscontextual information, or ground truth correspondences for training.
arXiv Detail & Related papers (2021-11-30T17:56:24Z) - PCAM: Product of Cross-Attention Matrices for Rigid Registration of
Point Clouds [79.99653758293277]
PCAM is a neural network whose key element is a pointwise product of cross-attention matrices.
We show that PCAM achieves state-of-the-art results among methods which, like us, solve steps (a) and (b) jointly via deepnets.
arXiv Detail & Related papers (2021-10-04T09:23:27Z) - DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and models the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient Crossmodality Guided to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively.
arXiv Detail & Related papers (2020-07-17T18:35:24Z) - RANSAC-Flow: generic two-stage image alignment [53.11926395028508]
We show that a simple unsupervised approach performs surprisingly well across a range of tasks.
Despite its simplicity, our method shows competitive results on a range of tasks and datasets.
arXiv Detail & Related papers (2020-04-03T12:37:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.