Single-Frame Point-Pixel Registration via Supervised Cross-Modal Feature Matching
- URL: http://arxiv.org/abs/2506.22784v1
- Date: Sat, 28 Jun 2025 06:57:13 GMT
- Title: Single-Frame Point-Pixel Registration via Supervised Cross-Modal Feature Matching
- Authors: Yu Han, Zhiwei Huang, Yanting Zhang, Fangjun Ding, Shen Cai, Rui Fan
- Abstract summary: We introduce a detector-free framework for direct point-pixel matching between LiDAR and camera views. Specifically, we project the LiDAR intensity map into a 2D view from the LiDAR perspective and feed it into an attention-based matching network. To further enhance matching reliability, we introduce a repeatability scoring mechanism that acts as a soft visibility prior.
- Score: 7.5461100059974315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point-pixel registration between LiDAR point clouds and camera images is a fundamental yet challenging task in autonomous driving and robotic perception. A key difficulty lies in the modality gap between unstructured point clouds and structured images, especially under sparse single-frame LiDAR settings. Existing methods typically extract features separately from point clouds and images, then rely on hand-crafted or learned matching strategies. This separate encoding fails to bridge the modality gap effectively; more critically, these methods struggle with the sparsity and noise of single-frame LiDAR, often requiring point cloud accumulation or additional priors to improve reliability. Inspired by recent progress in detector-free matching paradigms (e.g., MatchAnything), we revisit the projection-based approach and introduce a detector-free framework for direct point-pixel matching between LiDAR and camera views. Specifically, we project the LiDAR intensity map into a 2D view from the LiDAR perspective and feed it into an attention-based detector-free matching network, enabling cross-modal correspondence estimation without relying on multi-frame accumulation. To further enhance matching reliability, we introduce a repeatability scoring mechanism that acts as a soft visibility prior. This guides the network to suppress unreliable matches in regions with low intensity variation, improving robustness under sparse input. Extensive experiments on the KITTI, nuScenes, and MIAS-LCEC-TF70 benchmarks demonstrate that our method achieves state-of-the-art performance, outperforming prior approaches on nuScenes (even those relying on accumulated point clouds) despite using only single-frame LiDAR.
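The abstract's pipeline reduces to two concrete steps: render the sweep's per-point intensity into a dense 2D view from the LiDAR's own perspective, then down-weight candidate matches in regions with little intensity variation. Below is a minimal NumPy/SciPy sketch of both steps; the spherical-projection geometry, the image resolution and FOV bounds, the local-variance proxy for the repeatability score, and the function names `lidar_intensity_image` and `repeatability_weight` are all our assumptions for illustration, not the authors' released code.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def lidar_intensity_image(points, intensity, h=64, w=1024,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Render one LiDAR sweep as a 2D intensity map (spherical projection).

    points:    (N, 3) x, y, z in the sensor frame
    intensity: (N,)   per-point reflectance
    The FOV bounds are sensor-specific guesses (HDL-64E-like), not the paper's.
    """
    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    depth = np.linalg.norm(points, axis=1)
    keep = depth > 1e-6                       # drop degenerate returns at the origin
    pts, inten, depth = points[keep], intensity[keep], depth[keep]

    yaw = np.arctan2(pts[:, 1], pts[:, 0])    # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(pts[:, 2] / depth, -1.0, 1.0))

    # Map azimuth to columns and elevation to rows of the h x w image.
    u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * w), 0, w - 1).astype(int)
    v = np.clip(np.floor((1.0 - (pitch - fov_down) / fov) * h), 0, h - 1).astype(int)

    order = np.argsort(-depth)                # write far-to-near: nearest return wins
    img = np.zeros((h, w), dtype=np.float32)  # pixels with no return stay 0 (sparse)
    img[v[order], u[order]] = inten[order]
    return img


def repeatability_weight(img, ksize=5, eps=1e-3):
    """Soft visibility prior: a local-variance stand-in for the paper's
    repeatability score. Near 0 where intensity is flat (matches unreliable),
    approaching 1 where intensity varies strongly."""
    mean = uniform_filter(img, ksize)
    var = np.maximum(uniform_filter(img * img, ksize) - mean * mean, 0.0)
    return var / (var + eps)
```

Writing points far-to-near makes the nearest return win each pixel, and untouched pixels stay zero; that residual sparsity is exactly what a soft visibility prior like the one sketched here would suppress before correspondence estimation.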
Related papers
- Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras [56.39904484784127]
We propose an approach for estimating the relative pose between rolling shutter cameras using the intersections of line projections with a single scanline per image. Alternatively, scanlines can be selected within a single image, enabling single-view relative pose estimation for scanlines of rolling shutter cameras.
arXiv Detail & Related papers (2025-06-27T10:00:21Z)
- AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [58.67129770371016]
We propose AuxDet, a novel framework that reimagines the IRSTD paradigm by incorporating textual metadata for scene-aware optimization. AuxDet consistently outperforms state-of-the-art methods, validating the critical role of auxiliary information in improving robustness and accuracy.
arXiv Detail & Related papers (2025-05-21T07:02:05Z)
- FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
- PAPI-Reg: Patch-to-Pixel Solution for Efficient Cross-Modal Registration between LiDAR Point Cloud and Camera Image [10.906218491083576]
Cross-modal data fusion involves the precise alignment of data from different sensors. We propose a framework that projects point clouds into several 2D representations for matching with camera images. To tackle the challenges of cross-modal differences and the limited overlap between LiDAR point clouds and images in the image matching task, we introduce a multi-scale feature extraction network.
arXiv Detail & Related papers (2025-03-19T15:04:01Z)
- EdgeRegNet: Edge Feature-based Multimodal Registration Network between Images and LiDAR Point Clouds [10.324549723042338]
Cross-modal data registration has long been a critical task in computer vision. We propose a method that uses edge information from the original point clouds and images for cross-modal registration. We validate our method on the KITTI and nuScenes datasets, demonstrating its state-of-the-art performance.
arXiv Detail & Related papers (2025-03-19T15:03:41Z)
- LPRnet: A self-supervised registration network for LiDAR and photogrammetric point clouds [38.42527849407057]
LiDAR and photogrammetry are active and passive remote sensing techniques for point cloud acquisition, respectively. Due to the fundamental differences in sensing mechanisms, spatial distributions, and coordinate systems, their point clouds exhibit significant discrepancies in density, precision, noise, and overlap. This paper proposes a self-supervised registration network based on a masked autoencoder, focusing on heterogeneous LiDAR and photogrammetric point clouds.
arXiv Detail & Related papers (2025-01-10T02:36:37Z)
- A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration [9.609585217048664]
We develop a consistency-aware spot-guided Transformer (CAST).
CAST incorporates a spot-guided cross-attention module to avoid interfering with irrelevant areas.
A lightweight fine matching module for both sparse keypoints and dense features can estimate the transformation accurately.
arXiv Detail & Related papers (2024-10-14T08:48:25Z)
- From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion [12.792769704561024]
Existing fusion methods tend to align each 3D point to only one projected image pixel based on calibration.
We propose a Dynamic Cross Attention (DCA) module with a novel one-to-many cross-modality mapping.
The whole fusion architecture named Dynamic Cross Attention Network (DCAN) exploits multi-level image features and adapts to multiple representations of point clouds.
arXiv Detail & Related papers (2022-09-25T16:10:14Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation [78.74202673902303]
We propose a coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR segmentation.
The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy.
The cooperation of these two components leads to effective camera-LiDAR fusion.
arXiv Detail & Related papers (2021-08-17T08:53:11Z)
- Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.