Differentiable Registration of Images and LiDAR Point Clouds with
VoxelPoint-to-Pixel Matching
- URL: http://arxiv.org/abs/2312.04060v1
- Date: Thu, 7 Dec 2023 05:46:10 GMT
- Title: Differentiable Registration of Images and LiDAR Point Clouds with
VoxelPoint-to-Pixel Matching
- Authors: Junsheng Zhou, Baorui Ma, Wenyuan Zhang, Yi Fang, Yu-Shen Liu,
Zhizhong Han
- Abstract summary: Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotics.
Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks.
We learn a structured cross-modality latent space to represent pixel features and 3D features via a differentiable probabilistic PnP solver.
- Score: 58.10418136917358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modality registration between 2D images from cameras and 3D point
clouds from LiDARs is a crucial task in computer vision and robotics. Previous
methods estimate 2D-3D correspondences by matching point and pixel patterns
learned by neural networks, and use Perspective-n-Points (PnP) to estimate
rigid transformation during post-processing. However, these methods struggle to
map points and pixels to a shared latent space robustly since points and pixels
have very different characteristics with patterns learned in different manners
(MLP and CNN), and they also fail to construct supervision directly on the
transformation since the PnP is non-differentiable, which leads to unstable
registration results. To address these problems, we propose to learn a
structured cross-modality latent space to represent pixel features and 3D
features via a differentiable probabilistic PnP solver. Specifically, we design
a triplet network to learn VoxelPoint-to-Pixel matching, where we represent 3D
elements using both voxels and points to learn the cross-modality latent space
with pixels. We design both the voxel and pixel branch based on CNNs to operate
convolutions on voxels/pixels represented in grids, and integrate an additional
point branch to regain the information lost during voxelization. We train our
framework end-to-end by imposing supervisions directly on the predicted pose
distribution with a probabilistic PnP solver. To explore distinctive patterns
of cross-modality features, we design a novel loss with adaptive-weighted
optimization for cross-modality feature description. The experimental results
on KITTI and nuScenes datasets show significant improvements over the
state-of-the-art methods. The code and models are available at
https://github.com/junshengzhou/VP2P-Match.
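The abstract describes a triplet design in which the 3D data is encoded both as voxels (with a CNN, mirroring the pixel branch) and as raw points (to recover detail lost in voxelization), with all features projected into one shared latent space. Below is a minimal, hypothetical PyTorch sketch of that idea; it assumes a dense occupancy grid and plain convolutional stacks for brevity, so consult the released VP2P-Match code for the actual architecture.

```python
# Hypothetical sketch (not the authors' released code): three encoders that embed
# pixels, voxels, and points into one shared latent space, as the abstract
# describes. A dense occupancy grid and simple conv stacks are assumed for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelBranch(nn.Module):
    """2D CNN producing an L2-normalized per-pixel descriptor map."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, padding=1),
        )

    def forward(self, img):          # img: (B, 3, H, W)
        return F.normalize(self.net(img), dim=1)

class VoxelBranch(nn.Module):
    """3D CNN over an occupancy grid; captures local geometric context."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, dim, 3, padding=1),
        )

    def forward(self, vox):          # vox: (B, 1, D, H, W)
        return self.net(vox)

class PointBranch(nn.Module):
    """Per-point MLP meant to restore detail lost during voxelization."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, pts):          # pts: (B, N, 3)
        return self.net(pts)

def fuse_3d(voxel_feat, point_feat, voxel_idx):
    """Look up each point's voxel feature and add its point-wise feature.

    voxel_feat: (B, C, D, H, W), point_feat: (B, N, C),
    voxel_idx: (B, N) flattened grid index of the voxel containing each point.
    """
    B, C = voxel_feat.shape[:2]
    flat = voxel_feat.flatten(2)                                       # (B, C, D*H*W)
    gathered = torch.gather(flat, 2, voxel_idx.unsqueeze(1).expand(B, C, -1))
    return F.normalize(gathered.transpose(1, 2) + point_feat, dim=-1)  # (B, N, C)
```

With both descriptor sets living in the same normalized space, 2D-3D correspondences can be read off by nearest-neighbor similarity between per-point and per-pixel features.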
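The abstract also mentions a novel loss with adaptive-weighted optimization for cross-modality feature description. Its exact form is not reproduced here; as a generic stand-in, the snippet below uses a plain symmetric InfoNCE contrastive loss between matched pixel/point descriptors. The temperature and the assumption of one ground-truth match per row are illustrative choices, not the paper's.

```python
# Generic stand-in for the paper's adaptive-weighted descriptor loss: a symmetric
# InfoNCE objective that pulls matched pixel/point descriptors together in the
# shared latent space and pushes apart the other pairs in the batch.
import torch
import torch.nn.functional as F

def cross_modal_infonce(pix_desc, pt_desc, temperature=0.07):
    """pix_desc, pt_desc: (M, C) L2-normalized descriptors of M matched 2D-3D pairs."""
    logits = pix_desc @ pt_desc.t() / temperature       # (M, M) similarity matrix
    targets = torch.arange(pix_desc.shape[0], device=pix_desc.device)
    # Row i should retrieve column i, its ground-truth cross-modal partner.
    loss_p2i = F.cross_entropy(logits, targets)          # pixels retrieve points
    loss_i2p = F.cross_entropy(logits.t(), targets)      # points retrieve pixels
    return 0.5 * (loss_p2i + loss_i2p)
```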
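For contrast with the end-to-end supervision described above, the conventional post-processing step the abstract refers to recovers the rigid transform from predicted 2D-3D matches with a non-differentiable RANSAC PnP solver. A minimal OpenCV example of that inference step follows; the reprojection threshold and iteration count are arbitrary illustrative values.

```python
# Classical (non-differentiable) pose recovery from predicted 2D-3D matches,
# i.e. the post-processing step that prior pipelines rely on. The paper instead
# supervises the predicted pose distribution through a differentiable
# probabilistic PnP solver during training.
import cv2
import numpy as np

def pose_from_matches(pts_3d, pts_2d, K):
    """pts_3d: (N, 3) LiDAR points, pts_2d: (N, 2) matched pixels, K: (3, 3) intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float32),
        pts_2d.astype(np.float32),
        K.astype(np.float32),
        distCoeffs=None,
        reprojectionError=3.0,    # pixel threshold for RANSAC inliers
        iterationsCount=1000,
    )
    R, _ = cv2.Rodrigues(rvec)    # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers
```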
Related papers
- Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching [2.400446821380503]
We introduce an efficient framework to learn descriptors for both RGB images and point clouds.
It takes visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy.
A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision.
arXiv Detail & Related papers (2024-10-08T18:31:41Z)
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
- Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data [80.14669385741202]
We propose a self-supervised pre-training method for 3D perception models tailored to autonomous driving data.
We leverage the availability of synchronized and calibrated image and Lidar sensors in autonomous driving setups.
Our method does not require any point cloud nor image annotations.
arXiv Detail & Related papers (2022-03-30T12:40:30Z)
- SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations [85.38562724999898]
We propose a 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU.
Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module.
To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
- P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching [78.18641868402901]
This work takes the initiative to establish fine-grained correspondences between 2D images and 3D point clouds.
An ultra-wide reception mechanism, in combination with a novel loss function, is designed to mitigate the intrinsic information variations between pixel and point local regions.
arXiv Detail & Related papers (2021-03-01T14:59:40Z)
- Probabilistic Vehicle Reconstruction Using a Multi-Task CNN [0.0]
We present a probabilistic approach for shape-aware 3D vehicle reconstruction from stereo images.
Specifically, we train a CNN that outputs probability distributions for the vehicle's orientation and for both vehicle keypoints and wireframe edges.
We show that our method achieves state-of-the-art results on the challenging KITTI benchmark.
arXiv Detail & Related papers (2021-02-21T20:45:44Z)
- Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point Problem [98.92148855291363]
This paper proposes a deep CNN model that simultaneously solves for both the 6-DoF absolute camera pose and 2D-3D correspondences.
Tests on both real and simulated data have shown that our method substantially outperforms existing approaches.
arXiv Detail & Related papers (2020-03-15T04:17:30Z)