Differentiable Registration of Images and LiDAR Point Clouds with
VoxelPoint-to-Pixel Matching
- URL: http://arxiv.org/abs/2312.04060v1
- Date: Thu, 7 Dec 2023 05:46:10 GMT
- Title: Differentiable Registration of Images and LiDAR Point Clouds with
VoxelPoint-to-Pixel Matching
- Authors: Junsheng Zhou, Baorui Ma, Wenyuan Zhang, Yi Fang, Yu-Shen Liu,
Zhizhong Han
- Abstract summary: Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotics.
Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks.
We learn a structured cross-modality latent space to represent pixel features and 3D features via a differentiable probabilistic PnP solver.
- Score: 58.10418136917358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modality registration between 2D images from cameras and 3D point
clouds from LiDARs is a crucial task in computer vision and robotics. Previous
methods estimate 2D-3D correspondences by matching point and pixel patterns
learned by neural networks, and use Perspective-n-Points (PnP) to estimate
rigid transformation during post-processing. However, these methods struggle to
map points and pixels to a shared latent space robustly since points and pixels
have very different characteristics with patterns learned in different manners
(MLP and CNN), and they also fail to construct supervision directly on the
transformation since the PnP is non-differentiable, which leads to unstable
registration results. To address these problems, we propose to learn a
structured cross-modality latent space to represent pixel features and 3D
features via a differentiable probabilistic PnP solver. Specifically, we design
a triplet network to learn VoxelPoint-to-Pixel matching, where we represent 3D
elements using both voxels and points to learn the cross-modality latent space
with pixels. We design both the voxel and pixel branch based on CNNs to operate
convolutions on voxels/pixels represented in grids, and integrate an additional
point branch to regain the information lost during voxelization. We train our
framework end-to-end by imposing supervisions directly on the predicted pose
distribution with a probabilistic PnP solver. To explore distinctive patterns
of cross-modality features, we design a novel loss with adaptive-weighted
optimization for cross-modality feature description. The experimental results
on KITTI and nuScenes datasets show significant improvements over the
state-of-the-art methods. The code and models are available at
https://github.com/junshengzhou/VP2P-Match.
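The abstract describes a triplet design in which the 3D data is encoded both as voxels (with a CNN, mirroring the pixel branch) and as raw points (to recover detail lost in voxelization), with all features projected into one shared latent space. Below is a minimal, hypothetical PyTorch sketch of that idea; it assumes a dense occupancy grid and plain convolutional stacks for brevity, so consult the released VP2P-Match code for the actual architecture.

```python
# Hypothetical sketch (not the authors' released code): three encoders that embed
# pixels, voxels, and points into one shared latent space, as the abstract
# describes. A dense occupancy grid and simple conv stacks are assumed for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelBranch(nn.Module):
    """2D CNN producing an L2-normalized per-pixel descriptor map."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, padding=1),
        )

    def forward(self, img):          # img: (B, 3, H, W)
        return F.normalize(self.net(img), dim=1)

class VoxelBranch(nn.Module):
    """3D CNN over an occupancy grid; captures local geometric context."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, dim, 3, padding=1),
        )

    def forward(self, vox):          # vox: (B, 1, D, H, W)
        return self.net(vox)

class PointBranch(nn.Module):
    """Per-point MLP meant to restore detail lost during voxelization."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, pts):          # pts: (B, N, 3)
        return self.net(pts)

def fuse_3d(voxel_feat, point_feat, voxel_idx):
    """Look up each point's voxel feature and add its point-wise feature.

    voxel_feat: (B, C, D, H, W), point_feat: (B, N, C),
    voxel_idx: (B, N) flattened grid index of the voxel containing each point.
    """
    B, C = voxel_feat.shape[:2]
    flat = voxel_feat.flatten(2)                                       # (B, C, D*H*W)
    gathered = torch.gather(flat, 2, voxel_idx.unsqueeze(1).expand(B, C, -1))
    return F.normalize(gathered.transpose(1, 2) + point_feat, dim=-1)  # (B, N, C)
```

With both descriptor sets living in the same normalized space, 2D-3D correspondences can be read off by nearest-neighbor similarity between per-point and per-pixel features.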
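The abstract also mentions a novel loss with adaptive-weighted optimization for cross-modality feature description. Its exact form is not reproduced here; as a generic stand-in, the snippet below uses a plain symmetric InfoNCE contrastive loss between matched pixel/point descriptors. The temperature and the assumption of one ground-truth match per row are illustrative choices, not the paper's.

```python
# Generic stand-in for the paper's adaptive-weighted descriptor loss: a symmetric
# InfoNCE objective that pulls matched pixel/point descriptors together in the
# shared latent space and pushes apart the other pairs in the batch.
import torch
import torch.nn.functional as F

def cross_modal_infonce(pix_desc, pt_desc, temperature=0.07):
    """pix_desc, pt_desc: (M, C) L2-normalized descriptors of M matched 2D-3D pairs."""
    logits = pix_desc @ pt_desc.t() / temperature       # (M, M) similarity matrix
    targets = torch.arange(pix_desc.shape[0], device=pix_desc.device)
    # Row i should retrieve column i, its ground-truth cross-modal partner.
    loss_p2i = F.cross_entropy(logits, targets)          # pixels retrieve points
    loss_i2p = F.cross_entropy(logits.t(), targets)      # points retrieve pixels
    return 0.5 * (loss_p2i + loss_i2p)
```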
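For contrast with the end-to-end supervision described above, the conventional post-processing step the abstract refers to recovers the rigid transform from predicted 2D-3D matches with a non-differentiable RANSAC PnP solver. A minimal OpenCV example of that inference step follows; the reprojection threshold and iteration count are arbitrary illustrative values.

```python
# Classical (non-differentiable) pose recovery from predicted 2D-3D matches,
# i.e. the post-processing step that prior pipelines rely on. The paper instead
# supervises the predicted pose distribution through a differentiable
# probabilistic PnP solver during training.
import cv2
import numpy as np

def pose_from_matches(pts_3d, pts_2d, K):
    """pts_3d: (N, 3) LiDAR points, pts_2d: (N, 2) matched pixels, K: (3, 3) intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float32),
        pts_2d.astype(np.float32),
        K.astype(np.float32),
        distCoeffs=None,
        reprojectionError=3.0,    # pixel threshold for RANSAC inliers
        iterationsCount=1000,
    )
    R, _ = cv2.Rodrigues(rvec)    # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers
```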
Related papers
- Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching [2.400446821380503]
We introduce an efficient framework to learn descriptors for both RGB images and point clouds.
It takes visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy.
A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision.
arXiv Detail & Related papers (2024-10-08T18:31:41Z)
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
- Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data [80.14669385741202]
We propose a self-supervised pre-training method for 3D perception models tailored to autonomous driving data.
We leverage the availability of synchronized and calibrated image and Lidar sensors in autonomous driving setups.
Our method does not require any point cloud nor image annotations.
arXiv Detail & Related papers (2022-03-30T12:40:30Z)
- SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations [85.38562724999898]
We propose a 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU.
Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module.
To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
- P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching [78.18641868402901]
This work takes the initiative to establish fine-grained correspondences between 2D images and 3D point clouds.
An ultra-wide reception mechanism, in combination with a novel loss function, is designed to mitigate the intrinsic information variations between pixel and point local regions.
arXiv Detail & Related papers (2021-03-01T14:59:40Z)
- Probabilistic Vehicle Reconstruction Using a Multi-Task CNN [0.0]
We present a probabilistic approach for shape-aware 3D vehicle reconstruction from stereo images.
Specifically, we train a CNN that outputs probability distributions for the vehicle's orientation and for both vehicle keypoints and wireframe edges.
We show that our method achieves state-of-the-art results on the challenging KITTI benchmark.
arXiv Detail & Related papers (2021-02-21T20:45:44Z)
- Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point Problem [98.92148855291363]
This paper proposes a deep CNN model that simultaneously solves for both the 6-DoF absolute camera pose and 2D-3D correspondences.
Tests on both real and simulated data have shown that our method substantially outperforms existing approaches.
arXiv Detail & Related papers (2020-03-15T04:17:30Z)