Related papers: Diff$^2$I2P: Differentiable Image-to-Point Cloud Registration with Diffusion Prior

URL: http://arxiv.org/abs/2507.06651v1
Date: Wed, 09 Jul 2025 08:30:49 GMT
Title: Diff$^2$I2P: Differentiable Image-to-Point Cloud Registration with Diffusion Prior
Authors: Juncheng Mu, Chengwei Ren, Weixiang Zhang, Liang Pan, Xiao-Ping Zhang, Yue Gao
Abstract summary: Cross-modal correspondences are essential for image-to-point cloud (I2P) registration. We propose Diff$^2$I2P, a fully Differentiable I2P registration framework, leveraging a novel and effective prior for bridging the modality gap.
Score: 21.693977784321202
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning cross-modal correspondences is essential for image-to-point cloud (I2P) registration. Existing methods achieve this mostly by utilizing metric learning to enforce feature alignment across modalities, disregarding the inherent modality gap between image and point data. Consequently, this paradigm struggles to ensure accurate cross-modal correspondences. To this end, inspired by the cross-modal generation success of recent large diffusion models, we propose Diff$^2$I2P, a fully Differentiable I2P registration framework, leveraging a novel and effective Diffusion prior for bridging the modality gap. Specifically, we propose a Control-Side Score Distillation (CSD) technique to distill knowledge from a depth-conditioned diffusion model to directly optimize the predicted transformation. However, the gradients on the transformation fail to backpropagate onto the cross-modal features due to the non-differentiability of correspondence retrieval and PnP solver. To this end, we further propose a Deformable Correspondence Tuning (DCT) module to estimate the correspondences in a differentiable way, followed by the transformation estimation using a differentiable PnP solver. With these two designs, the Diffusion model serves as a strong prior to guide the cross-modal feature learning of image and point cloud for forming robust correspondences, which significantly improves the registration. Extensive experimental results demonstrate that Diff$^2$I2P consistently outperforms SoTA I2P registration methods, achieving over 7% improvement in registration recall on the 7-Scenes benchmark.
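
The abstract notes that gradients cannot flow through hard correspondence retrieval. The sketch below is not the paper's DCT module. It is a generic soft-argmax illustration, under assumed names (`soft_correspondence`, `hard_correspondence`, `temperature`), of how a softmax-weighted average of candidate pixel coordinates gives a smooth stand-in for picking the single best match.

```python
import math

def hard_correspondence(similarities, coords):
    """Pick the candidate with the highest similarity (argmax).

    The index selection is piecewise constant in the similarities,
    so it provides no useful gradient to the features behind them.
    """
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return coords[best]

def soft_correspondence(similarities, coords, temperature=0.1):
    """Softmax-weighted average of candidate 2D coordinates (soft-argmax).

    The output varies smoothly with the similarities, so in an autodiff
    framework gradients could reach the features that produced them.
    `temperature` is a hypothetical knob: smaller values approach argmax.
    """
    m = max(similarities)  # subtract the max for numerical stability
    weights = [math.exp((s - m) / temperature) for s in similarities]
    total = sum(weights)
    weights = [w / total for w in weights]
    u = sum(w * c[0] for w, c in zip(weights, coords))
    v = sum(w * c[1] for w, c in zip(weights, coords))
    return (u, v), weights

# Hypothetical similarities of one 3D point to three candidate pixels.
sims = [0.1, 0.9, 0.2]
pixels = [(0.0, 0.0), (10.0, 20.0), (5.0, 5.0)]
soft_uv, soft_w = soft_correspondence(sims, pixels, temperature=0.01)
hard_uv = hard_correspondence(sims, pixels)
```

With a small temperature the soft estimate nearly coincides with the hard pick, while a larger temperature blends neighboring candidates.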

Related papers

Improving Progressive Generation with Decomposable Flow Matching [50.63174319509629]
Decomposable Flow Matching (DFM) is a simple and effective framework for the progressive generation of visual media. On ImageNet-1k 512px, DFM achieves a 35.2% improvement in FDD scores over the base architecture and 26.4% over the best-performing baseline.
arXiv Detail & Related papers (2025-06-24T17:58:02Z)
Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs). Existing binarization methods result in significant performance degradation. We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
Provably Robust Score-Based Diffusion Posterior Sampling for Plug-and-Play Image Reconstruction [31.503662384666274]
In science and engineering, the goal is to infer an unknown image from a small number of measurements collected from a known forward model describing a certain imaging modality. Score-based diffusion models, owing to their empirical success, have emerged as an impressive candidate for an exemplary prior in image reconstruction.
arXiv Detail & Related papers (2024-03-25T15:58:26Z)
Adaptive Multi-step Refinement Network for Robust Point Cloud Registration [82.64560249066734]
Point Cloud Registration estimates the relative rigid transformation between two point clouds of the same scene. We propose an adaptive multi-step refinement network that refines the registration quality at each step by leveraging the information from the preceding step. Our method achieves state-of-the-art performance on both the 3DMatch/3DLoMatch and KITTI benchmarks.
arXiv Detail & Related papers (2023-12-05T18:59:41Z)
Exploring Straighter Trajectories of Flow Matching with Diffusion Guidance [66.4153984834872]
We propose Straighter trajectories of Flow Matching (StraightFM). It straightens trajectories with a coupling strategy guided by a diffusion model at the level of the entire distribution. It generates visually appealing images with a lower FID than diffusion and traditional flow matching methods.
arXiv Detail & Related papers (2023-11-28T06:19:30Z)
SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation [66.16525145765604]
We introduce an SE(3) diffusion model-based point cloud registration framework for 6D object pose estimation in real-world scenarios. Our approach formulates the 3D registration task as a denoising diffusion process, which progressively refines the pose of the source point cloud. Experiments demonstrate that our diffusion registration framework presents outstanding pose estimation performance on the real-world TUD-L, LINEMOD, and Occluded-LINEMOD datasets.
arXiv Detail & Related papers (2023-10-26T12:47:26Z)
FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators [37.39693977657165]
Matching cross-modality features between images and point clouds is a fundamental problem for image-to-point cloud registration. We propose to unify the modality between images and point clouds by pretrained large-scale models first. We show that the intermediate features, called diffusion features, extracted by depth-to-image diffusion models are semantically consistent between images and point clouds.
arXiv Detail & Related papers (2023-10-05T09:57:23Z)
Improving Misaligned Multi-modality Image Fusion with One-stage Progressive Dense Registration [67.23451452670282]
Misalignments between multi-modality images pose challenges in image fusion. We propose a Cross-modality Multi-scale Progressive Dense Registration scheme. This scheme accomplishes the coarse-to-fine registration exclusively using a one-stage optimization.
arXiv Detail & Related papers (2023-08-22T03:46:24Z)
Fourier Test-time Adaptation with Multi-level Consistency for Robust Classification [10.291631977766672]
We propose a novel approach called Fourier Test-time Adaptation (FTTA) to integrate input and model tuning. FTTA builds a reliable multi-level consistency measurement of paired inputs to achieve self-supervision of predictions. It was extensively validated on three large classification datasets with different modalities and organs.
arXiv Detail & Related papers (2023-06-05T02:29:38Z)
Diffusion Model for Dense Matching [34.13580888014]
The objective for establishing dense correspondence between paired images consists of two terms: a data term and a prior term. We propose DiffMatch, a novel conditional diffusion-based framework designed to explicitly model both the data and prior terms. Our experimental results demonstrate significant performance improvements of our method over existing approaches.
arXiv Detail & Related papers (2023-05-30T14:58:24Z)
Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation. We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.