Diff-Reg v2: Diffusion-Based Matching Matrix Estimation for Image Matching and 3D Registration
- URL: http://arxiv.org/abs/2503.04127v2
- Date: Fri, 07 Mar 2025 02:39:57 GMT
- Title: Diff-Reg v2: Diffusion-Based Matching Matrix Estimation for Image Matching and 3D Registration
- Authors: Qianliang Wu, Haobo Jiang, Yaqing Ding, Lei Luo, Jin Xie, Jian Yang,
- Abstract summary: We introduce an innovative paradigm that leverages a diffusion model in matrix space for robust matching matrix estimation.<n>Specifically, we apply the diffusion model in the doubly matrix space for 3D-3D and 2D-3D registration tasks.<n>For all three registration tasks, we provide adaptive matching matrix embedding implementations tailored to the specific characteristics of each task.
- Score: 33.8118117906136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Establishing reliable correspondences is crucial for all registration tasks, including 2D image registration, 3D point cloud registration, and 2D-3D image-to-point cloud registration. However, these tasks are often complicated by challenges such as scale inconsistencies, symmetry, and large deformations, which can lead to ambiguous matches. Previous feature-based and correspondence-based methods typically rely on geometric or semantic features to generate or polish initial potential correspondences. Some methods typically leverage specific geometric priors, such as topological preservation, to devise diverse and innovative strategies tailored to a given enhancement goal, which cannot be exhaustively enumerated. Additionally, many previous approaches rely on a single-step prediction head, which can struggle with local minima in complex matching scenarios. To address these challenges, we introduce an innovative paradigm that leverages a diffusion model in matrix space for robust matching matrix estimation. Our model treats correspondence estimation as a denoising diffusion process in the matching matrix space, gradually refining the intermediate matching matrix to the optimal one. Specifically, we apply the diffusion model in the doubly stochastic matrix space for 3D-3D and 2D-3D registration tasks. In the 2D image registration task, we deploy the diffusion model in a matrix subspace where dual-softmax projection regularization is applied. For all three registration tasks, we provide adaptive matching matrix embedding implementations tailored to the specific characteristics of each task while maintaining a consistent "match-to-warp" encoding pattern. Furthermore, we adopt a lightweight design for the denoising module. In inference, once points or image features are extracted and fixed, this module performs multi-step denoising predictions through reverse sampling.
Related papers
- NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs [9.978766637766373]
We introduce a method to convert point clouds into 1D sequences that maintain 3D spatial structure with no need for data replication.
Our method does not require positional embeddings and allows for shorter sequence lengths while still achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-31T18:58:40Z) - Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration [2.814748676983944]
We propose a graph neural network model embedded with a local Spherical Euclidean 3D equivariance property through SE(3) message passing based propagation.
Our model is composed mainly of a descriptor module, equivariant graph layers, match similarity, and the final regression layers.
Experiments conducted on the 3DMatch and KITTI datasets exhibit the compelling and robust performance of our model compared to state-of-the-art approaches.
arXiv Detail & Related papers (2024-10-08T06:48:01Z) - Diff-Reg v1: Diffusion Matching Model for Registration Problem [34.57825794576445]
Existing methods commonly leverage geometric or semantic point features to generate potential correspondences.
Previous methods, which rely on single-pass prediction, may struggle with local minima in complex scenarios.
We introduce a diffusion matching model for robust correspondence estimation.
arXiv Detail & Related papers (2024-03-29T02:10:38Z) - SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D
Object Pose Estimation [66.16525145765604]
We introduce an SE(3) diffusion model-based point cloud registration framework for 6D object pose estimation in real-world scenarios.
Our approach formulates the 3D registration task as a denoising diffusion process, which progressively refines the pose of the source point cloud.
Experiments demonstrate that our diffusion registration framework presents outstanding pose estimation performance on the real-world TUD-L, LINEMOD, and Occluded-LINEMOD datasets.
arXiv Detail & Related papers (2023-10-26T12:47:26Z) - Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach of adapting 2D networks with an intermediate feature representation for processing 3D images.
We show on all 3D MedMNIST datasets as benchmark and two real-world datasets consisting of several hundred high-resolution CT or MRI scans that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z) - CheckerPose: Progressive Dense Keypoint Localization for Object Pose
Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z) - Searching Dense Point Correspondences via Permutation Matrix Learning [50.764666304335]
This paper presents a novel end-to-end learning-based method to estimate the dense correspondence of 3D point clouds.
Our method achieves state-of-the-art performance for dense correspondence learning.
arXiv Detail & Related papers (2022-10-26T17:56:09Z) - LiP-Flow: Learning Inference-time Priors for Codec Avatars via
Normalizing Flows in Latent Space [90.74976459491303]
We introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.
A normalizing flow bridges the two representation spaces and transforms latent samples from one domain to another, allowing us to define a latent likelihood objective.
We show that our approach leads to an expressive and effective prior, capturing facial dynamics and subtle expressions better.
arXiv Detail & Related papers (2022-03-15T13:22:57Z) - DFC: Deep Feature Consistency for Robust Point Cloud Registration [0.4724825031148411]
We present a novel learning-based alignment network for complex alignment scenes.
We validate our approach on the 3DMatch dataset and the KITTI odometry dataset.
arXiv Detail & Related papers (2021-11-15T08:27:21Z) - U-mesh: Human Correspondence Matching with Mesh Convolutional Networks [15.828285556159026]
We propose an elegant fusion of regression (bottom-up) and generative (top-down) methods to fit a parametric template model to raw scan meshes.
Our first major contribution is an intrinsic convolutional mesh U-net architecture that predicts pointwise correspondence to a template surface.
We evaluate the proposed method on the FAUST correspondence challenge where we achieve 20% (33%) improvement over state of the art methods for inter- (intra-) subject correspondence.
arXiv Detail & Related papers (2021-08-15T08:58:45Z) - DeepGMR: Learning Latent Gaussian Mixture Models for Registration [113.74060941036664]
Point cloud registration is a fundamental problem in 3D computer vision, graphics and robotics.
In this paper, we introduce Deep Gaussian Mixture Registration (DeepGMR), the first learning-based registration method.
Our proposed method shows favorable performance when compared with state-of-the-art geometry-based and learning-based registration methods.
arXiv Detail & Related papers (2020-08-20T17:25:16Z) - Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point
Problem [98.92148855291363]
This paper proposes a deep CNN model which simultaneously solves for both 6-DoF absolute camera pose 2D--3D correspondences.
Tests on both real and simulated data have shown that our method substantially outperforms existing approaches.
arXiv Detail & Related papers (2020-03-15T04:17:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.