Related papers: PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment

URL: http://arxiv.org/abs/2306.15667v4
Date: Wed, 24 Jan 2024 21:00:12 GMT
Title: PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment
Authors: Jianyuan Wang, Christian Rupprecht, David Novotny
Abstract summary: We propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework. We show that our method PoseDiffusion significantly improves over the classic SfM pipelines. It is observed that our method can generalize across datasets without further training.
Score: 21.98302129015761
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Camera pose estimation is a long-standing computer vision problem that to date often relies on classical methods, such as handcrafted keypoint matching, RANSAC and bundle adjustment. In this paper, we propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework, modelling the conditional distribution of camera poses given input images. This novel view of an old problem has several advantages. (i) The nature of the diffusion framework mirrors the iterative procedure of bundle adjustment. (ii) The formulation allows a seamless integration of geometric constraints from epipolar geometry. (iii) It excels in typically difficult scenarios such as sparse views with wide baselines. (iv) The method can predict intrinsics and extrinsics for an arbitrary amount of images. We demonstrate that our method PoseDiffusion significantly improves over the classic SfM pipelines and the learned approaches on two real-world datasets. Finally, it is observed that our method can generalize across datasets without further training. Project page: https://posediffusion.github.io/

Related papers

DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion [53.70278210626701]
We propose a data-driven multi-view reasoning approach that directly infers 3D scene geometry and camera poses from multi-view images. Our framework, DiffusionSfM, parameterizes scene geometry and cameras as pixel-wise ray origins and endpoints in a global frame. We empirically validate DiffusionSfM on both synthetic and real datasets, demonstrating that it outperforms classical and learning-based approaches.
arXiv Detail & Related papers (2025-05-08T17:59:47Z)
ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation [17.097170273209333]
Recovering camera poses from a set of images is a foundational task in 3D computer vision. Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution. We propose ADen to unify the two frameworks by employing a generator and a discriminator.
arXiv Detail & Related papers (2024-08-16T22:45:46Z)
RecDiffusion: Rectangling for Image Stitching with Diffusion Models [53.824503710254206]
We introduce a novel diffusion-based learning framework, RecDiffusion, for image stitching rectangling. This framework combines Motion Diffusion Models (MDM) to generate motion fields, effectively transitioning from the stitched image's irregular borders to a geometrically corrected intermediary.
arXiv Detail & Related papers (2024-03-28T06:22:45Z)
Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views. We propose a distributed representation of camera pose that treats a camera as a bundle of rays. Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z)
Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-based Gaze Estimation [16.43119580796718]
This work proposes a generalizable multi-view gaze estimation task and a cross-view feature fusion method to address this issue. In addition to paired images, our method takes the relative rotation matrix between two cameras as additional input. The proposed network learns to extract rotatable feature representation by using relative rotation as a constraint.
arXiv Detail & Related papers (2023-05-22T04:29:34Z)
A Variational Perspective on Solving Inverse Problems with Diffusion Models [101.831766524264]
Inverse tasks can be formulated as inferring a posterior distribution over data. This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable. We propose a variational approach that by design seeks to approximate the true posterior distribution.
arXiv Detail & Related papers (2023-05-07T23:00:47Z)
DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion [144.9653045465908]
We propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM). Our approach yields promising fusion results in infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2023-03-13T04:06:42Z)
GECCO: Geometrically-Conditioned Point Diffusion Models [60.28388617034254]
Diffusion models generating images conditionally on text have recently made a splash far beyond the computer vision community. Here, we tackle the related problem of generating point clouds, both unconditionally, and conditionally with images. For the latter, we introduce a novel geometrically-motivated conditioning scheme based on projecting sparse image features into the point cloud.
arXiv Detail & Related papers (2023-03-10T13:45:44Z)
DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion [9.294501649791016]
Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM (vSLAM). We formulate the two-view SfM problem as a maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted as DeepMLE. Our method significantly outperforms the state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.
arXiv Detail & Related papers (2022-10-11T15:07:25Z)
RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object. We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.