ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models
- URL: http://arxiv.org/abs/2306.17140v2
- Date: Thu, 30 Nov 2023 18:33:12 GMT
- Title: ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models
- Authors: Weihao Cheng, Yan-Pei Cao, Ying Shan
- Abstract summary: We present ID-Pose, which inverts the denoising diffusion process to estimate the relative pose given two input images.
We extend ID-Pose to handle more than two images and estimate each pose with multiple image pairs from triangular relations.
Results demonstrate that ID-Pose significantly outperforms state-of-the-art methods.
- Score: 43.86792681109704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given sparse views of a 3D object, estimating their camera poses is a
long-standing and intractable problem. Toward this goal, we consider harnessing
the pre-trained diffusion model of novel views conditioned on viewpoints
(Zero-1-to-3). We present ID-Pose, which inverts the denoising diffusion
process to estimate the relative pose given two input images. ID-Pose adds
noise to one image and predicts the noise conditioned on the other image and a
hypothesis of the relative pose. The prediction error is used as the
minimization objective to find the optimal pose with gradient descent.
We extend ID-Pose to handle more than two images and estimate each pose
with multiple image pairs from triangular relations. ID-Pose requires no
training and generalizes to open-world images. We conduct extensive experiments
using casually captured photos and rendered images with random viewpoints. The
results demonstrate that ID-Pose significantly outperforms state-of-the-art
methods.
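The abstract describes the core loop: noise one image, ask a view-conditioned denoiser to predict that noise given the other image and a candidate relative pose, then run gradient descent on the prediction error. The Python sketch that follows illustrates that loop only under loudly stated assumptions. Zero-1-to-3 is replaced by a hypothetical toy predictor that simply shifts the conditioning image by the pose (a Fourier sub-pixel shift). The pose is a 2-vector of angles standing in for relative azimuth and elevation, a single noise level `alpha_bar` is used, and gradients come from central finite differences rather than backpropagation. All function names and parameters (`shift_image`, `predict_noise`, `objective`, `estimate_relative_pose`) are illustrative, not taken from the ID-Pose code.

```python
import numpy as np

N = 32  # toy image side length in pixels (hypothetical)
theta = 2 * np.pi * np.arange(N) / N
X, Y = np.meshgrid(theta, theta, indexing="ij")

def shift_image(img, pose):
    """Toy stand-in for a view-conditioned generator (NOT Zero-1-to-3).

    Circularly shifts img by pose = (dx, dy) radians via the Fourier shift
    theorem, so the 'novel view' is exact for band-limited images.
    """
    w = 2 * np.pi * np.fft.fftfreq(N)           # angular frequency per pixel
    s = np.asarray(pose) * N / (2 * np.pi)       # radians -> pixels
    phase = np.exp(-1j * (w[:, None] * s[0] + w[None, :] * s[1]))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * phase))

def predict_noise(x_t, alpha_bar, cond_img, rel_pose):
    """Hypothetical idealized eps-predictor conditioned on cond_img and rel_pose.

    Assumption: it treats cond_img moved by rel_pose as the clean image and
    solves x_t = sqrt(a) * x0 + sqrt(1 - a) * eps for eps.
    """
    x0_hat = shift_image(cond_img, rel_pose)
    return (x_t - np.sqrt(alpha_bar) * x0_hat) / np.sqrt(1.0 - alpha_bar)

def objective(rel_pose, target, cond_img, noises, alpha_bar):
    """Mean squared noise-prediction error over fixed noise samples."""
    loss = 0.0
    for eps in noises:
        # Forward diffusion: add noise to the target image.
        x_t = np.sqrt(alpha_bar) * target + np.sqrt(1.0 - alpha_bar) * eps
        loss += np.mean((predict_noise(x_t, alpha_bar, cond_img, rel_pose) - eps) ** 2)
    return loss / len(noises)

def estimate_relative_pose(target, cond_img, init=(0.0, 0.0), steps=200, lr=0.5,
                           alpha_bar=0.5, n_noise=4, seed=0):
    """Gradient descent on the pose hypothesis using finite-difference gradients."""
    rng = np.random.default_rng(seed)
    noises = [rng.standard_normal(target.shape) for _ in range(n_noise)]
    pose, h = np.array(init, dtype=float), 1e-5
    for _ in range(steps):
        grad = np.zeros_like(pose)
        for i in range(pose.size):
            d = np.zeros_like(pose)
            d[i] = h
            grad[i] = (objective(pose + d, target, cond_img, noises, alpha_bar)
                       - objective(pose - d, target, cond_img, noises, alpha_bar)) / (2 * h)
        pose -= lr * grad
    return pose

# Usage with a toy "object" view and a hypothetical ground-truth relative pose.
cond = np.sin(X) * np.cos(Y) + 0.5 * np.cos(Y) + 0.3 * np.sin(X)
true_pose = np.array([0.8, -0.5])
target = shift_image(cond, true_pose)
est = estimate_relative_pose(target, cond)
print(np.round(est, 3))
```

Because the toy predictor is idealized, the error vanishes exactly at the true relative pose, and gradient descent from a nearby initialization recovers `true_pose`. With a real diffusion model the error is only minimized approximately. The multi-image extension mentioned in the abstract, which combines estimates from several image pairs via triangular relations, is not covered by this sketch.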
Related papers
- SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views [36.02533658048349]
We propose a novel method, SpaRP, to reconstruct a 3D textured mesh and estimate the relative camera poses for sparse-view images.
SpaRP distills knowledge from 2D diffusion models and finetunes them to implicitly deduce the 3D spatial relationships between the sparse views.
It requires only about 20 seconds to produce a textured mesh and camera poses for the input views.
(arXiv, 2024-08-19T17:53:10Z)
- ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation [17.097170273209333]
Recovering camera poses from a set of images is a foundational task in 3D computer vision.
Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution.
We propose ADen to unify the two frameworks by employing a generator and a discriminator.
(arXiv, 2024-08-16T22:45:46Z)
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
(arXiv, 2024-07-11T05:46:35Z)
- Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays.
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
(arXiv, 2024-02-22T18:59:56Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate, model-free, one-shot object pose estimator.
We create a new training pipeline for object-to-image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce IO-Layer.
(arXiv, 2023-04-03T21:14:59Z)
- DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models [5.908471365011943]
We propose DiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image.
We show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
(arXiv, 2022-11-29T18:55:13Z)
- Stochastic Modeling for Learnable Human Pose Triangulation [0.7646713951724009]
We propose a modeling framework for 3D human pose triangulation and evaluate its performance across different datasets and spatial camera arrangements.
The proposed pose triangulation model successfully generalizes to different camera arrangements and between two public datasets.
(arXiv, 2021-10-01T09:26:25Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
(arXiv, 2021-08-10T18:39:56Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
(arXiv, 2021-04-06T03:49:35Z)