Probabilistic Inverse Cameras: Image to 3D via Multiview Geometry
- URL: http://arxiv.org/abs/2412.10273v1
- Date: Fri, 13 Dec 2024 16:46:46 GMT
- Title: Probabilistic Inverse Cameras: Image to 3D via Multiview Geometry
- Authors: Rishabh Kabra, Drew A. Hudson, Sjoerd van Steenkiste, Joao Carreira, Niloy J. Mitra,
- Abstract summary: We introduce a hierarchical probabilistic approach to go from a 2D image to multiview 3D.
A diffusion "prior" models the unseen 3D geometry, which then conditions a diffusion "decoder" to generate novel views of the subject.
- Score: 37.52243979087041
- License:
- Abstract: We introduce a hierarchical probabilistic approach to go from a 2D image to multiview 3D: a diffusion "prior" models the unseen 3D geometry, which then conditions a diffusion "decoder" to generate novel views of the subject. We use a pointmap-based geometric representation in a multiview image format to coordinate the generation of multiple target views simultaneously. We facilitate correspondence between views by assuming fixed target camera poses relative to the source camera, and constructing a predictable distribution of geometric features per target. Our modular, geometry-driven approach to novel-view synthesis (called "unPIC") beats SoTA baselines such as CAT3D and One-2-3-45 on held-out objects from ObjaverseXL, as well as real-world objects ranging from Google Scanned Objects, Amazon Berkeley Objects, to the Digital Twin Catalog.
Related papers
- FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [93.6881532277553]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images.
Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
arXiv Detail & Related papers (2025-02-17T18:54:05Z) - Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z) - Viewpoint Equivariance for Multi-View 3D Object Detection [35.4090127133834]
State-of-the-art methods focus on reasoning and decoding object bounding boxes from multi-view camera input.
We introduce VEDet, a novel 3D object detection framework that exploits 3D multi-view geometry.
arXiv Detail & Related papers (2023-03-25T19:56:41Z) - Neural Correspondence Field for Object Pose Estimation [67.96767010122633]
We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image.
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
arXiv Detail & Related papers (2022-07-30T01:48:23Z) - 3D Object Detection and Pose Estimation of Unseen Objects in Color
Images with Local Surface Embeddings [35.769234123059086]
We present an approach for detecting and estimating the 3D poses of objects in images that requires only an untextured CAD model.
Our approach combines Deep Learning and 3D geometry: It relies on an embedding of local 3D geometry to match the CAD models to the input images.
We show that we can use Mask-RCNN in a class-agnostic way to detect the new objects without retraining and thus drastically limit the number of possible correspondences.
arXiv Detail & Related papers (2020-10-08T15:57:06Z) - Improved Modeling of 3D Shapes with Multi-view Depth Maps [48.8309897766904]
We present a general-purpose framework for modeling 3D shapes using CNNs.
Using just a single depth image of the object, we can output a dense multi-view depth map representation of 3D objects.
arXiv Detail & Related papers (2020-09-07T17:58:27Z) - Canonical 3D Deformer Maps: Unifying parametric and non-parametric
methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z) - Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image.
We trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoints or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv Detail & Related papers (2020-07-21T17:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.