Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data
- URL: http://arxiv.org/abs/2306.07881v2
- Date: Fri, 1 Sep 2023 11:09:36 GMT
- Title: Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data
- Authors: Stanislaw Szymanowicz, Christian Rupprecht and Andrea Vedaldi
- Abstract summary: Viewset Diffusion is a diffusion-based generator that outputs 3D objects while only using multi-view 2D data for supervision.
We train a diffusion model to generate viewsets, but design the neural network generator to internally reconstruct the corresponding 3D models.
The model performs reconstruction efficiently, in a feed-forward manner, and is trained with only rendering losses, using as few as three views per viewset.
- Score: 76.38261311948649
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present Viewset Diffusion, a diffusion-based generator that outputs 3D
objects while only using multi-view 2D data for supervision. We note that there
exists a one-to-one mapping between viewsets, i.e., collections of several 2D
views of an object, and 3D models. Hence, we train a diffusion model to
generate viewsets, but design the neural network generator to internally
reconstruct the corresponding 3D models, thus generating those too. We fit a
diffusion model to a large number of viewsets for a given category of objects.
The resulting generator can be conditioned on zero, one or more input views.
Conditioned on a single view, it performs 3D reconstruction while accounting for
the ambiguity of the task, allowing it to sample multiple solutions compatible
with the input. The model performs reconstruction efficiently, in a feed-forward
manner, and is trained with only rendering losses, using as few as three views
per viewset. Project page: szymanowiczs.github.io/viewset-diffusion.
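For illustration, a minimal sketch of one training step of this viewset-denoising scheme is given below in PyTorch. The linear noise schedule, the `reconstructor` and `render` modules, the camera format and the MSE rendering loss are assumptions made for the sketch, not the paper's actual implementation; only the overall pattern (noise the viewset, reconstruct a 3D model internally, render it back to the same cameras, and supervise with a rendering loss on the clean views) follows the abstract.

```python
# Minimal sketch of one Viewset Diffusion training step (assumptions noted above).
import torch
import torch.nn.functional as F

T = 1000                                             # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)                # standard DDPM linear schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)   # cumulative products of (1 - beta_t)

def training_step(reconstructor, render, views, cameras, optimizer):
    """views: (B, K, 3, H, W) clean viewset of K images; cameras: per-view poses."""
    B = views.shape[0]
    t = torch.randint(0, T, (B,), device=views.device)         # one random timestep per viewset
    noise = torch.randn_like(views)
    a = alphas_cumprod.to(views.device)[t].view(B, 1, 1, 1, 1)
    noisy_views = a.sqrt() * views + (1.0 - a).sqrt() * noise  # forward diffusion on the viewset

    # The denoiser reconstructs an internal 3D representation (e.g. a radiance
    # field) from the noisy viewset, then renders it back into the same K cameras.
    volume = reconstructor(noisy_views, cameras, t)             # hypothetical module/signature
    rendered = render(volume, cameras)                          # (B, K, 3, H, W), hypothetical

    # Supervision uses only a rendering loss against the clean 2D views;
    # no 3D ground truth is required, and K can be as small as three.
    loss = F.mse_loss(rendered, views)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At sampling time, conditioning on zero, one or more input views could then amount to keeping those views clean inside the viewset while denoising the remaining ones; this is one plausible mechanism consistent with the abstract, which does not spell out the conditioning procedure.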
Related papers
- Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models [3.9373541926236766]
We present a latent diffusion model over 3D scenes that can be trained using only 2D image data.
We show that our approach enables generating 3D scenes in as little as 0.2 seconds, either from scratch, or from sparse input views.
arXiv Detail & Related papers (2024-06-18T23:14:29Z) - MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework for generating consistent multi-view images from a single image, leveraging a scene representation transformer and a view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baseline methods on evaluation metrics including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z) - Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework, Sculpt3D, that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects, without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z) - DreamComposer: Controllable 3D Object Generation via Multi-View Conditions [45.4321454586475]
Recent works are capable of generating high-quality novel views from a single in-the-wild image.
Due to the lack of information from multiple views, these works encounter difficulties in generating controllable novel views.
We present DreamComposer, a flexible and scalable framework that can enhance existing view-aware diffusion models by injecting multi-view conditions.
arXiv Detail & Related papers (2023-12-06T16:55:53Z) - ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion [61.37481051263816]
Given a single image of a 3D object, this paper proposes a method (named ConsistNet) that is able to generate multiple images of the same object.
Our method effectively learns 3D consistency over a frozen Zero123 backbone and can generate 16 surrounding views of the object within 40 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-10-16T12:29:29Z) - Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion [67.71624118802411]
We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects.
We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data.
Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.
arXiv Detail & Related papers (2023-04-20T17:59:34Z) - Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field (an illustrative pipeline sketch follows the related-papers list below).
arXiv Detail & Related papers (2023-04-19T16:39:51Z) - RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation [68.06991943974195]
We present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision.
We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images.
arXiv Detail & Related papers (2022-11-17T20:17:04Z)
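As a reading aid, the sketch below lays out an Anything-3D-style single-image-to-3D pipeline as described in the corresponding entry above. The callables `segment_fn`, `caption_fn` and `lift_fn` are hypothetical placeholders standing in for the Segment-Anything model, a BLIP captioner and a diffusion-guided NeRF optimisation; they do not reproduce that paper's actual interfaces or training details.

```python
# Illustrative sketch of an Anything-3D-style single-image-to-3D pipeline.
# segment_fn, caption_fn and lift_fn are hypothetical placeholders, not real APIs.
from typing import Callable
import numpy as np

def anything3d_pipeline(
    image: np.ndarray,                                   # (H, W, 3) input photograph
    segment_fn: Callable[[np.ndarray], np.ndarray],      # returns an (H, W) object mask
    caption_fn: Callable[[np.ndarray], str],             # returns a text description
    lift_fn: Callable[[np.ndarray, str], object],        # text- and image-guided NeRF fitting
):
    # 1. Extract the object of interest from the background.
    mask = segment_fn(image).astype(image.dtype)
    masked = image * mask[..., None]

    # 2. Describe the object in text, to condition the diffusion prior.
    caption = caption_fn(masked)

    # 3. Lift the masked object into a neural radiance field, using the
    #    text-to-image diffusion model to supervise unseen views.
    return lift_fn(masked, caption)
```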
This list is automatically generated from the titles and abstracts of the papers on this site.