Zero-Shot Text-Guided Object Generation with Dream Fields
- URL: http://arxiv.org/abs/2112.01455v1
- Date: Thu, 2 Dec 2021 17:53:55 GMT
- Title: Zero-Shot Text-Guided Object Generation with Dream Fields
- Authors: Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole
- Abstract summary: We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects.
Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision.
In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
- Score: 111.06026544180398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We combine neural rendering with multi-modal image and text representations
to synthesize diverse 3D objects solely from natural language descriptions. Our
method, Dream Fields, can generate the geometry and color of a wide range of
objects without 3D supervision. Due to the scarcity of diverse, captioned 3D
data, prior methods only generate objects from a handful of categories, such as
ShapeNet. Instead, we guide generation with image-text models pre-trained on
large datasets of captioned images from the web. Our method optimizes a Neural
Radiance Field from many camera views so that rendered images score highly with
a target caption according to a pre-trained CLIP model. To improve fidelity and
visual quality, we introduce simple geometric priors, including
sparsity-inducing transmittance regularization, scene bounds, and new MLP
architectures. In experiments, Dream Fields produce realistic, multi-view
consistent object geometry and color from a variety of natural language
captions.
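The core objective is compact enough to sketch. The PyTorch loop below is an illustrative reconstruction from the abstract, not the authors' released code: a toy NeRF-style MLP is rendered from random viewpoints with a simplified orthographic ray marcher, each render is scored against the caption with a pretrained CLIP ViT-B/32, and a clamped transmittance penalty encourages sparse geometry within fixed scene bounds. The caption, camera model, resolution, sampling scheme, and loss weights are all assumptions made for brevity.

```python
# Illustrative sketch only (assumptions throughout), not the authors' code.
import math, random
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # openai CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.requires_grad_(False)  # CLIP stays frozen; only the radiance field is trained
with torch.no_grad():
    tokens = clip.tokenize(["a small vase of red flowers"]).to(device)  # example caption
    text_feat = F.normalize(clip_model.encode_text(tokens).float(), dim=-1)

class RadianceField(nn.Module):
    """Tiny MLP mapping 3D points to (density, color); a stand-in for the paper's MLP."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))
    def forward(self, x):
        out = self.net(x)
        return F.softplus(out[..., :1]), torch.sigmoid(out[..., 1:])  # sigma, rgb

def render(field, res=96, n_samples=64, bound=1.0):
    """Toy orthographic renderer from a random azimuth; returns an RGB image and
    per-pixel transmittance. The paper uses perspective cameras, positional
    encodings, and background augmentations, all omitted here."""
    theta = random.uniform(0.0, 2.0 * math.pi)
    fwd = torch.tensor([math.cos(theta), 0.0, math.sin(theta)], device=device)
    right = torch.tensor([-math.sin(theta), 0.0, math.cos(theta)], device=device)
    up = torch.tensor([0.0, 1.0, 0.0], device=device)
    u = torch.linspace(-bound, bound, res, device=device)
    px, py = torch.meshgrid(u, u, indexing="xy")
    origins = px[..., None] * right + py[..., None] * up - 1.5 * bound * fwd
    t = torch.linspace(0.0, 3.0 * bound, n_samples, device=device)
    pts = origins[..., None, :] + t[:, None] * fwd           # (res, res, S, 3), bounded scene
    sigma, rgb = field(pts)
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * (3.0 * bound / n_samples))
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = trans * alpha                                   # volume rendering weights
    image = (weights[..., None] * rgb).sum(dim=-2)
    transmittance = 1.0 - weights.sum(dim=-1)                 # light reaching the background
    return image + transmittance[..., None], transmittance    # composite onto white

field = RadianceField().to(device)
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
clip_mean = torch.tensor([0.4815, 0.4578, 0.4082], device=device).view(3, 1, 1)
clip_std = torch.tensor([0.2686, 0.2613, 0.2758], device=device).view(3, 1, 1)
tau = 0.88  # target mean transmittance (illustrative value)

for step in range(1000):
    image, transmittance = render(field)
    x = F.interpolate(image.permute(2, 0, 1).unsqueeze(0), size=224,
                      mode="bilinear", align_corners=False)
    img_feat = F.normalize(
        clip_model.encode_image((x - clip_mean) / clip_std).float(), dim=-1)
    clip_loss = -(img_feat * text_feat).sum()               # maximize image-caption similarity
    sparsity_loss = -transmittance.mean().clamp(max=tau)    # encourage mostly-empty space
    loss = clip_loss + 0.5 * sparsity_loss
    opt.zero_grad(); loss.backward(); opt.step()
```

The geometric priors listed in the abstract (scene bounds, sparsity-inducing transmittance regularization, and the improved MLP architecture) appear here only in heavily simplified form; the sketch is meant to convey the shape of the CLIP-guided optimization, not to reproduce the paper's results.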
Related papers
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes [67.5351491691866]
We present a novel framework, dubbed TeMO, to parse multi-object 3D scenes and edit their styles.
Our method can synthesize high-quality stylized content and outperform the existing methods over a wide range of multi-object 3D meshes.
arXiv Detail & Related papers (2023-12-07T12:10:05Z)
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields [29.907615852310204]
We present Text2NeRF, which is able to generate a wide range of 3D scenes purely from a text prompt.
Our method requires no additional training data but only a natural language description of the scene as the input.
arXiv Detail & Related papers (2023-05-19T10:58:04Z)
- Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization [91.52882218901627]
We propose a novel method for constructing implicit 3D morphable face models that are both generalizable and intuitive for editing.
Our method improves upon photo-realism, geometry, and expression accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-04T17:58:40Z)
- DreamBooth3D: Subject-Driven Text-to-3D Generation [43.14506066034495]
We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject.
We find that naively combining personalized text-to-image models with text-to-3D generation fails to yield satisfactory subject-specific 3D assets, because the personalized text-to-image models overfit to the input viewpoints of the subject.
Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject.
arXiv Detail & Related papers (2023-03-23T17:59:00Z)