Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
- URL: http://arxiv.org/abs/2303.11989v2
- Date: Sun, 10 Sep 2023 15:18:03 GMT
- Title: Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
- Authors: Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner
- Abstract summary: We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.
We leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses.
In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model.
- Score: 21.622420436349245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Text2Room, a method for generating room-scale textured 3D meshes
from a given text prompt as input. To this end, we leverage pre-trained 2D
text-to-image models to synthesize a sequence of images from different poses.
In order to lift these outputs into a consistent 3D scene representation, we
combine monocular depth estimation with a text-conditioned inpainting model.
The core idea of our approach is a tailored viewpoint selection such that the
content of each image can be fused into a seamless, textured 3D mesh. More
specifically, we propose a continuous alignment strategy that iteratively fuses
scene frames with the existing geometry to create a seamless mesh. Unlike
existing works that focus on generating single objects or zoom-out trajectories
from text, our method generates complete 3D scenes with multiple objects and
explicit 3D geometry. We evaluate our approach using qualitative and
quantitative metrics, demonstrating it as the first method to generate
room-scale 3D geometry with compelling textures from only text as input.
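To make the pipeline described above concrete, the following is a minimal Python sketch of the iterative render-inpaint-depth-fuse loop the abstract outlines. All names and components here (render_with_mask, inpaint, estimate_depth, fuse, and a point-cloud stand-in for the textured mesh) are hypothetical placeholders chosen for illustration; they show the control flow only and are not the authors' implementation of Text2Room.

```python
# Minimal sketch of the render -> inpaint -> depth -> fuse loop from the
# abstract. Every component below (the trivial renderer, the random inpainter,
# the noisy depth estimator, the point-stacking fusion) is a hypothetical
# stand-in, not the authors' implementation.
import numpy as np

H, W = 64, 64  # tiny render resolution for the sketch


def render_with_mask(points, pose):
    """Render the current scene from `pose` and mark unobserved pixels.

    Stand-in: returns an empty image plus a mask of pixels that still need to
    be inpainted; the mask shrinks as the scene accumulates geometry.
    """
    rgb = np.zeros((H, W, 3), dtype=np.float32)
    known_fraction = min(len(points) / (10 * H * W), 1.0)
    mask = np.random.rand(H, W) > known_fraction  # True = hole to fill
    return rgb, mask


def inpaint(rgb, mask, prompt):
    """Stand-in for a pre-trained text-conditioned 2D inpainting model."""
    filled = rgb.copy()
    filled[mask] = np.random.rand(int(mask.sum()), 3)  # pretend content from `prompt`
    return filled


def estimate_depth(rgb):
    """Stand-in for a pre-trained monocular depth estimator."""
    return 1.0 + np.random.rand(H, W)


def fuse(points, rgb, depth, mask, pose):
    """Back-project the newly inpainted pixels and append them to the scene.

    The paper's continuous alignment step fuses new content seamlessly with
    the existing mesh; here fusion is reduced to stacking 3D points.
    """
    v, u = np.nonzero(mask)
    new_points = np.stack([u, v, depth[v, u]], axis=1)  # pixel coords + depth
    return np.vstack([points, new_points])


def text2room_loop(prompt, poses):
    """Iteratively grow a scene from a text prompt over a tailored pose sequence."""
    points = np.empty((0, 3))
    for pose in poses:
        rgb, mask = render_with_mask(points, pose)  # observed content + holes
        rgb = inpaint(rgb, mask, prompt)            # fill holes from the prompt
        depth = estimate_depth(rgb)                 # lift the image to 2.5D
        points = fuse(points, rgb, depth, mask, pose)
    return points


scene = text2room_loop("a cozy living room with a fireplace", poses=range(10))
print(scene.shape)  # grows as more viewpoints are fused
```

In the actual method, the fusion step additionally aligns the predicted depth with the existing geometry and triangulates the back-projected pixels into a seamless, textured mesh rather than a loose point set.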
Related papers
- Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting [75.7154104065613]
We introduce a novel depth completion model, trained via teacher distillation and self-training to learn the 3D fusion process.
We also introduce a new benchmarking scheme for scene generation methods that is based on ground truth geometry.
arXiv Detail & Related papers (2024-04-30T17:59:40Z)
- EucliDreamer: Fast and High-Quality Texturing for 3D Models with Depth-Conditioned Stable Diffusion [5.158983929861116]
We present EucliDreamer, a simple and effective method to generate textures for 3D models given text prompts.
The texture is parameterized as an implicit function on the 3D surface, which is optimized with the Score Distillation Sampling (SDS) process and differentiable rendering.
arXiv Detail & Related papers (2024-04-16T04:44:16Z)
- RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion [39.03289977892935]
RealmDreamer is a technique for generation of general forward-facing 3D scenes from text descriptions.
Our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles.
arXiv Detail & Related papers (2024-04-10T17:57:41Z)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- Consistent Mesh Diffusion [8.318075237885857]
Given a 3D mesh with a UV parameterization, we introduce a novel approach to generating textures from text prompts.
We demonstrate our approach on a dataset containing 30 meshes, taking approximately 5 minutes per mesh.
arXiv Detail & Related papers (2023-12-01T23:25:14Z)
- 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models [102.75875255071246]
3D content creation via text-driven stylization has posed a fundamental challenge to the multimedia and graphics community.
We propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models.
arXiv Detail & Related papers (2023-11-09T15:51:27Z)
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision [114.56048848216254]
We present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions.
Based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates.
Our constructed captions provide high-level semantic supervision for generated 3D shapes.
arXiv Detail & Related papers (2023-03-23T13:53:16Z)
- TEXTure: Text-Guided Texturing of 3D Shapes [71.13116133846084]
We present TEXTure, a novel method for text-guided generation, editing, and transfer of textures for 3D shapes.
We define a trimap partitioning process that generates seamless 3D textures without requiring explicit surface textures.
arXiv Detail & Related papers (2023-02-03T13:18:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.