ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction
- URL: http://arxiv.org/abs/2505.20431v2
- Date: Fri, 30 May 2025 23:05:04 GMT
- Title: ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction
- Authors: Qimin Chen, Yuezhi Yang, Yifang Wang, Vladimir G. Kim, Siddhartha Chaudhuri, Hao Zhang, Zhiqin Chen
- Abstract summary: We introduce a 3D detailizer, a neural model which can instantaneously transform a coarse 3D shape proxy into a high-quality asset. Our model is trained using the text prompt, which defines the shape class and characterizes the appearance and fine-grained style of the generated details. Our detailizer is not optimized for a single shape; it is the result of distilling a generative model, so that it can be reused, without retraining, to generate any number of shapes.
- Score: 31.513768848094227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a 3D detailizer, a neural model which can instantaneously (in <1s) transform a coarse 3D shape proxy into a high-quality asset with detailed geometry and texture as guided by an input text prompt. Our model is trained using the text prompt, which defines the shape class and characterizes the appearance and fine-grained style of the generated details. The coarse 3D proxy, which can be easily varied and adjusted (e.g., via user editing), provides structure control over the final shape. Importantly, our detailizer is not optimized for a single shape; it is the result of distilling a generative model, so that it can be reused, without retraining, to generate any number of shapes, with varied structures, whose local details all share a consistent style and appearance. Our detailizer training utilizes a pretrained multi-view image diffusion model, with text conditioning, to distill the foundational knowledge therein into our detailizer via Score Distillation Sampling (SDS). To improve SDS and enable our detailizer architecture to learn generalizable features over complex structures, we train our model in two training stages to generate shapes with increasing structural complexity. Through extensive experiments, we show that our method generates shapes of superior quality and details compared to existing text-to-3D models under varied structure control. Our detailizer can refine a coarse shape in less than a second, making it possible to interactively author and adjust 3D shapes. Furthermore, the user-imposed structure control can lead to creative, and hence out-of-distribution, 3D asset generations that are beyond the current capabilities of leading text-to-3D generative models. We demonstrate an interactive 3D modeling workflow our method enables, and its strong generalizability over styles, structures, and object categories.
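The distillation objective named above, Score Distillation Sampling, has a well-known form: render the current 3D asset, add noise to the rendering, ask the frozen diffusion model to predict that noise, and use the prediction error as a gradient on the rendering. Below is a minimal, self-contained PyTorch sketch of one SDS step; the `ToyDenoiser` and the flat `rendered` tensor are stand-ins (the paper's actual teacher is a pretrained text-conditioned multi-view image diffusion model, and the gradient would flow into the detailizer through a differentiable renderer).

```python
# Minimal sketch of one Score Distillation Sampling (SDS) step.
# ASSUMPTIONS: ToyDenoiser stands in for the pretrained text-conditioned
# multi-view diffusion model; `rendered` stands in for a differentiable
# rendering of the detailizer's current output.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in noise predictor eps(x_t, t); the real teacher is text-conditioned."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t], dim=-1))

def sds_loss(rendered, denoiser, alphas_cumprod):
    """Scalar whose gradient w.r.t. `rendered` equals w(t) * (eps_pred - eps)."""
    b, T = rendered.shape[0], alphas_cumprod.shape[0]
    t = torch.randint(1, T, (b, 1))
    a_t = alphas_cumprod[t]                                # (b, 1) noise schedule term
    eps = torch.randn_like(rendered)
    x_t = a_t.sqrt() * rendered + (1 - a_t).sqrt() * eps   # forward diffusion
    with torch.no_grad():                                  # teacher stays frozen
        eps_pred = denoiser(x_t, t.float() / T)
    w = 1 - a_t                                            # a common weighting choice
    # (eps_pred - eps) is treated as a constant gradient signal on the rendering.
    return (w * (eps_pred - eps) * rendered).sum()

# Usage: gradients flow only into the rendering (and hence the detailizer).
dim = 3 * 32 * 32
rendered = torch.randn(4, dim, requires_grad=True)         # fake differentiable render
alphas = torch.linspace(0.9999, 0.01, 1000)                # toy noise schedule
sds_loss(rendered, ToyDenoiser(dim), alphas).backward()
print(rendered.grad.shape)                                 # torch.Size([4, 3072])
```

Note how the denoiser is queried under `no_grad`: SDS never backpropagates through the teacher, which is what makes distilling a large diffusion model into a fast feed-forward detailizer tractable.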
Related papers
- Structured 3D Latents for Scalable and Versatile 3D Generation [28.672494137267837]
We introduce a novel 3D generation method for versatile and high-quality 3D asset creation. The cornerstone is a unified Structured LATent representation which allows decoding to different output formats. This is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model.
arXiv Detail & Related papers (2024-12-02T13:58:38Z)
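To make "integrating a sparsely-populated 3D grid with dense multiview visual features" concrete, here is a toy sketch of one plausible aggregation step: project each active voxel into every view and average the 2D features it lands on. The orthographic projection, nearest-pixel lookup, and all names are illustrative assumptions, not the paper's actual pipeline.

```python
# Toy sketch: aggregate multi-view 2D features onto active (sparse) voxels.
# ASSUMPTIONS: orthographic cameras, nearest-pixel lookup, and random
# stand-ins for feature maps that a vision foundation model would produce.
import torch

def splat_features(voxels_xyz, feat_maps, view_rots):
    """voxels_xyz: (N, 3) centers in [-1, 1]; feat_maps: (V, C, H, W);
    view_rots: (V, 3, 3) world-to-view rotations. Returns (N, C) features."""
    V, C, H, W = feat_maps.shape
    acc = torch.zeros(voxels_xyz.shape[0], C)
    for v in range(V):
        cam = voxels_xyz @ view_rots[v].T                     # rotate into view v
        px = ((cam[:, 0] + 1) * 0.5 * (W - 1)).long().clamp(0, W - 1)
        py = ((cam[:, 1] + 1) * 0.5 * (H - 1)).long().clamp(0, H - 1)
        acc += feat_maps[v, :, py, px].T                      # (N, C) lookup
    return acc / V                                            # average over views

# Usage with 200 active voxels, 4 views, 64-dim features:
feats = splat_features(torch.rand(200, 3) * 2 - 1,
                       torch.randn(4, 64, 32, 32),
                       torch.stack([torch.eye(3)] * 4))
print(feats.shape)  # torch.Size([200, 64])
```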
- DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow [44.72037991063735]
DetailGen3D is a generative approach specifically designed to enhance generated 3D shapes. Our key insight is to model the coarse-to-fine transformation directly through data-dependent flows in latent space. We introduce a token matching strategy that ensures accurate spatial correspondence during refinement.
arXiv Detail & Related papers (2024-11-25T17:08:17Z)
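The "coarse-to-fine transformation through data-dependent flows in latent space" can be read as learning a velocity field that transports coarse-shape tokens toward detailed ones. The sketch below is a generic flow-matching training step plus an Euler-integration refinement loop; all module names and shapes are assumed for illustration, and the token matching strategy is reduced to identity correspondence (token i in the coarse latent pairs with token i in the fine latent).

```python
# Generic flow-matching sketch: learn a velocity field that moves coarse
# shape tokens toward fine ones, then integrate it with Euler steps.
# ASSUMPTIONS: all shapes and names are illustrative; tokens are pre-matched
# so index i in the coarse latent corresponds to index i in the fine latent.
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, z, t):                      # z: (B, T, D), t: (B, 1, 1)
        return self.net(torch.cat([z, t.expand(*z.shape[:2], 1)], dim=-1))

def train_step(field, opt, z_coarse, z_fine):
    t = torch.rand(z_coarse.shape[0], 1, 1)
    z_t = (1 - t) * z_coarse + t * z_fine         # straight path between latents
    loss = ((field(z_t, t) - (z_fine - z_coarse)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def refine(field, z_coarse, steps=8):             # Euler integration, t: 0 -> 1
    z, dt = z_coarse.clone(), 1.0 / steps
    for i in range(steps):
        z = z + dt * field(z, torch.full((z.shape[0], 1, 1), i * dt))
    return z

field = VelocityField(64)
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
zc, zf = torch.randn(2, 16, 64), torch.randn(2, 16, 64)   # toy coarse/fine latents
train_step(field, opt, zc, zf)
print(refine(field, zc).shape)                    # torch.Size([2, 16, 64])
```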
- DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement [38.719572669042925]
We present a 3D modeling method which enables end-users to refine or detailize 3D shapes using machine learning.
We show that our ability to localize details enables novel interactive creative workflows and applications.
arXiv Detail & Related papers (2024-09-10T00:51:49Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- ShaDDR: Interactive Example-Based Geometry and Texture Generation via 3D Shape Detailization and Differentiable Rendering [24.622120688131616]
ShaDDR is an example-based deep generative neural network which produces a high-resolution textured 3D shape.
Our method learns to detailize the geometry via multi-resolution voxel upsampling and generate textures on voxel surfaces.
The generated shape preserves the overall structure of the input coarse voxel model.
arXiv Detail & Related papers (2023-06-08T02:35:30Z)
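ShaDDR's "multi-resolution voxel upsampling" suggests a stack of 3D up-convolutions that progressively raise grid resolution while staying anchored to the coarse occupancy. Here is a bare-bones sketch of such an upsampler; channel counts, depth, and the structure-preserving mask are assumptions for illustration, and the style conditioning and texture branch are omitted.

```python
# Bare-bones multi-resolution voxel upsampler: coarse occupancy grid in,
# 4x-resolution occupancy grid out, anchored to the coarse structure.
# ASSUMPTIONS: layer sizes are illustrative, not ShaDDR's published network.
import torch
import torch.nn as nn

class VoxelUpsampler(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(                    # 16^3 -> 32^3
            nn.ConvTranspose3d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 32, 3, padding=1), nn.LeakyReLU(0.2))
        self.stage2 = nn.Sequential(                    # 32^3 -> 64^3
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(16, 1, 3, padding=1))

    def forward(self, coarse):                          # coarse: (B, 1, 16, 16, 16)
        logits = self.stage2(self.stage1(coarse))       # (B, 1, 64, 64, 64)
        # Mask by the upsampled coarse occupancy so details stay on the proxy.
        mask = nn.functional.interpolate(coarse, scale_factor=4, mode="nearest")
        return torch.sigmoid(logits) * mask

coarse = (torch.rand(1, 1, 16, 16, 16) > 0.7).float()  # toy coarse voxel model
print(VoxelUpsampler()(coarse).shape)                   # torch.Size([1, 1, 64, 64, 64])
```

Masking the output by the upsampled coarse grid is one simple way to honor the summary's claim that the generated shape preserves the input's overall structure.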
- DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present a new text-guided 3D shape generation approach, DreamStone.
It uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z)
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision [114.56048848216254]
We present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions.
Based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates.
Our constructed captions provide high-level semantic supervision for generated 3D shapes.
arXiv Detail & Related papers (2023-03-23T13:53:16Z)
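The TAPS3D recipe above is easy to prototype: score candidate words against a rendered image in a shared embedding space, keep the top-k, and drop them into a template. In this sketch the embeddings are random stand-ins; in practice both sides would come from CLIP's image and text encoders, and the candidate list from CLIP's vocabulary.

```python
# Sketch of pseudo-caption construction: score candidate words against a
# rendered image in a shared embedding space, keep the top-k, fill a template.
# ASSUMPTIONS: embeddings are random stand-ins for CLIP features, and the
# word list is a toy substitute for CLIP's vocabulary.
import torch
import torch.nn.functional as F

def pseudo_caption(img_emb, word_embs, words, k=3):
    """img_emb: (D,) rendered-view embedding; word_embs: (N, D); words: list of str."""
    sims = F.cosine_similarity(img_emb.unsqueeze(0), word_embs, dim=-1)  # (N,)
    top = sims.topk(k).indices.tolist()
    # One simple template; the paper constructs captions from such templates.
    return "a photo of a " + " ".join(words[i] for i in top)

words = ["wooden", "chair", "red", "car", "metal", "table"]  # toy vocabulary
caption = pseudo_caption(torch.randn(512), torch.randn(len(words), 512), words)
print(caption)  # word order/choice is arbitrary here due to random embeddings
```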
- SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation [89.47132156950194]
We present a novel framework built to simplify 3D asset generation for amateur users.
Our method supports a variety of input modalities that can be easily provided by a human.
Our model can combine all these tasks into one swiss-army-knife tool.
arXiv Detail & Related papers (2022-12-08T18:59:05Z)
- ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation [91.37036638939622]
This paper presents a new framework called Image as Stepping Stone (ISS), which introduces 2D images as a stepping stone to connect the text and shape modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
arXiv Detail & Related papers (2022-09-09T06:54:21Z)
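A generic version of ISS-style feature-space alignment is a small mapper trained to take CLIP image embeddings to the latent codes a pretrained single-view reconstruction (SVR) model produces for the same views. The dimensions, loss, and training pairs below are illustrative assumptions, not the paper's exact setup.

```python
# Generic sketch of feature-space alignment: a small mapper takes CLIP image
# embeddings to the latent codes a frozen, pretrained SVR model assigns to
# the same rendered views. ASSUMPTIONS: 512-d CLIP features, 256-d SVR
# latents, cosine alignment loss, random stand-in data.
import torch
import torch.nn as nn
import torch.nn.functional as F

mapper = nn.Sequential(nn.Linear(512, 512), nn.SiLU(), nn.Linear(512, 256))
opt = torch.optim.Adam(mapper.parameters(), lr=1e-4)

def align_step(clip_feat, svr_latent):
    """Stage 1: align CLIP image features with SVR shape latents.
    Stage 2 (not shown) brings CLIP *text* features into the same shape
    space, exploiting CLIP's text-image alignment, so that at test time a
    prompt alone can drive the frozen shape decoder."""
    loss = 1 - F.cosine_similarity(mapper(clip_feat), svr_latent, dim=-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy batch: 8 fake (CLIP feature, SVR latent) pairs from rendered views.
print(align_step(torch.randn(8, 512), torch.randn(8, 256)))
```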
- DECOR-GAN: 3D Shape Detailization by Conditional Refinement [50.8801457082181]
We introduce a deep generative network for 3D shape detailization, akin to stylization with the style being geometric details.
We demonstrate that our method can refine a coarse shape into a variety of detailed shapes with different styles.
arXiv Detail & Related papers (2020-12-16T18:52:10Z)