DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation
- URL: http://arxiv.org/abs/2303.15181v3
- Date: Sat, 23 Sep 2023 15:20:07 GMT
- Authors: Zhengzhe Liu, Peng Dai, Ruihui Li, Xiaojuan Qi, Chi-Wing Fu
- Abstract summary: We present DreamStone, a new text-guided 3D shape generation approach.
It uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a new text-guided 3D shape generation approach
DreamStone that uses images as a stepping stone to bridge the gap between text
and shape modalities for generating 3D shapes without requiring paired text and
3D data. The core of our approach is a two-stage feature-space alignment
strategy that leverages a pre-trained single-view reconstruction (SVR) model to
map CLIP features to shapes: first, we map the CLIP image feature to the
detail-rich 3D shape space of the SVR model; then, we map the CLIP text feature
to the 3D shape space by encouraging CLIP consistency between the rendered
images and the input text. Moreover, to extend beyond the generative capability
of the SVR model, we design a text-guided 3D shape stylization module that
enhances the output shapes with novel structures and textures. Further, we
exploit pre-trained text-to-image diffusion models to enhance the generative
diversity, fidelity, and stylization capability. Our approach is generic,
flexible, and scalable, and it can be easily integrated with various SVR models
to expand the generative space and improve the generative fidelity. Extensive
experimental results demonstrate that our approach outperforms the
state-of-the-art methods in terms of generative quality and consistency with
the input text. Code and models are released at
https://github.com/liuzhengzhe/DreamStone-ISS.
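To make the two-stage feature-space alignment described in the abstract concrete, here is a minimal PyTorch sketch. All names, dimensions, and the decode_and_render / clip_image_encoder callables are hypothetical stand-ins, not the released implementation; the GitHub repository above holds the actual code.

```python
# Minimal sketch of the two-stage feature-space alignment.
# Hypothetical names and dimensions; the SVR decoder, differentiable
# renderer, and CLIP encoders are stand-ins, not DreamStone's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLIPToShapeMapper(nn.Module):
    """Maps a CLIP feature into the SVR model's 3D shape latent space."""
    def __init__(self, clip_dim: int = 512, shape_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, shape_dim),
        )

    def forward(self, clip_feat: torch.Tensor) -> torch.Tensor:
        return self.net(clip_feat)

mapper = CLIPToShapeMapper()

# Stage 1: align CLIP *image* features with the SVR shape space by
# regressing the mapper output onto the shape latent that the frozen,
# pre-trained SVR encoder produces for the same image.
def stage1_loss(clip_image_feat: torch.Tensor,
                svr_shape_latent: torch.Tensor) -> torch.Tensor:
    return F.mse_loss(mapper(clip_image_feat), svr_shape_latent)

# Stage 2: align CLIP *text* features without paired text-3D data.
# Decode the mapped latent into a shape, render it differentiably, and
# encourage CLIP consistency between the renderings and the input text.
def stage2_loss(clip_text_feat: torch.Tensor,
                decode_and_render,    # stand-in: latent -> rendered views
                clip_image_encoder,   # stand-in: views -> CLIP features
                ) -> torch.Tensor:
    shape_latent = mapper(clip_text_feat)
    view_feats = clip_image_encoder(decode_and_render(shape_latent))
    sim = F.cosine_similarity(view_feats, clip_text_feat, dim=-1)
    return (1.0 - sim).mean()
```

The split is what removes the need for paired text and 3D data: stage 1 learns from images alone (the pre-trained SVR model supplies their shape latents), and stage 2 learns from text alone via CLIP consistency on rendered images.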
Related papers
- EXIM: A Hybrid Explicit-Implicit Representation for Text-Guided 3D Shape Generation (arXiv, 2023-11-03)
This paper presents a new text-guided technique for generating 3D shapes.
We leverage a hybrid 3D representation, namely EXIM, combining the strengths of explicit and implicit representations.
We demonstrate the applicability of our approach to generate indoor scenes with consistent styles using text-induced 3D shapes.
- Guide3D: Create 3D Avatars from Text and Image Guidance (arXiv, 2023-08-18)
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
- Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation (arXiv, 2023-06-29)
We present a novel alignment-before-generation approach to generate 3D shapes based on 2D images or texts.
Our framework comprises two models: a Shape-Image-Text-Aligned Variational Auto-Encoder (SITA-VAE) and a conditional Aligned Shape Latent Diffusion Model (ASLDM).
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision (arXiv, 2023-03-23)
We present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions.
Based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates.
Our constructed captions provide high-level semantic supervision for generated 3D shapes.
- Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models (arXiv, 2022-12-28)
We introduce explicit 3D shape priors into the CLIP-guided 3D optimization process.
We present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model.
Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy.
- SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation (arXiv, 2022-12-08)
We present a novel framework built to simplify 3D asset generation for amateur users.
Our method supports a variety of input modalities that can be easily provided by a human.
Our model combines all these tasks into one Swiss-army-knife tool.
- ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation (arXiv, 2022-09-09)
This paper presents a new framework called Image as Stepping Stone (ISS), which introduces 2D images as a stepping stone to connect the text and shape modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
- Text to Mesh Without 3D Supervision Using Limit Subdivision (arXiv, 2022-03-24)
We present a technique for zero-shot generation of a 3D model using only a target text prompt.
We rely on a pre-trained CLIP model that compares the input text prompt with differentiably rendered images of our 3D model.