TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
- URL: http://arxiv.org/abs/2303.13273v1
- Date: Thu, 23 Mar 2023 13:53:16 GMT
- Title: TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
- Authors: Jiacheng Wei, Hao Wang, Jiashi Feng, Guosheng Lin, Kim-Hui Yap
- Abstract summary: We present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions.
Based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates.
Our constructed captions provide high-level semantic supervision for generated 3D shapes.
- Score: 114.56048848216254
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we investigate the open research task of generating
controllable 3D textured shapes from given textual descriptions. Previous
works either require ground-truth caption labeling or extensive optimization
time. To resolve these issues, we present a novel framework, TAPS3D, to train
a text-guided 3D shape generator with pseudo captions. Specifically, based on
rendered 2D images, we retrieve relevant words from the CLIP vocabulary and
construct pseudo captions using templates. Our constructed captions provide
high-level semantic supervision for generated 3D shapes. Further, to produce
fine-grained textures and increase geometric diversity, we adopt low-level
image regularization so that fake-rendered images align with the real ones.
During inference, our model generates 3D textured shapes from the given text
without any additional optimization. We conduct extensive experiments to
analyze each of our proposed components and show the efficacy of our
framework in generating high-fidelity, text-relevant 3D textured shapes.
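To make the two key ideas concrete, below is a minimal sketch of the template-based pseudo-caption construction, assuming OpenAI's CLIP package (https://github.com/openai/CLIP); the candidate word list, the caption template, and the top-k choice are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: build a pseudo caption for one rendered view.
# Assumes OpenAI's CLIP package (pip install git+https://github.com/openai/CLIP.git).
# The candidate words and the template below are hypothetical stand-ins for
# the paper's filtered CLIP vocabulary and caption templates.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

candidates = ["red", "blue", "rusty", "vintage", "shiny", "sports"]  # hypothetical

def build_pseudo_caption(image_path: str, category: str, top_k: int = 2) -> str:
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    prompts = clip.tokenize(
        [f"a photo of a {w} {category}" for w in candidates]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(prompts)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        sims = (img_feat @ txt_feat.T).squeeze(0)  # image-to-word-prompt similarity
    top_words = [candidates[i] for i in sims.topk(top_k).indices.tolist()]
    return f"a {' '.join(top_words)} {category}"  # template-filled pseudo caption
```

For instance, build_pseudo_caption("render_0.png", "car") might return "a red vintage car", which then serves as the training caption for that rendered view. The abstract does not specify the form of the low-level image regularization, so the following is a hedged sketch assuming an L1 pixel term between a fake-rendered image and the real image its pseudo caption was built from, plus a Gram-matrix texture term over features from any fixed image encoder; the paper's actual losses may differ.

```python
# Sketch: one plausible low-level image regularizer aligning a fake-rendered
# image with its corresponding real image. The L1 pixel term and Gram-matrix
# texture term are assumptions, not the paper's confirmed formulation.
import torch
import torch.nn.functional as F

def gram_matrix(feats: torch.Tensor) -> torch.Tensor:
    # feats: (B, C, H, W) feature maps from any fixed image encoder
    b, c, h, w = feats.shape
    f = feats.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def low_level_loss(fake_img, real_img, fake_feats, real_feats,
                   w_pix: float = 1.0, w_tex: float = 1.0) -> torch.Tensor:
    pixel_term = F.l1_loss(fake_img, real_img)          # coarse color/layout alignment
    texture_term = F.mse_loss(gram_matrix(fake_feats),  # fine-grained texture statistics
                              gram_matrix(real_feats))
    return w_pix * pixel_term + w_tex * texture_term
```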
Related papers
- PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion [18.82883336156591]
We present PI3D, a framework that fully leverages pre-trained text-to-image diffusion models to generate high-quality 3D shapes from text prompts in minutes.
PI3D generates a single 3D shape from text in only 3 minutes, with quality validated to outperform existing 3D generative models by a large margin.
arXiv Detail & Related papers (2023-12-14T16:04:34Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present DreamStone, a new text-guided 3D shape generation approach.
It uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z)
- Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models [21.622420436349245]
We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt.
We leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses.
In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model.
arXiv Detail & Related papers (2023-03-21T16:21:02Z)
- Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models [44.34479731617561]
We introduce explicit 3D shape priors into the CLIP-guided 3D optimization process.
We present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model.
Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy.
arXiv Detail & Related papers (2022-12-28T18:23:47Z)
- 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation [107.46972849241168]
The 3D-TOGO model generates 3D objects in the form of neural radiance fields with good texture.
Experiments on the largest 3D object dataset (i.e., ABO) verify that 3D-TOGO generates higher-quality 3D objects than prior methods.
arXiv Detail & Related papers (2022-12-02T11:31:49Z)
- ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation [91.37036638939622]
This paper presents a new framework called Image as Stepping Stone (ISS) for this task, introducing 2D images as a stepping stone to connect the two modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
arXiv Detail & Related papers (2022-09-09T06:54:21Z)
- Towards Implicit Text-Guided 3D Shape Generation [81.22491096132507]
This work explores the challenging task of generating 3D shapes from text.
We propose a new approach for text-guided 3D shape generation, capable of producing high-fidelity shapes with colors that match the given text description.
arXiv Detail & Related papers (2022-03-28T10:20:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.