ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation
- URL: http://arxiv.org/abs/2209.04145v1
- Date: Fri, 9 Sep 2022 06:54:21 GMT
- Title: ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation
- Authors: Zhengzhe Liu, Peng Dai, Ruihui Li, Xiaojuan Qi, Chi-Wing Fu
- Abstract summary: This paper presents a new framework called Image as Stepping Stone (ISS) for text-guided 3D shape generation, introducing a 2D image as a stepping stone to connect the text and shape modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
- Score: 91.37036638939622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-guided 3D shape generation remains challenging due to the absence of
large paired text-shape data, the substantial semantic gap between the two
modalities, and the structural complexity of 3D shapes. This paper presents a
new framework called Image as Stepping Stone (ISS) for this task, introducing
a 2D image as a stepping stone to connect the two modalities and to eliminate
the need for paired text-shape data. Our key contribution is a two-stage
feature-space-alignment approach that maps CLIP features to shapes by
harnessing a pre-trained single-view reconstruction (SVR) model with multi-view
supervision: we first map the CLIP image feature to the detail-rich shape space
of the SVR model, then map the CLIP text feature to the shape space and
optimize the mapping by encouraging CLIP consistency between the input text and
the rendered images. Further, we formulate a text-guided shape stylization
module to dress up the output shapes with novel textures. Beyond existing works
on 3D shape generation from text, our approach generalizes to creating shapes
across a broad range of categories, without requiring paired text-shape data.
Experimental results demonstrate that our approach outperforms state-of-the-art
methods and our baselines in terms of fidelity and consistency with the text.
Further, our approach can stylize the generated shapes with both realistic and
fantasy structures and textures.
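To make the abstract's two-stage feature-space alignment concrete, below is a minimal PyTorch sketch of the two training objectives. Everything here is an illustrative assumption rather than the authors' released code: the Mapper architecture, the latent dimensions, and the svr_decoder, renderer, and clip_image_encoder callables are hypothetical placeholders standing in for the frozen pre-trained SVR model, a differentiable multi-view renderer, and CLIP's image encoder.

```python
# Minimal sketch of ISS's two-stage feature-space alignment. All modules and
# dimensions below are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

CLIP_DIM, SHAPE_DIM = 512, 256  # hypothetical feature sizes


class Mapper(nn.Module):
    """Small MLP mapping a CLIP feature into the SVR shape-latent space."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CLIP_DIM, 512),
            nn.ReLU(),
            nn.Linear(512, SHAPE_DIM),
        )

    def forward(self, feat):
        return self.net(feat)


def stage1_loss(img_mapper, clip_img_feat, svr_shape_latent):
    # Stage 1: regress the CLIP image feature onto the detail-rich shape
    # latent produced by the frozen, pre-trained SVR encoder.
    return F.mse_loss(img_mapper(clip_img_feat), svr_shape_latent)


def stage2_loss(text_mapper, clip_text_feat, svr_decoder, renderer,
                clip_image_encoder):
    # Stage 2: map the CLIP text feature to the shape space, decode and
    # render the shape from several views, and encourage CLIP consistency
    # between the rendered views and the input text.
    latent = text_mapper(clip_text_feat)            # (1, SHAPE_DIM)
    shape = svr_decoder(latent)                     # placeholder SVR decoder
    views = renderer(shape)                         # (V, 3, H, W) renders
    img_feats = clip_image_encoder(views)           # (V, CLIP_DIM)
    sim = F.cosine_similarity(img_feats,
                              clip_text_feat.expand_as(img_feats), dim=-1)
    return (1.0 - sim).mean()                       # maximize agreement
```

Read this way, stage 1 gives the mapper a detail-rich target space to land in, so stage 2 only has to close the smaller CLIP text-image gap instead of learning shape structure from scratch.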
Related papers
- EXIM: A Hybrid Explicit-Implicit Representation for Text-Guided 3D Shape Generation [124.27302003578903]
This paper presents a new text-guided technique for generating 3D shapes.
We leverage a hybrid 3D representation, namely EXIM, combining the strengths of explicit and implicit representations.
We demonstrate the applicability of our approach to generating indoor scenes with consistent styles using text-induced 3D shapes.
arXiv Detail & Related papers (2023-11-03T05:01:51Z)
- Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation [47.945556996219295]
We present a novel alignment-before-generation approach to generate 3D shapes based on 2D images or texts.
Our framework comprises two models: a Shape-Image-Text-Aligned Variational Auto-Encoder (SITA-VAE) and a conditional Aligned Shape Latent Diffusion Model (ASLDM).
arXiv Detail & Related papers (2023-06-29T17:17:57Z)
- ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency [39.7058456335011]
We present ShapeClipper, a novel method that reconstructs 3D object shapes from real-world single-view RGB images.
ShapeClipper learns shape reconstruction from a set of single-view segmented images.
We evaluate our method on three challenging real-world datasets.
arXiv Detail & Related papers (2023-04-13T03:53:12Z)
- DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present DreamStone, a new text-guided 3D shape generation approach.
It uses images as a stepping stone to bridge the gap between the text and shape modalities, generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z)
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision [114.56048848216254]
We present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions.
Based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates; a minimal sketch of this pseudo-captioning idea appears after this list.
Our constructed captions provide high-level semantic supervision for the generated 3D shapes.
arXiv Detail & Related papers (2023-03-23T13:53:16Z)
- Towards Implicit Text-Guided 3D Shape Generation [81.22491096132507]
This work explores the challenging task of generating 3D shapes from text.
We propose a new approach for text-guided 3D shape generation, capable of producing high-fidelity shapes with colors that match the given text description.
arXiv Detail & Related papers (2022-03-28T10:20:03Z)
- Self-Supervised 2D Image to 3D Shape Translation with Disentangled Representations [92.89846887298852]
We present a framework to translate between 2D image views and 3D object shapes.
We propose SIST, a Self-supervised Image to Shape Translation framework.
arXiv Detail & Related papers (2020-03-22T22:44:02Z)
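As flagged in the TAPS3D entry above, here is a minimal sketch of its pseudo-captioning idea: CLIP-embed a rendered view, score a small candidate vocabulary against it, and fill a fixed template with the best-matching words. The vocab list, template, and category argument are hypothetical placeholders; TAPS3D retrieves words from CLIP's own vocabulary, and its actual retrieval and template details are in the paper.

```python
# Illustrative sketch of TAPS3D-style pseudo-captioning, not the authors' code.
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical candidate words; TAPS3D draws them from CLIP's vocabulary.
vocab = ["wooden", "metal", "round", "tall", "red", "cushioned"]


@torch.no_grad()
def pseudo_caption(rendered_image, category="chair", k=2):
    """Build a template caption from the k words best matching the render."""
    img = preprocess(rendered_image).unsqueeze(0).to(device)
    img_feat = model.encode_image(img)
    txt_feat = model.encode_text(clip.tokenize(vocab).to(device))
    # Cosine similarity between the rendered view and each candidate word.
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    scores = (img_feat @ txt_feat.T).squeeze(0)
    top = [vocab[i] for i in scores.topk(k).indices.tolist()]
    return f"a {' '.join(top)} {category}"  # e.g. "a wooden round chair"
```

A caption such as "a wooden round chair" produced this way can then serve as the high-level semantic supervision the TAPS3D summary describes.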