ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
- URL: http://arxiv.org/abs/2311.17618v3
- Date: Fri, 1 Dec 2023 12:46:13 GMT
- Title: ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
- Authors: Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, Jiayuan Fan,
Gang Yu, Taihao Li, Tao Chen
- Abstract summary: We present ShapeGPT, a shape-included multi-modal framework that leverages strong pre-trained language models to address multiple shape-relevant tasks.
Specifically, ShapeGPT employs a word-sentence-paragraph framework that discretizes continuous shapes into shape words, assembles these words into shape sentences, and integrates shapes with instructional text to form multi-modal paragraphs.
Experiments demonstrate that ShapeGPT achieves comparable performance across shape-relevant tasks, including text-to-shape, shape-to-text, shape completion, and shape editing.
- Score: 27.122194733305594
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The advent of large language models, enabling flexibility through
instruction-driven approaches, has revolutionized many traditional generative
tasks, but large models for 3D data, particularly in comprehensively handling
3D shapes with other modalities, are still under-explored. By achieving
instruction-based shape generations, versatile multimodal generative shape
models can significantly benefit various fields like 3D virtual construction
and network-aided design. In this work, we present ShapeGPT, a shape-included
multi-modal framework that leverages strong pre-trained language models to
address multiple shape-relevant tasks. Specifically, ShapeGPT employs a
word-sentence-paragraph framework that discretizes continuous shapes into
shape words, assembles these words into shape sentences, and integrates shapes
with instructional text to form multi-modal paragraphs. To learn this
shape-language model, we use a three-stage training scheme, including shape
representation, multimodal alignment, and instruction-based generation, to
align shape-language codebooks and learn the intricate correlations among these
modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable
performance across shape-relevant tasks, including text-to-shape,
shape-to-text, shape completion, and shape editing.
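As a rough illustration of the word-sentence-paragraph idea described in the abstract, the sketch below quantizes continuous shape features against a codebook into discrete "shape words", joins them into a "shape sentence" of special tokens, and wraps that sentence with instruction text into a multi-modal paragraph. This is a minimal conceptual sketch, not the authors' implementation: the codebook is random, and the token names (`<shape_k>`, `<soS>`, `<eoS>`) and the helper functions are hypothetical placeholders.

```python
# Conceptual sketch only (assumed interfaces, not the ShapeGPT codebase).
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))        # 512 hypothetical shape-word embeddings
shape_features = rng.normal(size=(96, 64))   # e.g. 96 per-patch features from a 3D encoder

def shape_to_words(features: np.ndarray, codebook: np.ndarray) -> list[int]:
    """Quantize each continuous feature to its nearest codebook entry (a 'shape word')."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1).tolist()

def words_to_sentence(word_ids: list[int]) -> str:
    """Assemble discrete shape words into a textual 'shape sentence' of special tokens."""
    return "<soS> " + " ".join(f"<shape_{i}>" for i in word_ids) + " <eoS>"

def build_paragraph(instruction: str, shape_sentence: str) -> str:
    """Interleave instruction text with the shape sentence into a multi-modal paragraph."""
    return f"Instruction: {instruction}\nShape: {shape_sentence}"

word_ids = shape_to_words(shape_features, codebook)
sentence = words_to_sentence(word_ids)
print(build_paragraph("Complete the missing back of this chair.", sentence)[:120])
```

In this framing, instruction-based tasks such as text-to-shape, shape completion, and shape editing all reduce to sequence-to-sequence generation over such paragraphs, which is what allows a single pre-trained language model to handle them.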
Related papers
- EXIM: A Hybrid Explicit-Implicit Representation for Text-Guided 3D Shape
Generation [124.27302003578903]
This paper presents a new text-guided technique for generating 3D shapes.
We leverage a hybrid 3D representation, namely EXIM, combining the strengths of explicit and implicit representations.
We demonstrate the applicability of our approach to generate indoor scenes with consistent styles using text-induced 3D shapes.
arXiv Detail & Related papers (2023-11-03T05:01:51Z) - DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present DreamStone, a new text-guided 3D shape generation approach.
It uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z) - 3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion
Process [32.3773514247982]
We develop a generalized 3D shape generation prior model tailored for multiple 3D tasks.
These designs jointly equip our proposed 3D shape prior model with high-fidelity, diverse features and the capability of cross-modality alignment.
arXiv Detail & Related papers (2023-03-18T12:50:29Z) - SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation [89.47132156950194]
We present a novel framework built to simplify 3D asset generation for amateur users.
Our method supports a variety of input modalities that can be easily provided by a human.
Our model can combine all these tasks into one swiss-army-knife tool.
arXiv Detail & Related papers (2022-12-08T18:59:05Z) - ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation [91.37036638939622]
This paper presents a new framework called Image as Stepping Stone (ISS) for text-guided 3D shape generation by introducing a 2D image as a stepping stone to connect the two modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
arXiv Detail & Related papers (2022-09-09T06:54:21Z) - Towards Implicit Text-Guided 3D Shape Generation [81.22491096132507]
This work explores the challenging task of generating 3D shapes from text.
We propose a new approach for text-guided 3D shape generation, capable of producing high-fidelity shapes with colors that match the given text description.
arXiv Detail & Related papers (2022-03-28T10:20:03Z) - ShapeAssembly: Learning to Generate Programs for 3D Shape Structure
Synthesis [38.27280837835169]
We propose ShapeAssembly, a domain-specific "assembly-language" for 3D shape structures.
We show how to extract ShapeAssembly programs from existing shape structures in the PartNet dataset.
We evaluate our approach by comparing shapes output by our generated programs to those from other recent shape structure models.
arXiv Detail & Related papers (2020-09-17T02:26:45Z) - Learning Generative Models of Shape Handles [43.41382075567803]
We present a generative model to synthesize 3D shapes as sets of handles.
Our model can generate handle sets with varying cardinality and different types of handles.
We show that the resulting shape representations are intuitive and achieve superior quality compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2020-04-06T22:35:55Z) - Self-Supervised 2D Image to 3D Shape Translation with Disentangled
Representations [92.89846887298852]
We present a framework to translate between 2D image views and 3D object shapes.
We propose SIST, a Self-supervised Image to Shape Translation framework.
arXiv Detail & Related papers (2020-03-22T22:44:02Z)