HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation
- URL: http://arxiv.org/abs/2403.00372v3
- Date: Tue, 30 Apr 2024 05:32:01 GMT
- Title: HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation
- Authors: Zhiying Leng, Tolga Birdal, Xiaohui Liang, Federico Tombari
- Abstract summary: We propose HyperSDFusion, a dual-branch diffusion model that generates 3D shapes from a given text.
We learn the hierarchical representations of text and 3D shapes in hyperbolic space.
Our method is the first to explore the hyperbolic hierarchical representation for text-to-shape generation.
- Score: 55.95329424826433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D shape generation from text is a fundamental task in 3D representation learning. Text-shape pairs exhibit a hierarchical structure: a general text like "chair" covers all 3D shapes of chairs, while more detailed prompts refer to more specific shapes. Furthermore, both text and 3D shapes are inherently hierarchical. However, existing Text2Shape methods, such as SDFusion, do not exploit this structure. In this work, we propose HyperSDFusion, a dual-branch diffusion model that generates 3D shapes from a given text. Since hyperbolic space is suitable for handling hierarchical data, we propose to learn hierarchical representations of text and 3D shapes in hyperbolic space. First, we introduce a hyperbolic text-image encoder to learn the sequential and multi-modal hierarchical features of text in hyperbolic space. In addition, we design a hyperbolic text-graph convolution module to learn the hierarchical features of text in hyperbolic space. To fully utilize these text features, we introduce a dual-branch structure to embed text features in the 3D feature space. Finally, to endow the generated 3D shapes with a hierarchical structure, we devise a hyperbolic hierarchical loss. Our method is the first to explore hyperbolic hierarchical representations for text-to-shape generation. Experiments on the existing text-to-shape paired dataset, Text2Shape, achieve state-of-the-art results. We release our implementation at HyperSDFusion.github.io.
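For intuition, here is a minimal sketch of the Poincaré-ball operations such hyperbolic representations rely on: an exponential map at the origin to lift Euclidean features into the ball, Möbius addition, and the geodesic distance. The curvature value and function names are assumptions for illustration, not the paper's actual modules.

```python
# A minimal sketch of hyperbolic (Poincare ball) embeddings, assuming the
# standard exponential map at the origin and geodesic distance with
# curvature c; HyperSDFusion's own modules are not reproduced here.
import torch

C = 1.0  # ball curvature (assumed; the paper's value may differ)

def exp_map_origin(v: torch.Tensor, c: float = C) -> torch.Tensor:
    """Map Euclidean feature vectors v into the Poincare ball."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-6)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def mobius_add(x, y, c: float = C):
    """Mobius addition, the ball's analogue of vector addition."""
    xy = (x * y).sum(dim=-1, keepdim=True)
    x2 = (x * x).sum(dim=-1, keepdim=True)
    y2 = (y * y).sum(dim=-1, keepdim=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den.clamp_min(1e-6)

def poincare_dist(x, y, c: float = C):
    """Geodesic distance; it grows rapidly near the boundary, so
    fine-grained (deep-in-hierarchy) embeddings can spread out there."""
    sqrt_c = c ** 0.5
    diff = mobius_add(-x, y, c).norm(dim=-1).clamp(max=1 - 1e-5)
    return (2.0 / sqrt_c) * torch.atanh(sqrt_c * diff)
```

A hierarchical loss in this spirit could, for example, pull coarse-text embeddings toward the origin and push fine-grained ones toward the boundary; the paper's exact formulation is given in the paper itself.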
Related papers
- Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation [2.3213238782019316]
GIMDiffusion is a novel Text-to-3D model that uses geometry images, a 2D image-based surface representation, to efficiently represent 3D shapes.
We exploit the rich 2D priors of existing Text-to-Image models such as Stable Diffusion.
In short, GIMDiffusion enables the generation of 3D assets at speeds comparable to current Text-to-Image models.
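As a rough illustration of why geometry images make 3D generation cheap, the sketch below reconstructs a triangle mesh from an H x W x 3 grid of XYZ samples by triangulating grid neighbours; GIMDiffusion's own decoding pipeline may differ in detail.

```python
# A minimal sketch: turn a geometry image (an H x W x 3 grid of XYZ
# coordinates) back into a triangle mesh by connecting grid neighbours.
import numpy as np

def geometry_image_to_mesh(gim: np.ndarray):
    """gim: (H, W, 3) array of surface points sampled on a regular grid."""
    h, w, _ = gim.shape
    vertices = gim.reshape(-1, 3)
    faces = []
    for i in range(h - 1):
        for j in range(w - 1):
            v00 = i * w + j     # indices of one grid cell's corners
            v01 = v00 + 1
            v10 = v00 + w
            v11 = v10 + 1
            faces.append((v00, v10, v01))  # two triangles per cell
            faces.append((v01, v10, v11))
    return vertices, np.asarray(faces, dtype=np.int64)
```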
arXiv Detail & Related papers (2024-09-05T17:21:54Z)
- DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present a new text-guided 3D shape generation approach DreamStone.
It uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z)
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision [114.56048848216254]
We present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions.
Based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates.
Our constructed captions provide high-level semantic supervision for generated 3D shapes.
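A minimal sketch of this retrieval-plus-template idea follows, assuming CLIP image and word embeddings are already computed and L2-normalized; the template and vocabulary handling are illustrative, not TAPS3D's exact recipe.

```python
# A minimal sketch of pseudo-captioning via CLIP word retrieval, assuming
# precomputed, L2-normalized embeddings; the template is hypothetical.
import torch

def build_pseudo_caption(image_emb: torch.Tensor,
                         word_embs: torch.Tensor,
                         vocab: list[str],
                         top_k: int = 3) -> str:
    """image_emb: (D,), word_embs: (V, D); both L2-normalized."""
    sims = word_embs @ image_emb           # cosine similarity per word
    top = sims.topk(top_k).indices.tolist()
    words = [vocab[i] for i in top]
    # A simple template (hypothetical); TAPS3D composes its own templates.
    return "a 3D model of a " + " ".join(words)
```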
arXiv Detail & Related papers (2023-03-23T13:53:16Z)
- Diffusion-SDF: Text-to-Shape via Voxelized Diffusion [90.85011923436593]
We propose a new generative 3D modeling framework called Diffusion-SDF for the challenging task of text-to-shape synthesis.
We show that Diffusion-SDF generates both higher quality and more diversified 3D shapes that conform well to given text descriptions.
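For reference, the core of a voxelized diffusion setup is standard DDPM forward noising applied to an SDF sampled on a voxel grid; the schedule and tensor shapes below are assumptions, not Diffusion-SDF's exact configuration.

```python
# A minimal sketch: DDPM forward noising of an SDF voxel grid. A network
# trained on (noisy, t) pairs then learns to predict the added noise.
import torch

def forward_noise(sdf_voxels: torch.Tensor, t: int, num_steps: int = 1000):
    """sdf_voxels: (B, 1, R, R, R) grid of signed distances."""
    betas = torch.linspace(1e-4, 0.02, num_steps)     # linear schedule (assumed)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]  # cumulative signal level
    noise = torch.randn_like(sdf_voxels)
    noisy = alpha_bar.sqrt() * sdf_voxels + (1 - alpha_bar).sqrt() * noise
    return noisy, noise  # the model learns to predict `noise` from `noisy`
```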
arXiv Detail & Related papers (2022-12-06T19:46:47Z)
- ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation [91.37036638939622]
This paper presents a new framework called Image as Stepping Stone (ISS) for the task, introducing a 2D image as a stepping stone to connect the two modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
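A minimal sketch of the feature-space-alignment idea: a small mapper network that takes a CLIP feature into the latent space of a frozen single-view-reconstruction (SVR) decoder. The dimensions and architecture here are assumptions for illustration, not ISS's exact design.

```python
# A minimal sketch of a CLIP-to-shape feature mapper; dimensions are
# hypothetical, and the SVR decoder it would feed is assumed to be frozen.
import torch
import torch.nn as nn

class ClipToShapeMapper(nn.Module):
    def __init__(self, clip_dim: int = 512, shape_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, 512), nn.ReLU(),
            nn.Linear(512, shape_dim),
        )

    def forward(self, clip_feat: torch.Tensor) -> torch.Tensor:
        return self.net(clip_feat)  # output goes to the frozen SVR decoder
```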
arXiv Detail & Related papers (2022-09-09T06:54:21Z)
- ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model [16.431391515731367]
Existing methods to generate text-conditioned 3D shapes consume an entire text prompt to generate a 3D shape in a single step.
We introduce a method to generate a 3D shape distribution conditioned on an initial phrase, which gradually evolves as more phrases are added.
Results show that our method can generate shapes consistent with text descriptions, and shapes evolve gradually as more phrases are added.
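A minimal sketch of the recursive conditioning loop, with `encode_phrase` and `update_shape` as hypothetical stand-ins for ShapeCrafter's text encoder and shape-state update:

```python
# A minimal sketch of recursive text conditioning: keep a latent shape
# state and refine it one phrase at a time instead of restarting.
import torch

def generate_recursively(phrases, encode_phrase, update_shape,
                         latent_dim: int = 512):
    state = torch.zeros(latent_dim)        # start from an empty shape state
    for phrase in phrases:                 # e.g. ["a chair", "with armrests"]
        text_feat = encode_phrase(phrase)  # per-phrase text features
        state = update_shape(state, text_feat)  # refine, don't restart
    return state                           # decode to a shape downstream
```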
arXiv Detail & Related papers (2022-07-19T17:59:01Z)
- Towards Implicit Text-Guided 3D Shape Generation [81.22491096132507]
This work explores the challenging task of generating 3D shapes from text.
We propose a new approach for text-guided 3D shape generation, capable of producing high-fidelity shapes with colors that match the given text description.
arXiv Detail & Related papers (2022-03-28T10:20:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.