CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes
from Natural Language
- URL: http://arxiv.org/abs/2211.01427v4
- Date: Wed, 24 May 2023 16:04:20 GMT
- Title: CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes
from Natural Language
- Authors: Aditya Sanghi, Rao Fu, Vivian Liu, Karl Willis, Hooman Shayani, Amir
Hosein Khasahmadi, Srinath Sridhar, Daniel Ritchie
- Abstract summary: We introduce CLIP-Sculptor, a method to produce high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training.
For improved shape diversity, we use a discrete latent space which is modeled using a transformer conditioned on CLIP's image-text embedding space.
- Score: 21.727938353786218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have demonstrated that natural language can be used to generate
and edit 3D shapes. However, these methods generate shapes with limited
fidelity and diversity. We introduce CLIP-Sculptor, a method to address these
constraints by producing high-fidelity and diverse 3D shapes without the need
for (text, shape) pairs during training. CLIP-Sculptor achieves this via a
multi-resolution approach that first generates in a low-dimensional latent
space and then upscales to a higher resolution for improved shape fidelity. For
improved shape diversity, we use a discrete latent space which is modeled using
a transformer conditioned on CLIP's image-text embedding space. We also present
a novel variant of classifier-free guidance, which improves the
accuracy-diversity trade-off. Finally, we perform extensive experiments
demonstrating that CLIP-Sculptor outperforms state-of-the-art baselines. The
code is available at https://ivl.cs.brown.edu/#/projects/clip-sculptor.
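The paper's specific variant of classifier-free guidance is not detailed in this abstract; as a point of reference, a minimal sketch of standard classifier-free guidance applied to logits over discrete latent codes (the toy logit values and function names below are illustrative, not from the paper):

```python
import numpy as np

def classifier_free_guidance(cond_logits, uncond_logits, scale):
    # Blend conditional and unconditional predictions; scale > 1 pushes
    # samples toward the condition, scale = 1 is purely conditional.
    return uncond_logits + scale * (cond_logits - uncond_logits)

def sample_token(logits, rng):
    # Sample one discrete latent code from the guided distribution.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
cond = np.array([2.0, 0.5, -1.0])    # toy logits over 3 codebook entries
uncond = np.array([0.5, 0.5, 0.5])   # logits with the condition dropped
guided = classifier_free_guidance(cond, uncond, scale=3.0)
token = sample_token(guided, rng)
```

Raising the guidance scale trades diversity for fidelity to the text prompt; CLIP-Sculptor's contribution is a variant that improves this accuracy-diversity trade-off.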
Related papers
- ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion [19.30740914413954] (2025-02-04)
  This paper proposes ShapeShifter, a new 3D generative model that learns to synthesize shape variations based on a single reference model.
  We show that our resulting variations better capture the fine details of their original input and can handle more general types of surfaces than previous SDF-based methods.
- NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation [52.772319840580074] (2024-03-27)
  3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints.
  Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation.
  We introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling.
- Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517] (2024-02-19)
  Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
  We extend auto-regressive models to 3D domains, seeking stronger 3D shape generation by improving auto-regressive models in both capacity and scalability.
- Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion? [5.0243930429558885] (2024-02-02)
  This study introduces CISP (Contrastive Image-Shape Pre-training), designed to enhance 3D shape synthesis guided by 2D images.
  CISP aims to enrich the CLIP framework by aligning 2D images with 3D shapes in a shared embedding space.
  We find that, while matching CLIP in generation quality and diversity, CISP substantially improves coherence with input images.
- EXIM: A Hybrid Explicit-Implicit Representation for Text-Guided 3D Shape Generation [124.27302003578903] (2023-11-03)
  This paper presents a new text-guided technique for generating 3D shapes.
  We leverage a hybrid 3D representation, namely EXIM, combining the strengths of explicit and implicit representations.
  We demonstrate the applicability of our approach to generating indoor scenes with consistent styles using text-induced 3D shapes.
- CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration [53.623649386871016] (2023-06-14)
  This paper presents a new framework that leverages a vision-language model to guide the exploration of the 3D shape space.
  We propose to leverage CLIP, a powerful pre-trained vision-language model, to aid shape-space exploration.
  We design three exploration modes, binary-attribute-guided, text-guided, and sketch-guided, to locate suitable exploration trajectories in shape space and induce meaningful changes to the shape.
- ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency [39.7058456335011] (2023-04-13)
  We present ShapeClipper, a novel method that reconstructs 3D object shapes from real-world single-view RGB images.
  ShapeClipper learns shape reconstruction from a set of single-view segmented images.
  We evaluate our method on three challenging real-world datasets.
- Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052] (2023-03-26)
  Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
  We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
- DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619] (2023-03-24)
  We present DreamStone, a new text-guided 3D shape generation approach.
  It uses images as a stepping stone to bridge the gap between the text and shape modalities, generating 3D shapes without requiring paired text and 3D data.
  Our approach is generic, flexible, and scalable, and can easily be integrated with various SVR models to expand the generative space and improve generative fidelity.