ZeroForge: Feedforward Text-to-Shape Without 3D Supervision
- URL: http://arxiv.org/abs/2306.08183v2
- Date: Fri, 16 Jun 2023 00:48:13 GMT
- Title: ZeroForge: Feedforward Text-to-Shape Without 3D Supervision
- Authors: Kelly O. Marshall, Minh Pham, Ameya Joshi, Anushrut Jignasu, Aditya
Balu, Adarsh Krishnamurthy, Chinmay Hegde
- Abstract summary: We present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls.
To achieve open-vocabulary shape generation, we require careful architectural adaptation of existing feed-forward approaches.
- Score: 24.558721379714694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current state-of-the-art methods for text-to-shape generation either require
supervised training using a labeled dataset of pre-defined 3D shapes, or
perform expensive inference-time optimization of implicit neural
representations. In this work, we present ZeroForge, an approach for zero-shot
text-to-shape generation that avoids both pitfalls. To achieve open-vocabulary
shape generation, we require careful architectural adaptation of existing
feed-forward approaches, as well as a combination of data-free CLIP-loss and
contrastive losses to avoid mode collapse. Using these techniques, we are able
to considerably expand the generative ability of existing feed-forward
text-to-shape models such as CLIP-Forge. We support our method via extensive
qualitative and quantitative evaluations.
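The abstract's core recipe (a frozen CLIP model as the only source of supervision, combined with a contrastive term across prompts to avoid mode collapse) can be illustrated with a short training-loss sketch. This is not the authors' code: `shape_generator` (a feed-forward, text-conditioned voxel decoder), `render_views` (a differentiable renderer producing CLIP-ready views), and the temperature `tau` are hypothetical placeholders assumed only for illustration.

```python
# Minimal sketch of a data-free CLIP loss plus an in-batch contrastive loss,
# in the spirit described by the abstract. Not the authors' implementation.
import torch
import torch.nn.functional as F
import clip  # openai/CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # fp32 keeps the gradient path simple
clip_model.eval()                # CLIP stays frozen; only the shape generator trains

def zeroforge_style_loss(prompts, shape_generator, render_views, tau=0.1):
    """CLIP-similarity loss plus an in-batch contrastive term over prompts."""
    tokens = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        text_feat = clip_model.encode_text(tokens).float()
    text_feat = F.normalize(text_feat, dim=-1)

    # Text embeddings condition the generator; rendered views go back into CLIP.
    voxels = shape_generator(text_feat)   # hypothetical: (B, 1, D, H, W) occupancy grid
    images = render_views(voxels)         # hypothetical: (B, 3, 224, 224), differentiable
    img_feat = F.normalize(clip_model.encode_image(images).float(), dim=-1)

    # Data-free CLIP loss: each rendering should match its own prompt.
    clip_loss = (1.0 - (img_feat * text_feat).sum(dim=-1)).mean()

    # Contrastive loss: renderings for different prompts must stay distinguishable,
    # which pushes back against collapsing to a single "average" shape.
    logits = img_feat @ text_feat.t() / tau             # (B, B) similarity matrix
    targets = torch.arange(len(prompts), device=device)
    contrastive = F.cross_entropy(logits, targets)

    return clip_loss + contrastive
```

Only the generator's parameters would be optimized against this loss; the CLIP weights stay fixed, which is what makes the supervision data-free, while the in-batch cross-entropy keeps shapes for different prompts apart.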
Related papers
- Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds [6.69660410213287]
We propose an innovative framework called Point-MGE to explore the benefits of deeply integrating 3D representation learning and generative learning.
In shape classification, Point-MGE achieved an accuracy of 94.2% (+1.0%) on the ModelNet40 dataset and 92.9% (+5.5%) on the ScanObjectNN dataset.
Experimental results also confirmed that Point-MGE can generate high-quality 3D shapes in both unconditional and conditional settings.
arXiv Detail & Related papers (2024-06-25T07:57:03Z) - Make-A-Shape: a Ten-Million-scale 3D Shape Model [52.701745578415796]
This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale.
We first introduce a wavelet-tree representation that compactly encodes shapes via a subband coefficient filtering scheme.
We derive the subband adaptive training strategy to train our model to effectively learn to generate coarse and detail wavelet coefficients.
arXiv Detail & Related papers (2024-01-20T00:21:58Z) - EXIM: A Hybrid Explicit-Implicit Representation for Text-Guided 3D Shape Generation [124.27302003578903]
This paper presents a new text-guided technique for generating 3D shapes.
We leverage a hybrid 3D representation, namely EXIM, combining the strengths of explicit and implicit representations.
We demonstrate the applicability of our approach to generate indoor scenes with consistent styles using text-induced 3D shapes.
arXiv Detail & Related papers (2023-11-03T05:01:51Z) - Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z) - DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present DreamStone, a new text-guided 3D shape generation approach.
It uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z) - Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors [79.80916315953374]
We propose SSP3D, a semi-supervised framework for 3D reconstruction.
We introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction.
Our approach also performs well when transferring to real-world Pix3D datasets under labeling ratios of 10%.
arXiv Detail & Related papers (2022-09-30T11:19:25Z) - ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation [91.37036638939622]
This paper presents a new framework called Image as Stepping Stone (ISS) for the task by introducing a 2D image as a stepping stone to connect the two modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
arXiv Detail & Related papers (2022-09-09T06:54:21Z) - CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation [16.59461081771521]
We present a simple yet effective method for zero-shot text-to-shape generation based on a two-stage training process.
Our method not only demonstrates promising zero-shot generalization, but also avoids expensive inference time optimization.
arXiv Detail & Related papers (2021-10-06T09:55:19Z) - DEF: Deep Estimation of Sharp Geometric Features in 3D Shapes [43.853000396885626]
We propose a learning-based framework for predicting sharp geometric features in sampled 3D shapes.
By fusing the results of individual patches, we can process large 3D models that existing data-driven methods cannot handle.
arXiv Detail & Related papers (2020-11-30T18:21:00Z)