Pushing the Limits of 3D Shape Generation at Scale
- URL: http://arxiv.org/abs/2306.11510v2
- Date: Sat, 19 Aug 2023 12:01:19 GMT
- Title: Pushing the Limits of 3D Shape Generation at Scale
- Authors: Yu Wang, Xuelin Qian, Jingyang Huo, Tiejun Huang, Bo Zhao, Yanwei Fu
- Abstract summary: We present a significant breakthrough in 3D shape generation by scaling it to unprecedented dimensions.
We have developed a model with an astounding 3.6 billion trainable parameters, establishing it as the largest 3D shape generation model to date, named Argus-3D.
- Score: 65.24420181727615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a significant breakthrough in 3D shape generation by scaling it to
unprecedented dimensions. Through the adaptation of the Auto-Regressive model
and the utilization of large language models, we have developed a remarkable
model with an astounding 3.6 billion trainable parameters, establishing it as
the largest 3D shape generation model to date, named Argus-3D. Our approach
addresses the limitations of existing methods by enhancing the quality and
diversity of generated 3D shapes. To tackle the challenges of high-resolution
3D shape generation, our model incorporates tri-plane features as latent
representations, effectively reducing computational complexity. Additionally,
we introduce a discrete codebook for efficient quantization of these
representations. Leveraging the power of transformers, we enable multi-modal
conditional generation, facilitating the production of diverse and visually
impressive 3D shapes. To train our expansive model, we leverage an ensemble of
publicly-available 3D datasets, consisting of a comprehensive collection of
approximately 900,000 objects from renowned repositories such as ModelNet40,
ShapeNet, Pix3D, 3D-Future, and Objaverse. This diverse dataset empowers our
model to learn from a wide range of object variations, bolstering its ability
to generate high-quality and diverse 3D shapes. Extensive experimentation
demonstrates the remarkable efficacy of our approach in significantly improving
the visual quality of generated 3D shapes. By pushing the boundaries of 3D
generation, introducing novel methods for latent representation learning, and
harnessing the power of transformers for multi-modal conditional generation,
our contributions pave the way for substantial advancements in the field. Our
work unlocks new possibilities for applications in gaming, virtual reality,
product design, and other domains that demand high-quality and diverse 3D
objects.
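
To make the pipeline described in the abstract concrete, the sketch below shows, in PyTorch, how flattened tri-plane features might be quantized against a learned discrete codebook and how the resulting code indices could be modeled by a small decoder-only transformer with a conditioning prefix token. This is a minimal illustration under assumptions of our own (codebook size, feature dimensions, the `CodebookQuantizer`/`ARPrior` names, and the single-vector conditioning interface); it is not the released Argus-3D implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CodebookQuantizer(nn.Module):
    """VQ-VAE-style nearest-neighbour quantization of continuous latent features."""

    def __init__(self, num_codes: int = 8192, code_dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        nn.init.uniform_(self.codebook.weight, -1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z: torch.Tensor):
        # z: (B, N, D) tri-plane features flattened into N tokens of dimension D.
        dists = torch.cdist(z, self.codebook.weight.unsqueeze(0))  # (B, N, num_codes)
        idx = dists.argmin(dim=-1)                                 # discrete code indices
        z_q = self.codebook(idx)                                   # quantized features
        # Straight-through estimator so gradients still reach the encoder.
        z_q = z + (z_q - z).detach()
        return z_q, idx


class ARPrior(nn.Module):
    """Decoder-only transformer over code indices with a single conditioning
    prefix token (e.g. an image, text, or class embedding)."""

    def __init__(self, num_codes: int = 8192, dim: int = 512,
                 depth: int = 8, heads: int = 8, max_len: int = 3 * 32 * 32):
        super().__init__()
        self.tok = nn.Embedding(num_codes, dim)
        self.pos = nn.Parameter(torch.zeros(1, max_len + 1, dim))  # learned positions
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, idx: torch.Tensor, cond: torch.Tensor):
        # idx: (B, N) code indices, cond: (B, dim) conditioning embedding.
        x = torch.cat([cond[:, None, :], self.tok(idx)], dim=1)
        x = x + self.pos[:, : x.size(1)]
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.blocks(x, mask=causal)
        return self.head(h[:, :-1])                 # next-token logits, (B, N, num_codes)


# Toy usage: quantize random "tri-plane" features and compute the AR training loss.
feats = torch.randn(2, 3 * 16 * 16, 256)
quantizer, prior = CodebookQuantizer(), ARPrior(max_len=3 * 16 * 16)
z_q, idx = quantizer(feats)
cond = torch.randn(2, 512)                          # stand-in conditioning embedding
logits = prior(idx, cond)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), idx.reshape(-1))
```

In a pipeline of this kind, the transformer is supervised with a next-token cross-entropy loss over the code indices produced by the quantizer; at inference time, codes are sampled token by token and decoded back into tri-plane features and, ultimately, a 3D shape.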
Related papers
- 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion [86.25111098482537]
We introduce 3DTopia-XL, a scalable native 3D generative model designed to overcome limitations of existing methods.
3DTopia-XL leverages a novel primitive-based 3D representation, PrimX, which encodes detailed shape, albedo, and material field into a compact tensorial format.
On top of the novel representation, we propose a generative framework based on Diffusion Transformer (DiT), which comprises 1) Primitive Patch Compression and 2) Latent Primitive Diffusion.
We conduct extensive qualitative and quantitative experiments to demonstrate that 3DTopia-XL significantly outperforms existing methods in generating high-quality 3D assets.
arXiv Detail & Related papers (2024-09-19T17:59:06Z)
- CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets [43.315487682462845]
CLAY is a 3D geometry and material generator designed to transform human imagination into intricate 3D digital structures.
At its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT).
We demonstrate using CLAY for a range of controllable 3D asset creations, from sketchy conceptual designs to production-ready assets with intricate details.
arXiv Detail & Related papers (2024-05-30T05:57:36Z)
- DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation [53.20147419879056]
We introduce a diffusion-based feed-forward framework that addresses the challenges of large-vocabulary 3D generation with a single model.
Building upon our 3D-aware Diffusion model with TransFormer, we propose a stronger version for 3D generation, i.e., DiffTF++.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate the effectiveness of our proposed modules.
arXiv Detail & Related papers (2024-05-13T17:59:51Z)
- LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces a novel framework called LN3Diff that addresses the need for a unified 3D diffusion pipeline.
Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space.
It achieves state-of-the-art 3D generation performance on ShapeNet and superior results in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z)
- Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to the 3D domain and strengthen 3D shape generation by improving their capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z)
- Large-Vocabulary 3D Diffusion Model with Transformer [57.076986347047]
We introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects with a single generative model.
We propose a novel triplane-based 3D-aware Diffusion model with TransFormer, DiffTF, which handles the challenges of large-vocabulary 3D generation via three aspects.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate that a single DiffTF model achieves state-of-the-art large-vocabulary 3D object generation performance.
arXiv Detail & Related papers (2023-09-14T17:59:53Z)
- 3D Neural Field Generation using Triplane Diffusion [37.46688195622667]
We present an efficient diffusion-based model for 3D-aware generation of neural fields.
Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields.
We demonstrate state-of-the-art 3D generation results on several object classes from ShapeNet (see the tri-plane query sketch after this list).
arXiv Detail & Related papers (2022-11-30T01:55:52Z)
- Learning to Generate 3D Shapes from a Single Example [28.707149807472685]
We present a multi-scale GAN-based model designed to capture the input shape's geometric features across a range of spatial scales.
We train our generative model on a voxel pyramid of the reference shape, without the need for any external supervision or manual annotation.
The resulting shapes exhibit variations across scales while retaining the global structure of the reference shape.
arXiv Detail & Related papers (2022-08-05T01:05:32Z)
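
Argus-3D and several of the related papers above (DiffTF, LN3Diff, and "3D Neural Field Generation using Triplane Diffusion") build on tri-plane representations, in which a 3D field is factored into three axis-aligned feature planes that are bilinearly sampled and decoded by a small MLP at each query point. The sketch below illustrates that query step only; the plane resolution, channel count, and the `TriplaneOccupancy` decoder are illustrative assumptions rather than any of these papers' released architectures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TriplaneOccupancy(nn.Module):
    """Decode occupancy at 3D query points from three axis-aligned feature planes."""

    def __init__(self, channels: int = 32, resolution: int = 64):
        super().__init__()
        # XY, XZ and YZ feature planes; in a generative model these would be
        # produced by a decoder rather than stored as a single learnable tensor.
        self.planes = nn.Parameter(torch.randn(3, channels, resolution, resolution) * 0.01)
        self.decoder = nn.Sequential(
            nn.Linear(3 * channels, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),                       # occupancy logit per query point
        )

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (B, N, 3) query coordinates in [-1, 1]^3.
        B, N, _ = pts.shape
        projections = [pts[..., [0, 1]], pts[..., [0, 2]], pts[..., [1, 2]]]  # XY, XZ, YZ
        feats = []
        for plane, uv in zip(self.planes, projections):
            grid = uv.view(B, N, 1, 2)                             # grid_sample expects (B, H, W, 2)
            plane_b = plane.unsqueeze(0).expand(B, -1, -1, -1)     # share the plane across the batch
            sampled = F.grid_sample(plane_b, grid, mode="bilinear", align_corners=True)
            feats.append(sampled.squeeze(-1).permute(0, 2, 1))     # (B, N, C)
        return self.decoder(torch.cat(feats, dim=-1)).squeeze(-1)  # (B, N) occupancy logits


# Toy usage: occupancy logits for random query points in the unit cube.
model = TriplaneOccupancy()
logits = model(torch.rand(1, 4096, 3) * 2 - 1)       # shape: (1, 4096)
```

Because each query touches only three 2D lookups plus a small MLP, this factorization keeps memory and compute far below a dense 3D voxel grid at the same effective resolution, which is why the abstract above cites tri-plane latents as the way to reduce computational complexity at high resolution.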