Related papers: 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

URL: http://arxiv.org/abs/2409.12957v1
Date: Thu, 19 Sep 2024 17:59:06 GMT
Title: 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
Authors: Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong, Ziang Cao, Fangzhou Hong, Yushi Lan, Tengfei Wang, Haozhe Xie, Tong Wu, Shunsuke Saito, Liang Pan, Dahua Lin, Ziwei Liu,
Abstract summary: We introduce 3DTopia-XL, a scalable native 3D generative model designed to overcome limitations of existing methods. 3DTopia-XL leverages a novel primitive-based 3D representation, PrimX, which encodes detailed shape, albedo, and material field into a compact tensorial format. On top of the novel representation, we propose a generative framework based on Diffusion Transformer (DiT), which comprises 1) Primitive Patch Compression, 2) and Latent Primitive Diffusion. We conduct extensive qualitative and quantitative experiments to demonstrate that 3DTopia-XL significantly outperforms existing methods in generating high-
Score: 86.25111098482537
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The increasing demand for high-quality 3D assets across various industries necessitates efficient and automated 3D content creation. Despite recent advancements in 3D generative models, existing methods still face challenges with optimization speed, geometric fidelity, and the lack of assets for physically based rendering (PBR). In this paper, we introduce 3DTopia-XL, a scalable native 3D generative model designed to overcome these limitations. 3DTopia-XL leverages a novel primitive-based 3D representation, PrimX, which encodes detailed shape, albedo, and material field into a compact tensorial format, facilitating the modeling of high-resolution geometry with PBR assets. On top of the novel representation, we propose a generative framework based on Diffusion Transformer (DiT), which comprises 1) Primitive Patch Compression, 2) and Latent Primitive Diffusion. 3DTopia-XL learns to generate high-quality 3D assets from textual or visual inputs. We conduct extensive qualitative and quantitative experiments to demonstrate that 3DTopia-XL significantly outperforms existing methods in generating high-quality 3D assets with fine-grained textures and materials, efficiently bridging the quality gap between generative models and real-world applications.

Related papers

Large Material Gaussian Model for Relightable 3D Generation [54.10879517395551]
We introduce a novel framework designed to generate high-quality 3D content with Physically Based Rendering (PBR) materials.<n>Our method not only exhibit greater visual appeal compared to baseline methods but also enhance material modeling, thereby enabling practical downstream rendering applications.
arXiv Detail & Related papers (2025-09-26T09:35:12Z)
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets [90.99212668875971]
Step1X-3D is an open framework addressing challenges such as data scarcity, algorithmic limitations, and ecosystem fragmentation.<n>We present a two-stage 3D-native architecture combining a hybrid VAE-DiT geometry generator with a diffusion-based texture synthesis module.<n> Benchmark results demonstrate state-of-the-art performance that exceeds existing open-source methods.
arXiv Detail & Related papers (2025-05-12T16:56:30Z)
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models [69.0220314849478]
TripoSG is a new streamlined shape diffusion paradigm capable of generating high-fidelity 3D meshes with precise correspondence to input images. The resulting 3D shapes exhibit enhanced detail due to high-resolution capabilities and demonstrate exceptional fidelity to input images. To foster progress and innovation in the field of 3D generation, we will make our model publicly available.
arXiv Detail & Related papers (2025-02-10T16:07:54Z)
GraphicsDreamer: Image to 3D Generation with Physical Consistency [32.26851174969898]
We introduce GraphicsDreamer, a method for creating highly usable 3D meshes from single images. In the geometry fusion stage, we continue to enforce the PBR constraints, ensuring that the generated 3D objects possess reliable texture details. Our method incorporates topology optimization and fast UV unwrapping capabilities, allowing the 3D products to be seamlessly imported into graphics engines.
arXiv Detail & Related papers (2024-12-18T10:01:27Z)
Structured 3D Latents for Scalable and Versatile 3D Generation [28.672494137267837]
We introduce a novel 3D generation method for versatile and high-quality 3D asset creation. The cornerstone is a unified Structured LATent representation which allows decoding to different output formats. This is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model.
arXiv Detail & Related papers (2024-12-02T13:58:38Z)
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting [30.99112626706754]
The creation of high-quality 3D assets is paramount for applications in digital heritage, entertainment, and robotics. Traditionally, this process necessitates skilled professionals and specialized software for modeling. We introduce a novel 3D content creation framework, which generates 3D textures efficiently.
arXiv Detail & Related papers (2024-07-26T18:26:01Z)
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets [43.315487682462845]
CLAY is a 3D geometry and material generator designed to transform human imagination into intricate 3D digital structures. At its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT) We demonstrate using CLAY for a range of controllable 3D asset creations, from sketchy conceptual designs to production ready assets with intricate details.
arXiv Detail & Related papers (2024-05-30T05:57:36Z)
Compress3D: a Compressed Latent Space for 3D Generation from a Single Image [27.53099431097921]
Triplane autoencoder encodes 3D models into a compact triplane latent space to compress both the 3D geometry and texture information. We introduce a 3D-aware cross-attention mechanism, which utilizes low-resolution latent representations to query features from a high-resolution 3D feature volume. Our approach enables the generation of high-quality 3D assets in merely 7 seconds on a single A100 GPU.
arXiv Detail & Related papers (2024-03-20T11:51:04Z)
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces a novel framework called LN3Diff to address a unified 3D diffusion pipeline. Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space. It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z)
Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space. We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z)
Breathing New Life into 3D Assets with Generative Repainting [74.80184575267106]
Diffusion-based text-to-image models ignited immense attention from the vision community, artists, and content creators. Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields. We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools. Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, and orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools.
arXiv Detail & Related papers (2023-09-15T16:34:51Z)
Pushing the Limits of 3D Shape Generation at Scale [65.24420181727615]
We present a significant breakthrough in 3D shape generation by scaling it to unprecedented dimensions. We have developed a model with an astounding 3.6 billion trainable parameters, establishing it as the largest 3D shape generation model to date, named Argus-3D.
arXiv Detail & Related papers (2023-06-20T13:01:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.