OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation
- URL: http://arxiv.org/abs/2504.09975v2
- Date: Tue, 15 Apr 2025 08:40:13 GMT
- Title: OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation
- Authors: Si-Tong Wei, Rui-Huan Wang, Chuan-Zhi Zhou, Baoquan Chen, Peng-Shuai Wang
- Abstract summary: OctGPT is a novel multiscale autoregressive model for 3D shape generation. It dramatically improves the efficiency and performance of prior 3D autoregressive approaches, offering a new paradigm for high-quality, scalable 3D content creation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoregressive models have achieved remarkable success across various domains, yet their performance in 3D shape generation lags significantly behind that of diffusion models. In this paper, we introduce OctGPT, a novel multiscale autoregressive model for 3D shape generation that dramatically improves the efficiency and performance of prior 3D autoregressive approaches, while rivaling or surpassing state-of-the-art diffusion models. Our method employs a serialized octree representation to efficiently capture the hierarchical and spatial structures of 3D shapes. Coarse geometry is encoded via octree structures, while fine-grained details are represented by binary tokens generated using a vector quantized variational autoencoder (VQVAE), transforming 3D shapes into compact multiscale binary sequences suitable for autoregressive prediction. To address the computational challenges of handling long sequences, we incorporate octree-based transformers enhanced with 3D rotary positional encodings, scale-specific embeddings, and token-parallel generation schemes. These innovations reduce training time 13-fold and generation time 69-fold, enabling efficient training on high-resolution 3D shapes, e.g., $1024^3$, on just four NVIDIA 4090 GPUs within days. OctGPT showcases exceptional versatility across various tasks, including text-, sketch-, and image-conditioned generation, as well as scene-level synthesis involving multiple objects. Extensive experiments demonstrate that OctGPT accelerates convergence and improves generation quality over prior autoregressive methods, offering a new paradigm for high-quality, scalable 3D content creation. Our code and trained models are available at https://github.com/octree-nn/octgpt.
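The core idea of the serialized octree representation can be illustrated with a minimal sketch: traverse a cubic occupancy grid coarse to fine, emitting one occupancy bit per child octant of every non-empty node, so each level becomes a short binary token sequence. This is a simplified illustration only; `serialize_octree` is a hypothetical helper, and OctGPT's actual tokenizer additionally uses VQVAE binary codes for fine-grained detail.

```python
import numpy as np

def serialize_octree(occ: np.ndarray) -> list[list[int]]:
    """Serialize a cubic boolean occupancy grid into per-level binary
    token sequences, coarse to fine (a sketch of the idea only).

    Each level lists, for every non-empty node of the previous level,
    one occupancy bit per child octant, in breadth-first order.
    """
    n = occ.shape[0]
    assert occ.shape == (n, n, n) and (n & (n - 1)) == 0, "need a cubic power-of-two grid"
    levels = []
    nodes = [(0, 0, 0, n)]  # current level's non-empty nodes: (x, y, z, size)
    while nodes and nodes[0][3] > 1:
        bits, children = [], []
        for (x, y, z, s) in nodes:
            h = s // 2
            for dx in (0, h):
                for dy in (0, h):
                    for dz in (0, h):
                        # One bit per octant: is any voxel inside it occupied?
                        full = occ[x+dx:x+dx+h, y+dy:y+dy+h, z+dz:z+dz+h].any()
                        bits.append(int(full))
                        if full:
                            children.append((x + dx, y + dy, z + dz, h))
        levels.append(bits)
        nodes = children  # only occupied octants are subdivided further
    return levels
```

Note how empty regions are pruned: a shape occupying one corner of a $4^3$ grid serializes to just two 8-bit levels instead of 64 voxel tokens, which is what makes the multiscale sequence compact.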
Related papers
- Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization [68.07464514094299]
Existing methods encode all shapes into a fixed-size token, disregarding the inherent variations in scale and complexity across 3D data.
We introduce Octree-based Adaptive Tokenization, a novel framework that adjusts the dimension of latent representations according to shape complexity.
Our approach reduces token counts by 50% compared to fixed-size methods while maintaining comparable visual quality.
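The adaptive-tokenization idea above can be sketched by an octree subdivision that stops early in uniform regions, so simple shapes consume fewer tokens than complex ones. The stopping criterion here (uniformly empty or full) is a hypothetical stand-in for the paper's complexity measure, and `adaptive_tokens` is an illustrative helper, not the authors' code.

```python
import numpy as np

def adaptive_tokens(occ: np.ndarray) -> int:
    """Count tokens under an adaptive octree subdivision: a cube becomes
    a single leaf token once it is uniformly empty or full (a toy
    criterion standing in for a real shape-complexity measure)."""
    def rec(x: int, y: int, z: int, s: int) -> int:
        block = occ[x:x+s, y:y+s, z:z+s]
        if s == 1 or block.all() or not block.any():
            return 1  # uniform region: one leaf token, no further split
        h = s // 2
        # Non-uniform region: one interior token plus its eight octants.
        return 1 + sum(rec(x+dx, y+dy, z+dz, h)
                       for dx in (0, h) for dy in (0, h) for dz in (0, h))
    return rec(0, 0, 0, occ.shape[0])
```

An empty grid costs a single token, while a grid with one isolated voxel forces subdivision along one branch, showing how token count tracks local complexity rather than resolution.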
arXiv Detail & Related papers (2025-04-03T17:57:52Z) - Structured 3D Latents for Scalable and Versatile 3D Generation [28.672494137267837]
We introduce a novel 3D generation method for versatile and high-quality 3D asset creation. The cornerstone is a unified Structured LATent representation which allows decoding to different output formats. This is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model.
arXiv Detail & Related papers (2024-12-02T13:58:38Z) - 3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes [20.675695749508353]
We introduce 3D-WAG, an AR model for 3D implicit distance fields that can perform unconditional shape generation. By redefining the 3D AR generation task as "next-scale" prediction, we reduce the computational cost of generation. Our results show 3D-WAG achieves superior performance in key metrics like Coverage and MMD.
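The efficiency argument behind "next-scale" prediction can be made concrete with toy step accounting: token-by-token AR needs one sequential step per token, while next-scale AR emits every token of a scale in one parallel step, so sequential depth equals the number of scales. The accounting below assumes dense cubic grids that double in resolution per level, which is an illustrative simplification.

```python
def generation_steps(num_levels: int) -> tuple[int, int]:
    """Compare sequential decoding steps for token-by-token AR versus
    next-scale AR (toy model: dense cubic grids doubling per level).

    Token-by-token: every token of every level is one sequential step.
    Next-scale: one parallel step per level, regardless of grid size.
    """
    per_token = sum((2 ** level) ** 3 for level in range(1, num_levels + 1))
    per_scale = num_levels
    return per_token, per_scale
```

With three levels (2^3 up to 8^3 grids) the toy count is 584 sequential steps versus 3, which is the kind of gap that motivates scale-parallel generation schemes.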
arXiv Detail & Related papers (2024-11-28T10:33:01Z) - G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer [4.221298212125194]
We introduce G3PT, a scalable coarse-to-fine 3D generative model utilizing a cross-scale querying transformer.
Cross-scale querying transformer connects tokens globally across different levels of detail without requiring an ordered sequence.
Experiments demonstrate that G3PT achieves superior generation quality and generalization ability compared to previous 3D generation methods.
arXiv Detail & Related papers (2024-09-10T08:27:19Z) - MeshXL: Neural Coordinate Field for Generative 3D Foundation Models [51.1972329762843]
We present a family of generative pre-trained auto-regressive models, which addresses the process of 3D mesh generation with modern large language model approaches.
MeshXL is able to generate high-quality 3D meshes, and can also serve as foundation models for various down-stream applications.
arXiv Detail & Related papers (2024-05-31T14:35:35Z) - DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation [53.20147419879056]
We introduce a diffusion-based feed-forward framework to address challenges with a single model.
Building upon our 3D-aware Diffusion model with TransFormer, we propose a stronger version for 3D generation, i.e., DiffTF++.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate the effectiveness of our proposed modules.
arXiv Detail & Related papers (2024-05-13T17:59:51Z) - LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces a novel framework called LN3Diff to address a unified 3D diffusion pipeline.
Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space.
It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z) - Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z) - Make-A-Shape: a Ten-Million-scale 3D Shape Model [52.701745578415796]
This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale.
We first innovate a wavelet-tree representation to compactly encode shapes by formulating the subband coefficient filtering scheme.
We derive the subband adaptive training strategy to train our model to effectively learn to generate coarse and detail wavelet coefficients.
arXiv Detail & Related papers (2024-01-20T00:21:58Z) - Dual Octree Graph Networks for Learning Adaptive Volumetric Shape Representations [21.59311861556396]
Our method encodes the volumetric field of a 3D shape with an adaptive feature volume organized by an octree.
An encoder-decoder network is designed to learn the adaptive feature volume based on the graph convolutions over the dual graph of octree nodes.
Our method effectively encodes shape details, enables fast 3D shape reconstruction, and exhibits good generality for modeling 3D shapes out of training categories.
arXiv Detail & Related papers (2022-05-05T17:56:34Z)
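The graph-convolution idea in the entry above (features on octree nodes, aggregated along dual-graph edges) can be sketched in a few lines. This is a generic mean-aggregation message-passing step under assumed weight matrices `W_self` and `W_nbr`; the paper's actual operator also distinguishes neighbor types and octree scales.

```python
import numpy as np

def graph_conv(feats: np.ndarray, edges: list[tuple[int, int]],
               W_self: np.ndarray, W_nbr: np.ndarray) -> np.ndarray:
    """One message-passing step over node features: each node combines
    its own transformed feature with the mean of its neighbors' features
    along undirected dual-graph edges (a minimal sketch)."""
    out = feats @ W_self
    agg = np.zeros_like(feats)
    deg = np.zeros(len(feats))
    for i, j in edges:
        # Undirected edge: both endpoints receive the other's feature.
        agg[i] += feats[j]
        agg[j] += feats[i]
        deg[i] += 1
        deg[j] += 1
    deg = np.maximum(deg, 1)  # isolated nodes keep a zero neighbor term
    return out + (agg / deg[:, None]) @ W_nbr
```

Stacking such steps in an encoder-decoder over octree nodes is what lets the network propagate shape information across adaptively-sized regions.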
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.