Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically Structured Sequences
- URL: http://arxiv.org/abs/2111.12480v1
- Date: Wed, 24 Nov 2021 13:17:16 GMT
- Title: Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically Structured Sequences
- Authors: Moritz Ibing, Gregor Kobsik, Leif Kobbelt
- Abstract summary: Autoregressive models have proven to be very powerful in NLP text generation tasks.
We introduce an adaptive compression scheme that significantly reduces sequence lengths.
We demonstrate the performance of our model by comparing against the state-of-the-art in shape generation.
- Score: 11.09257948735229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoregressive models have proven to be very powerful in NLP text generation
tasks and lately have gained popularity for image generation as well. However,
they have seen limited use for the synthesis of 3D shapes so far. This is
mainly due to the lack of a straightforward way to linearize 3D data as well as
to scaling problems with the length of the resulting sequences when describing
complex shapes. In this work we address both of these problems. We use octrees
as a compact hierarchical shape representation that can be sequentialized by
traversal ordering. Moreover, we introduce an adaptive compression scheme that
significantly reduces sequence lengths and thus enables their effective
generation with a transformer, while still allowing fully autoregressive
sampling and parallel training. We demonstrate the performance of our model by
comparing against the state-of-the-art in shape generation.
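To make the serialization idea concrete: below is a minimal sketch of one plausible breadth-first octree linearization of a binary voxel grid. It is illustrative only, not the authors' exact tokenization, and it omits the paper's adaptive compression, which further shortens the sequence by merging tokens.

```python
import numpy as np
from collections import deque

# Tokens: 0 = empty, 1 = full, 2 = mixed (node subdivided at the next level).
EMPTY, FULL, MIXED = 0, 1, 2

def octree_sequence(vox):
    """Linearize a binary voxel grid (cubic, side a power of two) into a
    breadth-first octree token sequence. Only MIXED nodes spawn children,
    so sequence length adapts to shape complexity instead of growing with
    the full grid resolution."""
    seq = []
    queue = deque([vox])              # FIFO -> level-by-level traversal order
    while queue:
        node = queue.popleft()
        s = node.sum()
        if s == 0:
            seq.append(EMPTY)
        elif s == node.size:
            seq.append(FULL)
        else:
            seq.append(MIXED)
            h = node.shape[0] // 2
            # Enqueue the 8 octants in a fixed traversal order.
            for x in (0, h):
                for y in (0, h):
                    for z in (0, h):
                        queue.append(node[x:x+h, y:y+h, z:z+h])
    return seq

# Example: a 16^3 grid containing a solid 8^3 cube in one corner.
grid = np.zeros((16, 16, 16), dtype=np.uint8)
grid[:8, :8, :8] = 1
print(octree_sequence(grid))  # [2, 1, 0, 0, 0, 0, 0, 0, 0] -- 9 tokens vs. 4096 voxels
```

Because the traversal order is fixed, the sequence is a deterministic function of the shape, which is what allows fully autoregressive sampling and parallel transformer training over it.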
Related papers
- G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer [4.221298212125194]
We introduce G3PT, a scalable coarse-to-fine 3D generative model utilizing a cross-scale querying transformer.
The cross-scale querying transformer connects tokens globally across different levels of detail without requiring an ordered sequence.
Experiments demonstrate that G3PT achieves superior generation quality and generalization ability compared to previous 3D generation methods.
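As a rough illustration of the querying idea (a hypothetical sketch, not G3PT's actual architecture), a set of learnable queries for the finer level can cross-attend to all coarser-level tokens at once, so no ordering within a level is required:

```python
import torch
import torch.nn as nn

class CrossScaleQuery(nn.Module):
    """Learnable queries for the next level of detail attend globally to
    the coarser level's tokens. Illustrative stand-in, not G3PT's model."""
    def __init__(self, dim=256, heads=8, n_queries=64):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, coarse_tokens):               # (B, N_coarse, dim)
        q = self.queries.unsqueeze(0).expand(coarse_tokens.shape[0], -1, -1)
        fine, _ = self.attn(q, coarse_tokens, coarse_tokens)
        return fine                                 # (B, n_queries, dim)

x = torch.randn(2, 16, 256)                         # 16 coarse tokens
print(CrossScaleQuery()(x).shape)                   # torch.Size([2, 64, 256])
```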
arXiv Detail & Related papers (2024-09-10T08:27:19Z)
- VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation [69.68568248073747]
We propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks.
PCDS builds the pose-dependent consistency function within diffusion trajectories, allowing true gradients to be approximated with minimal sampling steps.
For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details.
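A hypothetical sketch of such a step schedule is below; the ramp shape, iteration counts, and placeholder training loop are illustrative assumptions, not the paper's hyperparameters:

```python
def step_schedule(it, total_iters, max_steps=4):
    """Coarse-to-fine: 1-step PCDS for the first half of optimization
    (basic structure), then ramp the sampling-step count up to max_steps
    (fine-grained details)."""
    half = total_iters // 2
    if it < half:
        return 1
    return 1 + round((it - half) / half * (max_steps - 1))

# Hypothetical outer loop; pcds_loss, render, scene, and optimizer are
# placeholders for an actual diffusion-based 3D generation pipeline:
# for it in range(1000):
#     loss = pcds_loss(render(scene), steps=step_schedule(it, 1000))
#     loss.backward(); optimizer.step(); optimizer.zero_grad()

print([step_schedule(i, 1000) for i in (0, 499, 500, 750, 999)])  # [1, 1, 1, 3, 4]
```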
arXiv Detail & Related papers (2024-06-21T08:21:52Z)
- MeshXL: Neural Coordinate Field for Generative 3D Foundation Models [51.1972329762843]
We present MeshXL, a family of generative pre-trained auto-regressive models that addresses 3D mesh generation with modern large language model approaches.
MeshXL is able to generate high-quality 3D meshes, and can also serve as foundation models for various down-stream applications.
arXiv Detail & Related papers (2024-05-31T14:35:35Z)
- NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation [52.772319840580074]
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints.
Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation.
We introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling.
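2D plane representations of 3D shapes are commonly realized as triplanes; the sketch below shows generic triplane feature sampling as an illustration (an assumption for clarity, not necessarily NeuSDFusion's exact layout):

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes, xyz):
    """Project each 3D point onto the XY, XZ, and YZ feature planes,
    bilinearly interpolate, and sum the three features.
    planes: (3, C, H, W); xyz: (N, 3) with coordinates in [-1, 1]."""
    feats = 0
    for plane, dims in zip(planes, ((0, 1), (0, 2), (1, 2))):
        uv = xyz[:, dims].view(1, -1, 1, 2)            # (1, N, 1, 2) sample grid
        f = F.grid_sample(plane.unsqueeze(0), uv,      # -> (1, C, N, 1)
                          mode="bilinear", align_corners=True)
        feats = feats + f[0, :, :, 0].T                # (N, C)
    return feats

planes = torch.randn(3, 32, 64, 64)
pts = torch.rand(100, 3) * 2 - 1
print(sample_triplane(planes, pts).shape)              # torch.Size([100, 32])
```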
arXiv Detail & Related papers (2024-03-27T04:09:34Z)
- Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers [37.14235383028582]
We introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference.
Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation.
arXiv Detail & Related papers (2023-12-14T17:18:34Z)
- Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
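The move from grid-bound tokens to discrete latent codes can be sketched with a toy nearest-neighbour quantizer; this illustrates the general idea only, not ImAM's implementation:

```python
import torch
import torch.nn as nn

class LatentQuantizer(nn.Module):
    """Map a latent sequence to discrete code indices that an
    autoregressive transformer can model, decoupling the tokens from any
    volumetric grid. Toy sketch, not ImAM's implementation."""
    def __init__(self, n_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)

    def forward(self, z):                                   # z: (B, T, dim)
        # Squared L2 distance of every latent to every codebook entry.
        d = (z.unsqueeze(2) - self.codebook.weight).pow(2).sum(-1)
        idx = d.argmin(-1)                                  # (B, T) token ids
        return idx, self.codebook(idx)                      # ids + quantized latents

ids, zq = LatentQuantizer()(torch.randn(2, 16, 64))
print(ids.shape, zq.shape)  # torch.Size([2, 16]) torch.Size([2, 16, 64])
```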
arXiv Detail & Related papers (2023-03-26T12:03:18Z)
- Autoregressive 3D Shape Generation via Canonical Mapping [92.91282602339398]
Transformers have shown remarkable performance in a variety of generative tasks such as image, audio, and text generation.
In this paper, we aim to further exploit the power of transformers and employ them for the task of 3D point cloud generation.
Our model can be easily extended to multi-modal shape completion as an application for conditional shape generation.
arXiv Detail & Related papers (2022-04-05T03:12:29Z)
- Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis [90.26556260531707]
DMTet is a conditional generative model that can synthesize high-resolution 3D shapes using simple user guides such as coarse voxels.
Unlike deep 3D generative models that directly generate explicit representations such as meshes, our model can synthesize shapes with arbitrary topology.
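The marching-tetrahedra extraction at the core of DMTet can be sketched for a single tetrahedron; the snippet below shows only the edge-crossing step (DMTet additionally predicts SDF values and vertex offsets with a network):

```python
import numpy as np

def tet_edge_crossings(verts, sdf):
    """Wherever the signed distance changes sign along one of a
    tetrahedron's 6 edges, place a surface vertex by linear interpolation
    of the zero crossing."""
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    pts = []
    for a, b in edges:
        sa, sb = sdf[a], sdf[b]
        if sa * sb < 0:                    # sign change -> surface crosses edge
            t = sa / (sa - sb)             # linear zero-crossing parameter
            pts.append(verts[a] + t * (verts[b] - verts[a]))
    return np.array(pts)

verts = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
sdf = np.array([-0.5, 0.5, 0.5, 0.5])      # vertex 0 inside, others outside
print(tet_edge_crossings(verts, sdf))      # three crossings at edge midpoints
```

Because the sign pattern, rather than a fixed grid template, determines where the surface lies, the extracted mesh can take on arbitrary topology.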
arXiv Detail & Related papers (2021-11-08T05:29:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.