Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
- URL: http://arxiv.org/abs/2504.02817v1
- Date: Thu, 03 Apr 2025 17:57:52 GMT
- Title: Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
- Authors: Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu, Xiaoxia Sun, Chong Shang, Kiran Bhat, Deva Ramanan, Jun-Yan Zhu, Maneesh Agrawala, Tinghui Zhou
- Abstract summary: Existing methods encode all shapes into a fixed-size token, disregarding the inherent variations in scale and complexity across 3D data. We introduce Octree-based Adaptive Tokenization, a novel framework that adjusts the dimension of latent representations according to shape complexity. Our approach reduces token counts by 50% compared to fixed-size methods while maintaining comparable visual quality.
- Score: 68.07464514094299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many 3D generative models rely on variational autoencoders (VAEs) to learn compact shape representations. However, existing methods encode all shapes into a fixed-size token, disregarding the inherent variations in scale and complexity across 3D data. This leads to inefficient latent representations that can compromise downstream generation. We address this challenge by introducing Octree-based Adaptive Tokenization, a novel framework that adjusts the dimension of latent representations according to shape complexity. Our approach constructs an adaptive octree structure guided by a quadric-error-based subdivision criterion and allocates a shape latent vector to each octree cell using a query-based transformer. Building upon this tokenization, we develop an octree-based autoregressive generative model that effectively leverages these variable-sized representations in shape generation. Extensive experiments demonstrate that our approach reduces token counts by 50% compared to fixed-size methods while maintaining comparable visual quality. When using a similar token length, our method produces significantly higher-quality shapes. When incorporated with our downstream generative model, our method creates more detailed and diverse 3D content than existing approaches.
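The core mechanism described in the abstract, adaptive octree subdivision driven by a quadric-error criterion, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical rendering of that idea and not the authors' implementation: a cell is split only when a simple point-to-plane error of the samples it contains exceeds a threshold, so the number of leaf cells (and hence shape tokens) grows with local geometric complexity. The error function, threshold, and maximum depth are assumptions for illustration; in the paper each leaf cell is additionally assigned a latent vector by a query-based transformer.

```python
# Minimal sketch of quadric-error-guided adaptive octree subdivision (assumed
# criterion, not the paper's code): complex regions are subdivided further and
# therefore receive more cells/tokens than simple, near-planar regions.
import numpy as np

def quadric_error(points, normals):
    """Sum of squared point-to-plane distances to an averaged tangent plane."""
    if len(points) < 4:
        return 0.0
    centroid = points.mean(axis=0)
    n = normals.mean(axis=0)
    n /= (np.linalg.norm(n) + 1e-8)
    d = (points - centroid) @ n          # signed distances to the plane
    return float((d ** 2).sum())

def build_adaptive_octree(points, normals, center, half_size,
                          depth=0, max_depth=6, err_thresh=1e-3):
    """Return leaf cells (center, half_size, points); one shape token per leaf."""
    if depth == max_depth or quadric_error(points, normals) < err_thresh:
        return [(center, half_size, points)]
    leaves = []
    for dx in (-0.5, 0.5):
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                child_center = center + half_size * np.array([dx, dy, dz])
                child_half = half_size / 2.0
                mask = np.all(np.abs(points - child_center) <= child_half, axis=1)
                if mask.any():
                    leaves += build_adaptive_octree(points[mask], normals[mask],
                                                    child_center, child_half,
                                                    depth + 1, max_depth, err_thresh)
    return leaves

# Usage: the leaf count, and thus the token count, adapts to shape complexity.
pts = np.random.rand(2048, 3) * 2 - 1
nrm = np.tile(np.array([0.0, 0.0, 1.0]), (2048, 1))
cells = build_adaptive_octree(pts, nrm, center=np.zeros(3), half_size=1.0)
print(f"{len(cells)} leaf cells -> {len(cells)} shape tokens")
```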
Related papers
- OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation [24.980804600194062]
OctGPT is a novel multiscale autoregressive model for 3D shape generation.
It dramatically improves the efficiency and performance of prior 3D autoregressive approaches.
It offers a new paradigm for high-quality, scalable 3D content creation.
arXiv Detail & Related papers (2025-04-14T08:31:26Z) - A Mesh Is Worth 512 Numbers: Spectral-domain Diffusion Modeling for High-dimension Shape Generation [4.064004858393506]
This paper introduces SpoDify, a novel spectral-domain diffusion framework for high-quality shape generation.
It efficiently encodes complex meshes into continuous implicit representations, for example encoding a 15k-vertex mesh into a 512-dimensional latent code without learning.
arXiv Detail & Related papers (2025-03-09T07:05:29Z) - DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow [44.72037991063735]
DetailGen3D is a generative approach specifically designed to enhance generated 3D shapes.
Our key insight is to model the coarse-to-fine transformation directly through data-dependent flows in latent space.
We introduce a token matching strategy that ensures accurate spatial correspondence during refinement.
arXiv Detail & Related papers (2024-11-25T17:08:17Z) - Make-A-Shape: a Ten-Million-scale 3D Shape Model [52.701745578415796]
This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale.
We first introduce a wavelet-tree representation that compactly encodes shapes via a subband coefficient filtering scheme.
We then derive a subband-adaptive training strategy so that our model effectively learns to generate both coarse and detail wavelet coefficients.
arXiv Detail & Related papers (2024-01-20T00:21:58Z) - Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z) - Dual Octree Graph Networks for Learning Adaptive Volumetric Shape Representations [21.59311861556396]
Our method encodes the volumetric field of a 3D shape with an adaptive feature volume organized by an octree.
An encoder-decoder network is designed to learn the adaptive feature volume based on the graph convolutions over the dual graph of octree nodes.
Our method effectively encodes shape details, enables fast 3D shape reconstruction, and exhibits good generality for modeling 3D shapes out of training categories.
arXiv Detail & Related papers (2022-05-05T17:56:34Z) - Autoregressive 3D Shape Generation via Canonical Mapping [92.91282602339398]
Transformers have shown remarkable performance in a variety of generative tasks such as image, audio, and text generation.
In this paper, we aim to further exploit the power of transformers and employ them for the task of 3D point cloud generation.
Our model can be easily extended to multi-modal shape completion as an application for conditional shape generation.
arXiv Detail & Related papers (2022-04-05T03:12:29Z) - Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically Structured Sequences [11.09257948735229]
Autoregressive models have proven to be very powerful in NLP text generation tasks.
We introduce an adaptive compression scheme that significantly reduces sequence lengths.
We demonstrate the performance of our model by comparing against the state-of-the-art in shape generation.
arXiv Detail & Related papers (2021-11-24T13:17:16Z) - Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that share the same semantic category and cast similar votes for the geometric centroid.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z) - Dense Non-Rigid Structure from Motion: A Manifold Viewpoint [162.88686222340962]
The Non-Rigid Structure-from-Motion (NRSfM) problem aims to recover the 3D geometry of a deforming object from its 2D feature correspondences across multiple frames.
We show that our approach significantly improves accuracy, scalability, and robustness against noise.
arXiv Detail & Related papers (2020-06-15T09:15:54Z)