Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models
- URL: http://arxiv.org/abs/2503.08737v1
- Date: Tue, 11 Mar 2025 06:29:39 GMT
- Title: Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models
- Authors: In Cho, Youngbeom Yoo, Subin Jeon, Seon Joo Kim,
- Abstract summary: COD-VAE encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing quality.<n> COD-VAE achieves 16x compression compared to the baseline while maintaining quality.<n>This enables 20.8x speedup in generation, highlighting that a large number of latent vectors is not a prerequisite for high-quality reconstruction and generation.
- Score: 21.97308739556984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Constructing a compressed latent space through a variational autoencoder (VAE) is the key for efficient 3D diffusion models. This paper introduces COD-VAE, a VAE that encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing quality. COD-VAE introduces a two-stage autoencoder scheme to improve compression and decoding efficiency. First, our encoder block progressively compresses point clouds into compact latent vectors via intermediate point patches. Second, our triplane-based decoder reconstructs dense triplanes from latent vectors instead of directly decoding neural fields, significantly reducing computational overhead of neural fields decoding. Finally, we propose uncertainty-guided token pruning, which allocates resources adaptively by skipping computations in simpler regions and improves the decoder efficiency. Experimental results demonstrate that COD-VAE achieves 16x compression compared to the baseline while maintaining quality. This enables 20.8x speedup in generation, highlighting that a large number of latent vectors is not a prerequisite for high-quality reconstruction and generation.
Related papers
- Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization [14.71160140310766]
3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness in 3D reconstruction, achieving high-quality results with real-time radiance field rendering.
However, a key challenge is the substantial storage cost: reconstructing a single scene typically requires millions of Gaussian splats, each represented by 59 floating-point parameters, resulting in approximately 1 GB of memory.
We propose a compression method by building separate attribute codebooks and storing only discrete code indices. Specifically, we employ noise-substituted vector quantization technique to jointly train the codebooks and model features, ensuring consistency between descent gradient optimization and parameter discretization
arXiv Detail & Related papers (2025-04-03T22:19:34Z) - Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models [38.84567900296605]
Deep Compression Autoencoder (DC-AE) is a new family of autoencoder models for accelerating high-resolution diffusion models.<n>Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop.
arXiv Detail & Related papers (2024-10-14T17:15:07Z) - ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model [77.71796503321632]
We introduce a context model in the anchor level for 3DGS representation, yielding an impressive size reduction of over 100 times compared to vanilla 3DGS.
Our work pioneers the context model in the anchor level for 3DGS representation, yielding an impressive size reduction of over 100 times compared to vanilla 3DGS and 15 times compared to the most recent state-of-the-art work Scaffold-GS.
arXiv Detail & Related papers (2024-05-31T09:23:39Z) - CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting [68.94594215660473]
We propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS)
We exploit a small set of anchor primitives for prediction, allowing the majority of primitives to be encapsulated into highly compact residual forms.
Experimental results show that the proposed CompGS significantly outperforms existing methods, achieving superior compactness in 3D scene representation without compromising model accuracy and rendering quality.
arXiv Detail & Related papers (2024-04-15T04:50:39Z) - Fast 2D Bicephalous Convolutional Autoencoder for Compressing 3D Time
Projection Chamber Data [11.186303973102532]
This work introduces two BCAE variants: BCAE++ and BCAE-2D.
BCAE++ achieves a 15% better compression ratio and a 77% better reconstruction accuracy measured in mean absolute error compared with BCAE.
In addition, we demonstrate an unbalanced autoencoder with a larger decoder can improve reconstruction accuracy without significantly sacrificing throughput.
arXiv Detail & Related papers (2023-10-23T15:23:32Z) - Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z) - Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural networks, which can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different dataset, image compression using the MNIST dataset, and image denoising using fashion MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z) - Implicit Autoencoder for Point-Cloud Self-Supervised Representation
Learning [39.521374237630766]
The most popular and accessible 3D representation, i.e., point clouds, involves discrete samples of the underlying continuous 3D surface.
This discretization process introduces sampling variations on the 3D shape, making it challenging to develop transferable knowledge of the true 3D geometry.
In the standard autoencoding paradigm, the encoder is compelled to encode not only the 3D geometry but also information on the specific discrete sampling of the 3D shape into the latent code.
This is because the point cloud reconstructed by the decoder is considered unacceptable unless there is a perfect mapping between the original and the reconstructed
arXiv Detail & Related papers (2022-01-03T18:05:52Z) - Dynamic Neural Representational Decoders for High-Resolution Semantic
Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD)
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z) - OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression [77.8842824702423]
We present a novel deep compression algorithm to reduce the memory footprint of LiDAR point clouds.
Our method exploits the sparsity and structural redundancy between points to reduce the memory footprint.
Our algorithm can be used to reduce the onboard and offboard storage of LiDAR points for applications such as self-driving cars.
arXiv Detail & Related papers (2020-05-14T17:48:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.