Related papers: VP-VAE: Rethinking Vector Quantization via Adaptive Vector Perturbation

VP-VAE: Rethinking Vector Quantization via Adaptive Vector Perturbation

URL: http://arxiv.org/abs/2602.17133v1
Date: Thu, 19 Feb 2026 07:12:43 GMT
Title: VP-VAE: Rethinking Vector Quantization via Adaptive Vector Perturbation
Authors: Linwei Zhai, Han Ding, Mingzhi Lin, Cui Zhao, Fei Wang, Ge Wang, Wang Zhi, Wei Xi,
Abstract summary: Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental to modern generative modeling, yet they often suffer from training instability and "codebook collapse"<n>In this paper, we propose a novel paradigm that decouples representation learning from discretization by eliminating the need for an explicit codebook during training.
Score: 16.334397444253266
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental to modern generative modeling, yet they often suffer from training instability and "codebook collapse" due to the inherent coupling of representation learning and discrete codebook optimization. In this paper, we propose VP-VAE (Vector Perturbation VAE), a novel paradigm that decouples representation learning from discretization by eliminating the need for an explicit codebook during training. Our key insight is that, from the neural network's viewpoint, performing quantization primarily manifests as injecting a structured perturbation in latent space. Accordingly, VP-VAE replaces the non-differentiable quantizer with distribution-consistent and scale-adaptive latent perturbations generated via Metropolis--Hastings sampling. This design enables stable training without a codebook while making the model robust to inference-time quantization error. Moreover, under the assumption of approximately uniform latent variables, we derive FSP (Finite Scalar Perturbation), a lightweight variant of VP-VAE that provides a unified theoretical explanation and a practical improvement for FSQ-style fixed quantizers. Extensive experiments on image and audio benchmarks demonstrate that VP-VAE and FSP improve reconstruction fidelity and achieve substantially more balanced token usage, while avoiding the instability inherent to coupled codebook training.

Related papers

VAE-REPA: Variational Autoencoder Representation Alignment for Efficient Diffusion Training [53.09658039757408]
This paper proposes textbfnamex, a lightweight intrinsic guidance framework for efficient diffusion training.<n>name aligns the intermediate latent features of diffusion transformers with VAE features via a lightweight projection layer, supervised by a feature alignment loss.<n>Experiments demonstrate that name improves both generation quality and training convergence speed compared to vanilla diffusion transformers.
arXiv Detail & Related papers (2026-01-25T13:22:38Z)
Image Tokenizer Needs Post-Training [76.91832192778732]
We propose a novel tokenizer training scheme, focusing on improving latent space construction and decoding respectively.<n>Specifically, we propose a plug-and-play tokenizer training scheme, which significantly enhances the robustness of tokenizer.<n>We further optimize the tokenizer decoder regarding a well-trained generative model to mitigate the distribution difference between generated and reconstructed tokens.
arXiv Detail & Related papers (2025-09-15T21:38:03Z)
Enhancing Variational Autoencoders with Smooth Robust Latent Encoding [54.74721202894622]
Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models.<n>We introduce Smooth Robust Latent VAE, a novel adversarial training framework that boosts both generation quality and robustness.<n>Experiments show that SRL-VAE improves both generation quality, in image reconstruction and text-guided image editing, and robustness, against Nightshade attacks and image editing attacks.
arXiv Detail & Related papers (2025-04-24T03:17:57Z)
Gaussian Mixture Vector Quantization with Aggregated Categorical Posterior [5.862123282894087]
We introduce the Vector Quantized Variational Autoencoder (VQ-VAE) VQ-VAE is a type of variational autoencoder using discrete embedding as latent. We show that GM-VQ improves codebook utilization and reduces information loss without relying on handcrafteds.
arXiv Detail & Related papers (2024-10-14T05:58:11Z)
PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations.<n>We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes [18.57499609338579]
We propose a novel framework to learn hierarchical discrete representation on the basis of the variational Bayes framework, called hierarchically quantized variational autoencoder (HQ-VAE) HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE) Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance.
arXiv Detail & Related papers (2023-12-31T01:39:38Z)
How to train your VAE [0.0]
Variational Autoencoders (VAEs) have become a cornerstone in generative modeling and representation learning within machine learning. This paper explores interpreting the Kullback-Leibler (KL) Divergence, a critical component within the Evidence Lower Bound (ELBO) The proposed method redefines the ELBO with a mixture of Gaussians for the posterior probability, introduces a regularization term, and employs a PatchGAN discriminator to enhance texture realism.
arXiv Detail & Related papers (2023-09-22T19:52:28Z)
Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint. We endow discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over the sequences of codewords to the data distribution. We develop further theories to connect it with the clustering viewpoint of WS distance, allowing us to have a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z)
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization [13.075574481614478]
One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook. We propose a new training scheme that extends the standard VAE via novel dequantization and quantization. Our experiments show that SQ-VAE improves codebook utilization without using commons.
arXiv Detail & Related papers (2022-05-16T09:49:37Z)
Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization [76.68866368409216]
We propose learning to dynamically select discretization tightness conditioned on inputs. We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks.
arXiv Detail & Related papers (2022-02-02T23:54:26Z)
Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency. We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
Robust Training of Vector Quantized Bottleneck Models [21.540133031071438]
We demonstrate methods for reliable and efficient training of discrete representation using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs) For unsupervised representation learning, they became viable alternatives to continuous latent variable models such as the Variational Auto-Encoder (VAE)
arXiv Detail & Related papers (2020-05-18T08:23:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.