Enhancing Vector Quantization with Distributional Matching: A Theoretical and Empirical Study
- URL: http://arxiv.org/abs/2506.15078v1
- Date: Wed, 18 Jun 2025 02:43:40 GMT
- Title: Enhancing Vector Quantization with Distributional Matching: A Theoretical and Empirical Study
- Authors: Xianghong Fang, Litao Guo, Hengchao Chen, Yuxuan Zhang, Xiaofan Xia, Dingjie Song, Yexin Liu, Hao Wang, Harry Yang, Yuan Yuan, Qiang Sun
- Abstract summary: Two critical issues in vector quantization methods are training instability and codebook collapse. We employ the Wasserstein distance to align these two distributions, achieving near 100% codebook utilization. Both empirical and theoretical analyses validate the effectiveness of the proposed approach.
- Score: 19.74160064041426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The success of autoregressive models largely depends on the effectiveness of vector quantization, a technique that discretizes continuous features by mapping them to the nearest code vectors within a learnable codebook. Two critical issues in existing vector quantization methods are training instability and codebook collapse. Training instability arises from the gradient discrepancy introduced by the straight-through estimator, especially in the presence of significant quantization errors, while codebook collapse occurs when only a small subset of code vectors are utilized during training. A closer examination of these issues reveals that they are primarily driven by a mismatch between the distributions of the features and code vectors, leading to unrepresentative code vectors and significant data information loss during compression. To address this, we employ the Wasserstein distance to align these two distributions, achieving near 100% codebook utilization and significantly reducing the quantization error. Both empirical and theoretical analyses validate the effectiveness of the proposed approach.
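Below is a minimal, illustrative PyTorch sketch of the setup the abstract describes: a vector-quantization layer with nearest-code lookup and a straight-through estimator, plus a distribution-alignment penalty between encoder features and code vectors. The per-dimension sorted (1-D Wasserstein-style) penalty and the `wasserstein_weight` hyperparameter are assumptions made for illustration only; they are not the authors' exact estimator or implementation.

```python
# Minimal sketch, assuming a PyTorch-style VQ layer; not the paper's code.
import torch
import torch.nn.functional as F


class VectorQuantizer(torch.nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 64, wasserstein_weight: float = 0.1):
        super().__init__()
        self.codebook = torch.nn.Parameter(torch.randn(num_codes, dim) * 0.02)
        self.wasserstein_weight = wasserstein_weight

    def forward(self, z_e: torch.Tensor):
        # z_e: (batch, dim) continuous encoder features.
        # Nearest-code lookup by Euclidean distance.
        d = torch.cdist(z_e, self.codebook)          # (batch, num_codes)
        idx = d.argmin(dim=1)                        # hard assignment
        z_q = self.codebook[idx]                     # quantized features

        # Straight-through estimator: forward pass uses z_q, backward pass
        # copies gradients from z_q to z_e unchanged.
        z_st = z_e + (z_q - z_e).detach()

        # Standard VQ-VAE codebook and commitment losses.
        vq_loss = F.mse_loss(z_q, z_e.detach()) + 0.25 * F.mse_loss(z_e, z_q.detach())

        # Illustrative distributional-matching term: per-dimension 1-D
        # Wasserstein-2 distance between sorted feature values and sorted
        # code values (truncated to the smaller sample set; the paper's
        # exact Wasserstein estimator is not reproduced here).
        n = min(z_e.shape[0], self.codebook.shape[0])
        feat_sorted, _ = torch.sort(z_e[:n].detach(), dim=0)
        code_sorted, _ = torch.sort(self.codebook[:n], dim=0)
        w_loss = F.mse_loss(code_sorted, feat_sorted)

        return z_st, idx, vq_loss + self.wasserstein_weight * w_loss


# Usage: quantize a batch of features and report codebook utilization.
vq = VectorQuantizer()
z = torch.randn(512, 64)
z_q, idx, loss = vq(z)
print(loss.item(), f"codes used: {idx.unique().numel()}/512")
```

In this sketch the alignment term only updates the codebook (the features are detached), so the code vectors are pulled toward the feature distribution, which is one plausible way to realize the distributional matching the abstract motivates.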
Related papers
- Quantification via Gaussian Latent Space Representations [3.2198127675295036]
Quantification is the task of predicting the prevalence of each class within an unknown bag of examples. We present an end-to-end neural network that uses Gaussian distributions in latent spaces to obtain invariant representations of bags of examples.
arXiv Detail & Related papers (2025-01-23T13:13:46Z) - Representation Collapsing Problems in Vector Quantization [2.2750239768387255]
Vector quantization is a technique in machine learning that discretizes continuous representations into a set of discrete vectors.
Despite its prevalence, the characteristics and behaviors of vector quantization in generative models remain largely underexplored.
arXiv Detail & Related papers (2024-11-25T16:32:29Z) - SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input [17.017127559393398]
We propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation.
This enables the network to learn from subtle input perturbations.
We further refine the training strategy to ensure convergence while simulating quantization errors.
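As a point of comparison for the summary above, here is a minimal sketch of a generic differentiable soft-rounding surrogate of the kind described: a smooth function whose gradient approximates that of round() during backpropagation. The tanh-based form and the temperature `alpha` are illustrative assumptions, not necessarily the quantizer used in QGait.

```python
# Minimal sketch of a differentiable soft quantizer; an assumed generic form.
import math
import torch


def soft_round(x: torch.Tensor, alpha: float = 8.0) -> torch.Tensor:
    """Smooth approximation of torch.round; approaches hard rounding as alpha grows."""
    floor_x = torch.floor(x)
    r = x - floor_x - 0.5                      # fractional offset in [-0.5, 0.5)
    return floor_x + 0.5 + 0.5 * torch.tanh(alpha * r) / math.tanh(alpha / 2)


x = torch.linspace(-1.0, 1.0, 9, requires_grad=True)
y = soft_round(x)
y.sum().backward()
print(y.detach())   # close to round(x), but smooth
print(x.grad)       # nonzero gradients, unlike the exact round()
```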
arXiv Detail & Related papers (2024-05-22T17:34:18Z) - Quantization of Large Language Models with an Overdetermined Basis [73.79368761182998]
We introduce an algorithm for data quantization based on the principles of Kashin representation.
Our findings demonstrate that Kashin Quantization achieves competitive or superior quality in model performance.
arXiv Detail & Related papers (2024-04-15T12:38:46Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks [35.6604960300194]
This work examines the challenges of training neural networks with vector quantization using straight-through estimation.
We find that a primary cause of training instability is the discrepancy between the model embedding and the code-vector distribution.
We identify the factors that contribute to this issue, including the codebook gradient sparsity and the asymmetric nature of the commitment loss.
arXiv Detail & Related papers (2023-05-15T17:56:36Z) - Regularized Vector Quantization for Tokenized Image Synthesis [126.96880843754066]
Quantizing images into discrete representations has been a fundamental problem in unified generative modeling.
Deterministic quantization suffers from severe codebook collapse and misalignment with the inference stage, while stochastic quantization suffers from low codebook utilization and a perturbed reconstruction objective.
This paper presents a regularized vector quantization framework that effectively mitigates the above issues by applying regularization from two perspectives.
arXiv Detail & Related papers (2023-03-11T15:20:54Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators [100.58924375509659]
The straight-through (ST) estimator gained popularity due to its simplicity and efficiency.
Several techniques were proposed to improve over ST while keeping the same low computational complexity.
We conduct a theoretical analysis of Bias and Variance of these methods in order to understand tradeoffs and verify originally claimed properties.
arXiv Detail & Related papers (2021-10-07T15:16:07Z)