Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization
- URL: http://arxiv.org/abs/2509.10140v1
- Date: Fri, 12 Sep 2025 11:08:21 GMT
- Title: Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization
- Authors: Yifan Chang, Jie Qin, Limeng Qiao, Xiaofeng Wang, Zheng Zhu, Lin Ma, Xingang Wang,
- Abstract summary: Vector quantization (VQ) is a key component in discrete tokenizers for image generation. VQBridge is a robust, scalable, and efficient projector based on the map function method. FVQ attains 100% codebook usage even with a 262k-codebook.
- Score: 60.294965457786844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector quantization (VQ) is a key component in discrete tokenizers for image generation, but its training is often unstable due to straight-through estimation bias, one-step-behind updates, and sparse codebook gradients, which lead to suboptimal reconstruction performance and low codebook usage. In this work, we analyze these fundamental challenges and provide a simple yet effective solution. To maintain high codebook usage in VQ networks (VQN) during learning annealing and codebook size expansion, we propose VQBridge, a robust, scalable, and efficient projector based on the map function method. VQBridge optimizes code vectors through a compress-process-recover pipeline, enabling stable and effective codebook training. By combining VQBridge with learning annealing, our VQN achieves full (100%) codebook usage across diverse codebook configurations, which we refer to as FVQ (FullVQ). Through extensive experiments, we demonstrate that FVQ is effective, scalable, and generalizable: it attains 100% codebook usage even with a 262k-codebook, achieves state-of-the-art reconstruction performance, consistently improves with larger codebooks, higher vector channels, or longer training, and remains effective across different VQ variants. Moreover, when integrated with LlamaGen, FVQ significantly enhances image generation performance, surpassing visual autoregressive models (VAR) by 0.5 and diffusion models (DiT) by 0.2 rFID, highlighting the importance of high-quality tokenizers for strong autoregressive image generation.
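Since the abstract gives no implementation details, the following is a minimal PyTorch sketch of straight-through vector quantization with a compress-process-recover projector applied to the code vectors, in the spirit of the VQBridge description above. The projector layout (the hypothetical VQBridgeProjector class, its linear/GELU layers, the compressed width) and all default hyperparameters are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch only; names and sizes are assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VQBridgeProjector(nn.Module):
    """Hypothetical compress -> process -> recover projector for code vectors."""

    def __init__(self, dim: int = 256, compressed_dim: int = 64):
        super().__init__()
        self.compress = nn.Linear(dim, compressed_dim)              # compress
        self.process = nn.Sequential(                               # process
            nn.Linear(compressed_dim, compressed_dim), nn.GELU()
        )
        self.recover = nn.Linear(compressed_dim, dim)               # recover

    def forward(self, codebook: torch.Tensor) -> torch.Tensor:
        return self.recover(self.process(self.compress(codebook)))


class VectorQuantizer(nn.Module):
    """Straight-through VQ layer whose codebook is routed through the projector.

    num_codes defaults to 16k here for a light sketch; the paper reports
    scaling codebooks up to 262k entries.
    """

    def __init__(self, num_codes: int = 16_384, dim: int = 256, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim) * 0.02)
        self.bridge = VQBridgeProjector(dim)
        self.beta = beta

    def forward(self, z: torch.Tensor):
        # z: (batch, tokens, dim) continuous encoder features.
        codes = self.bridge(self.codebook)              # projected code vectors
        flat = z.reshape(-1, z.size(-1))
        idx = torch.cdist(flat, codes).argmin(dim=-1).view(z.shape[:-1])
        z_q = F.embedding(idx, codes)
        # Codebook + commitment losses; the straight-through estimator below
        # copies gradients from z_q back to z so the encoder keeps learning.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, idx, loss
```

Routing the codebook through a projector means every code vector receives a gradient whenever any code is updated, which is one plausible way to read the abstract's claim of dense, stable codebook training.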
Related papers
- VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction [83.50898344094153]
VQRAE produces continuous semantic features for image understanding and discrete tokens for visual generation within a unified tokenizer. This design maintains multimodal understanding ability with negligible loss of semantic information while using discrete tokens. VQRAE presents competitive performance on several benchmarks of visual understanding, generation and reconstruction.
arXiv Detail & Related papers (2025-11-28T17:26:34Z) - Group-Wise Optimization for Self-Extensible Codebooks in Vector Quantized Models [22.7968403903992]
VQ-VAEs leverage self-supervised learning to represent continuous vectors using the closest vectors in a codebook. Existing approaches employ implicit static codebooks or jointly optimize the entire codebook, but these methods constrain the codebook's learning capability. We propose Group-VQ, which performs group-wise optimization on the codebook.
arXiv Detail & Related papers (2025-10-15T09:14:22Z) - MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization [35.57897644198773]
We propose MGVQ, a novel method to augment the representation capability of discrete codebooks. MGVQ achieves state-of-the-art performance on both ImageNet and 8 zero-shot benchmarks. Results highlight the superiority of MGVQ in reconstruction and pave the way for preserving fidelity in HD image processing tasks.
arXiv Detail & Related papers (2025-07-10T17:59:54Z) - Scalable Image Tokenization with Index Backpropagation Quantization [74.15447383432262]
Index Backpropagation Quantization (IBQ) is a new VQ method for the joint optimization of all codebook embeddings and the visual encoder. IBQ enables scalable training of visual tokenizers and, for the first time, achieves a large-scale codebook with high dimension (256) and high utilization.
arXiv Detail & Related papers (2024-12-03T18:59:10Z) - XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation [54.2574228021317]
We present XQ-GAN, an image tokenization framework designed for both image reconstruction and generation tasks. Our framework integrates state-of-the-art quantization techniques, including vector quantization (VQ), residual quantization (RQ), multi-scale residual quantization (MSVQ), product quantization (PQ), and binary spherical quantization (BSQ). On the standard ImageNet 256x256 benchmark, our released model achieves an rFID of 0.64, significantly surpassing MAGVIT-v2 (0.9 rFID) and VAR (0.9 rFID).
arXiv Detail & Related papers (2024-12-02T17:58:06Z) - Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% [35.710953589794855]
We propose a novel image quantization model named VQGAN-LC (Large Codebook), which extends the codebook size to 100,000, achieving a utilization rate exceeding 99%.
We demonstrate the superior performance of our model over its counterparts across a variety of tasks, including image reconstruction, image classification, auto-regressive image generation using GPT, and image creation with diffusion- and flow-based generative models.
arXiv Detail & Related papers (2024-06-17T17:59:57Z) - HyperVQ: MLR-based Vector Quantization in Hyperbolic Space [56.4245885674567]
A common solution is to employ Vector Quantization (VQ) within VQ Variational Autoencoders (VQVAEs). We introduce HyperVQ, a novel approach that formulates VQ as a hyperbolic Multinomial Logistic Regression (MLR) problem. Our experiments demonstrate that HyperVQ matches traditional VQ in generative and reconstruction tasks, while surpassing it in discriminative performance.
arXiv Detail & Related papers (2024-03-18T03:17:08Z) - Online Clustered Codebook [100.1650001618827]
We present a simple alternative method for online codebook learning, Clustering VQ-VAE (CVQ-VAE).
Our approach selects encoded features as anchors to update the "dead" codevectors, while optimising the codebooks which are alive via the original loss (a minimal sketch of this revival idea follows the list below).
Our CVQ-VAE can be easily integrated into the existing models with just a few lines of code.
arXiv Detail & Related papers (2023-07-27T18:31:04Z)
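The dead-code revival idea described in the CVQ-VAE entry above can be sketched as follows. This is a generic illustration of re-anchoring unused codes to encoder features; the function name, usage threshold, and EMA bookkeeping are assumptions for illustration, not that paper's exact procedure.

```python
# Sketch of dead-codevector revival using encoder features as anchors.
import torch


@torch.no_grad()
def revive_dead_codes(codebook: torch.Tensor,
                      usage_ema: torch.Tensor,
                      encoder_feats: torch.Tensor,
                      threshold: float = 1e-3) -> None:
    """Replace rarely used codevectors with sampled encoder features.

    codebook:      (K, D) code vectors, updated in place.
    usage_ema:     (K,) exponential moving average of per-code usage counts.
    encoder_feats: (N, D) continuous features from the current batch.
    """
    dead = usage_ema < threshold                  # codes that stopped firing
    num_dead = int(dead.sum())
    if num_dead == 0:
        return
    # Sample batch features (with replacement if the batch is small) and use
    # them as fresh anchors for the dead entries.
    pick = torch.randint(0, encoder_feats.size(0), (num_dead,))
    codebook[dead] = encoder_feats[pick]
    usage_ema[dead] = threshold                   # grace period for revived codes
```

In practice a routine like this would be called every few training steps, while the live codes continue to be trained through the ordinary VQ loss, which is the split the CVQ-VAE summary describes.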