Related papers: Exploiting Discriminative Codebook Prior for Autoregressive Image Generation

Exploiting Discriminative Codebook Prior for Autoregressive Image Generation

URL: http://arxiv.org/abs/2508.10719v1
Date: Thu, 14 Aug 2025 15:00:00 GMT
Title: Exploiting Discriminative Codebook Prior for Autoregressive Image Generation
Authors: Longxiang Tang, Ruihang Chu, Xiang Wang, Yujin Han, Pingyu Wu, Chunming He, Yingya Zhang, Shiwei Zhang, Jiaya Jia,
Abstract summary: token-based autoregressive image generation systems first tokenize images into sequences of token indices with a codebook, and then model these sequences in an autoregressive paradigm.<n>While autoregressive generative models are trained only on index values, the prior encoded in the codebook, which contains rich token similarity information, is not exploited.<n>Recent studies have attempted to incorporate this prior by performing naive k-means clustering on the tokens, helping to facilitate the training of generative models with a reduced codebook.<n>We propose the Discriminative Codebook Prior Extractor (DCPE) as an alternative to k-means
Score: 54.14166700058777
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Advanced discrete token-based autoregressive image generation systems first tokenize images into sequences of token indices with a codebook, and then model these sequences in an autoregressive paradigm. While autoregressive generative models are trained only on index values, the prior encoded in the codebook, which contains rich token similarity information, is not exploited. Recent studies have attempted to incorporate this prior by performing naive k-means clustering on the tokens, helping to facilitate the training of generative models with a reduced codebook. However, we reveal that k-means clustering performs poorly in the codebook feature space due to inherent issues, including token space disparity and centroid distance inaccuracy. In this work, we propose the Discriminative Codebook Prior Extractor (DCPE) as an alternative to k-means clustering for more effectively mining and utilizing the token similarity information embedded in the codebook. DCPE replaces the commonly used centroid-based distance, which is found to be unsuitable and inaccurate for the token feature space, with a more reasonable instance-based distance. Using an agglomerative merging technique, it further addresses the token space disparity issue by avoiding splitting high-density regions and aggregating low-density ones. Extensive experiments demonstrate that DCPE is plug-and-play and integrates seamlessly with existing codebook prior-based paradigms. With the discriminative prior extracted, DCPE accelerates the training of autoregressive models by 42% on LlamaGen-B and improves final FID and IS performance.

Related papers

Autoregressive Image Generation with Masked Bit Modeling [34.36577356251466]
Bit AutoRegressive modeling (BAR) is a scalable framework that supports arbitrary codebook sizes.<n>BAR achieves a new state-of-the-art gFID of 0.99 on ImageNet-256, outperforming leading methods across both continuous and discrete paradigms.
arXiv Detail & Related papers (2026-02-09T18:59:58Z)
Image Tokenizer Needs Post-Training [76.91832192778732]
We propose a novel tokenizer training scheme, focusing on improving latent space construction and decoding respectively.<n>Specifically, we propose a plug-and-play tokenizer training scheme, which significantly enhances the robustness of tokenizer.<n>We further optimize the tokenizer decoder regarding a well-trained generative model to mitigate the distribution difference between generated and reconstructed tokens.
arXiv Detail & Related papers (2025-09-15T21:38:03Z)
Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit [45.18582668677648]
We present a training-free method to transplant tokenizers in large language models.<n>We approximate each out-of-vocabulary token as a sparse linear combination of shared tokens.<n>We show that OMP achieves best zero-shot preservation of the base model's performance.
arXiv Detail & Related papers (2025-06-07T00:51:27Z)
Scalable Image Tokenization with Index Backpropagation Quantization [74.15447383432262]
Index Backpropagation Quantization (IBQ) is a new VQ method for the joint optimization of all codebook embeddings and the visual encoder.<n>IBQ enables scalable training of visual tokenizers and, for the first time, achieves a large-scale codebook with high dimension ($256$) and high utilization.
arXiv Detail & Related papers (2024-12-03T18:59:10Z)
Prototypical Hash Encoding for On-the-Fly Fine-Grained Category Discovery [65.16724941038052]
Category-aware Prototype Generation (CPG) and Discrimi Category 5.3% (DCE) are proposed.<n>CPG enables the model to fully capture the intra-category diversity by representing each category with multiple prototypes.<n>DCE boosts the discrimination ability of hash code with the guidance of the generated category prototypes.
arXiv Detail & Related papers (2024-10-24T23:51:40Z)
TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction [61.295716741720284]
TokenUnify is a novel pretraining method that integrates random token prediction, next-token prediction, and next-all token prediction. Cooperated with TokenUnify, we have assembled a large-scale electron microscopy (EM) image dataset with ultra-high resolution. This dataset includes over 120 million annotated voxels, making it the largest neuron segmentation dataset to date.
arXiv Detail & Related papers (2024-05-27T05:45:51Z)
GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models. We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network. We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z)
Sparse Attention-Based Neural Networks for Code Classification [15.296053323327312]
We introduce an approach named the Sparse Attention-based neural network for Code Classification (SACC) In the first step, source code undergoes syntax parsing and preprocessing. The encoded sequences of subtrees are fed into a Transformer model that incorporates sparse attention mechanisms for the purpose of classification.
arXiv Detail & Related papers (2023-11-11T14:07:12Z)
EdVAE: Mitigating Codebook Collapse with Evidential Discrete Variational Autoencoders [11.086500036180222]
Codebook collapse is a common problem in training deep generative models with discrete representation spaces. We propose a novel way to incorporate evidential deep learning (EDL) instead of softmax to combat the codebook collapse problem of dVAE.
arXiv Detail & Related papers (2023-10-09T13:39:26Z)
Sparse-Inductive Generative Adversarial Hashing for Nearest Neighbor Search [8.020530603813416]
We propose a novel unsupervised hashing method, termed Sparsity-Induced Generative Adversarial Hashing (SiGAH) SiGAH encodes large-scale high-scale high-dimensional features into binary codes, which solves the two problems through a generative adversarial training framework. Experimental results on four benchmarks, i.e. Tiny100K, GIST1M, Deep1M, and MNIST, have shown that the proposed SiGAH has superior performance over state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-12T08:07:23Z)
Highly Parallel Autoregressive Entity Linking with Discriminative Correction [51.947280241185]
We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions. Our model is >70 times faster and more accurate than the previous generative method.
arXiv Detail & Related papers (2021-09-08T17:28:26Z)
Self-Supervised Bernoulli Autoencoders for Semi-Supervised Hashing [1.8899300124593648]
This paper investigates the robustness of hashing methods based on variational autoencoders to the lack of supervision. We propose a novel supervision method in which the model uses its label distribution predictions to implement the pairwise objective. Our experiments show that both methods can significantly increase the hash codes' quality.
arXiv Detail & Related papers (2020-07-17T07:47:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.