Online Clustered Codebook
- URL: http://arxiv.org/abs/2307.15139v1
- Date: Thu, 27 Jul 2023 18:31:04 GMT
- Title: Online Clustered Codebook
- Authors: Chuanxia Zheng and Andrea Vedaldi
- Abstract summary: We present a simple alternative method for online codebook learning, Clustering VQ-VAE (CVQ-VAE).
Our approach selects encoded features as anchors to update the ``dead'' codevectors, while optimising the codebooks which are alive via the original loss.
Our CVQ-VAE can be easily integrated into the existing models with just a few lines of code.
- Score: 100.1650001618827
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vector Quantisation (VQ) is experiencing a comeback in machine learning,
where it is increasingly used in representation learning. However, optimizing
the codevectors in existing VQ-VAE is not entirely trivial. A problem is
codebook collapse, where only a small subset of codevectors receive gradients
useful for their optimisation, whereas a majority of them simply ``dies off''
and is never updated or used. This limits the effectiveness of VQ for learning
larger codebooks in complex computer vision tasks that require high-capacity
representations. In this paper, we present a simple alternative method for
online codebook learning, Clustering VQ-VAE (CVQ-VAE). Our approach selects
encoded features as anchors to update the ``dead'' codevectors, while
optimising the codebooks which are alive via the original loss. This strategy
brings unused codevectors closer in distribution to the encoded features,
increasing the likelihood of being chosen and optimized. We extensively
validate the generalization capability of our quantiser on various datasets,
tasks (e.g. reconstruction and generation), and architectures (e.g. VQ-VAE,
VQGAN, LDM). Our CVQ-VAE can be easily integrated into the existing models with
just a few lines of code.
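A minimal, illustrative PyTorch sketch of the idea described in the abstract (not the authors' released code): per-code usage is tracked with an exponential moving average, and rarely used ("dead") codevectors are re-anchored onto randomly sampled encoder features so they move back into the distribution of encoded features. Names such as `decay` and `reinit_threshold` are assumptions for illustration, and the hard replacement below is a simplification of an anchor-based update.

```python
import torch


class ClusteredCodebook(torch.nn.Module):
    def __init__(self, num_codes=1024, dim=256, decay=0.99, reinit_threshold=1.0):
        super().__init__()
        self.embedding = torch.nn.Embedding(num_codes, dim)
        self.register_buffer("usage", torch.ones(num_codes))  # EMA of how often each code is selected
        self.decay = decay
        self.reinit_threshold = reinit_threshold

    def forward(self, z_e):
        # z_e: (batch, dim) encoder features, flattened over spatial positions.
        dist = torch.cdist(z_e, self.embedding.weight)  # (batch, num_codes)
        indices = dist.argmin(dim=1)
        z_q = self.embedding(indices)

        if self.training:
            with torch.no_grad():
                counts = torch.bincount(indices, minlength=self.usage.numel()).float()
                self.usage.mul_(self.decay).add_(counts, alpha=1.0 - self.decay)
                dead = self.usage < self.reinit_threshold
                if dead.any():
                    # Re-anchor dead codes onto randomly sampled encoder features,
                    # increasing their chance of being selected and optimised later.
                    anchors = z_e[torch.randint(0, z_e.shape[0], (int(dead.sum()),), device=z_e.device)]
                    self.embedding.weight.data[dead] = anchors

        # Straight-through estimator, as in the original VQ-VAE.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices
```

Codevectors that remain in use are still trained with the original VQ-VAE losses; only the dead ones are touched by the re-anchoring step.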
Related papers
- Addressing Representation Collapse in Vector Quantized Models with One Linear Layer [10.532262196027752]
Vector Quantization (VQ) is a widely used method for converting continuous representations into discrete codes.
VQ models are often hindered by the problem of representation collapse in the latent space.
We propose SimVQ, a novel method which reparameterizes the code vectors through a linear transformation layer based on a learnable latent basis (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-11-04T12:40:18Z)
- HyperVQ: MLR-based Vector Quantization in Hyperbolic Space [56.4245885674567]
We study the use of hyperbolic spaces for vector quantization (HyperVQ).
We show that HyperVQ performs comparably in reconstruction and generative tasks while outperforming VQ in discriminative tasks and learning a highly disentangled latent space.
arXiv Detail & Related papers (2024-03-18T03:17:08Z)
- Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling [15.132926378740882]
We propose a novel codebook transfer framework with part-of-speech, called VQCT, which aims to transfer a well-trained codebook from pretrained language models to VQIM.
Experimental results on four datasets show that our VQCT method achieves superior VQIM performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2024-03-15T07:24:13Z)
- HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes [18.57499609338579]
We propose a novel framework to learn hierarchical discrete representations on the basis of the variational Bayes framework, called the hierarchically quantized variational autoencoder (HQ-VAE).
HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE).
Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance.
arXiv Detail & Related papers (2023-12-31T01:39:38Z)
- Recursive Visual Programming [53.76415744371285]
We propose Recursive Visual Programming (RVP), which simplifies generated routines, provides more efficient problem solving, and can manage more complex data structures.
We show RVP's efficacy through extensive experiments on benchmarks including VSR, COVR, GQA, and NextQA.
arXiv Detail & Related papers (2023-12-04T17:27:24Z)
- Soft Convex Quantization: Revisiting Vector Quantization with Convex Optimization [40.1651740183975]
We propose Soft Convex Quantization (SCQ) as a direct substitute for Vector Quantization (VQ).
SCQ works like a differentiable convex optimization (DCO) layer.
We demonstrate its efficacy on the CIFAR-10, GTSRB and LSUN datasets.
arXiv Detail & Related papers (2023-10-04T17:45:14Z)
- Finite Scalar Quantization: VQ-VAE Made Simple [26.351016719675766]
We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ).
By appropriately choosing the number of dimensions and the number of values each dimension can take, we obtain the same codebook size as in VQ (a minimal sketch of this scheme appears after this list).
We employ FSQ with MaskGIT for image generation, and with UViM for depth estimation, colorization, and panoptic segmentation.
arXiv Detail & Related papers (2023-09-27T09:13:40Z)
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework built around a Masked Quantization VAE (MQ-VAE), which masks redundant image regions so that the model does not spend capacity on modelling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
- Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint.
We place discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over codeword sequences to the data distribution.
We develop further theory connecting this with the clustering viewpoint of the Wasserstein (WS) distance, allowing a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z)
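For context on the SimVQ entry above, here is a minimal sketch of the linear reparameterization it describes, under the assumption that the effective codebook is obtained by passing a latent basis through a single learnable linear layer (initialization and freezing details are assumptions here, not taken from the paper):

```python
import torch


class SimVQCodebook(torch.nn.Module):
    def __init__(self, num_codes=8192, dim=256):
        super().__init__()
        self.basis = torch.nn.Parameter(torch.randn(num_codes, dim))  # latent basis
        self.proj = torch.nn.Linear(dim, dim, bias=False)             # shared linear layer

    def codebook(self):
        # Every code vector changes whenever `proj` is updated, so all codes
        # keep receiving gradient signal rather than collapsing to a few.
        return self.proj(self.basis)
```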
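And a minimal sketch of finite scalar quantization as summarized in the FSQ entry: each latent dimension is bounded and rounded to a small number of levels, so the implicit codebook size is the product of the per-dimension level counts. The level choice below is illustrative, and this simplified version assumes odd level counts so the rounding stays symmetric.

```python
import torch


def fsq(z, levels=(7, 5, 5, 5)):
    # z: (..., len(levels)). Each dimension is squashed into a bounded range
    # and rounded to one of `levels[i]` evenly spaced values, giving an
    # implicit codebook of prod(levels) entries (here 7*5*5*5 = 875).
    half = (torch.tensor(levels, dtype=z.dtype, device=z.device) - 1) / 2
    bounded = torch.tanh(z) * half
    quantized = torch.round(bounded)
    # Straight-through estimator so gradients flow to the encoder.
    return bounded + (quantized - bounded).detach()
```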
This list is automatically generated from the titles and abstracts of the papers on this site.