SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
- URL: http://arxiv.org/abs/2205.07547v1
- Date: Mon, 16 May 2022 09:49:37 GMT
- Title: SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
- Authors: Yuhta Takida, Takashi Shibuya, WeiHsiang Liao, Chieh-Hsin Lai, Junki
Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki
Kumakura, Yuki Mitsufuji
- Abstract summary: One noted issue of the vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook.
We propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization.
Our experiments show that SQ-VAE improves codebook utilization without using common heuristics.
- Score: 13.075574481614478
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: One noted issue of the vector-quantized variational autoencoder (VQ-VAE) is that
the learned discrete representation uses only a fraction of the full capacity
of the codebook, also known as codebook collapse. We hypothesize that the
training scheme of VQ-VAE, which involves some carefully designed heuristics,
underlies this issue. In this paper, we propose a new training scheme that
extends the standard VAE via novel stochastic dequantization and quantization,
called the stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we
observe that quantization is stochastic at the initial stage of training but
gradually converges toward deterministic quantization, a behavior we
call self-annealing. Our experiments show that SQ-VAE improves codebook
utilization without using common heuristics. Furthermore, we empirically show
that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.
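
To make the self-annealing behavior concrete, below is a minimal PyTorch sketch of distance-based stochastic quantization with a trainable variance. The Gaussian-style categorical posterior and the Gumbel-softmax estimator are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def stochastic_quantize(z, codebook, log_var, hard=False):
    """Distance-based stochastic quantization: a minimal sketch.

    Codes are sampled from P(k | z) proportional to
    exp(-||z - e_k||^2 / (2 * sigma^2)) with a trainable sigma^2.
    As training shrinks sigma^2, the categorical sharpens toward
    nearest-neighbour assignment, i.e. quantization becomes
    effectively deterministic (the self-annealing behavior
    described in the abstract).

    z:        (batch, dim) encoder outputs
    codebook: (K, dim)     codevectors e_k
    log_var:  scalar       trainable log sigma^2
    """
    d2 = torch.cdist(z, codebook) ** 2        # (batch, K) squared distances
    logits = -d2 / (2.0 * log_var.exp())      # sharper as sigma^2 -> 0
    # Gumbel-softmax gives (approximately) differentiable categorical
    # samples; one possible estimator, not necessarily the paper's.
    onehot = F.gumbel_softmax(logits, tau=1.0, hard=hard)
    return onehot @ codebook                  # (batch, dim) quantized latents

# Usage: early in training, a large sigma^2 spreads probability over
# many codevectors; as the objective drives sigma^2 down, samples
# concentrate on the nearest code.
codebook = torch.nn.Parameter(torch.randn(512, 64))
log_var = torch.nn.Parameter(torch.zeros(()))
z_q = stochastic_quantize(torch.randn(8, 64), codebook, log_var)
```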
Related papers
- RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder [3.7906296809297393]
We introduce the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses the fixed-rate limitation of conventional VQ-VAEs with two novel codebook representation methods.
Our experiments demonstrate that RAQ-VAE achieves effective reconstruction performance across multiple rates, often outperforming conventional fixed-rate VQ-VAE models.
This work enhances the adaptability and performance of VQ-VAEs, with broad applications in data reconstruction, generation, and computer vision tasks.
arXiv Detail & Related papers (2024-05-23T06:32:42Z)
- HyperVQ: MLR-based Vector Quantization in Hyperbolic Space [56.4245885674567]
We study the use of hyperbolic spaces for vector quantization (HyperVQ).
We show that HyperVQ performs comparably in reconstruction and generative tasks while outperforming VQ in discriminative tasks and learning a highly disentangled latent space.
arXiv Detail & Related papers (2024-03-18T03:17:08Z)
- Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech [50.95292368372455]
We propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized variational autoencoder (VQ-VAE).
The training of VQ-VAE relies on clean speech; hence, large quantization errors can be expected when the speech is distorted.
We found that the vector quantization mechanism could also be used for self-supervised speech enhancement (SE) model training.
arXiv Detail & Related papers (2024-02-26T06:01:38Z)
- HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes [18.57499609338579]
We propose a novel framework to learn hierarchical discrete representations on the basis of the variational Bayes framework, called the hierarchically quantized variational autoencoder (HQ-VAE).
HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and the residual-quantized VAE (RQ-VAE).
Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance.
arXiv Detail & Related papers (2023-12-31T01:39:38Z)
- LL-VQ-VAE: Learnable Lattice Vector-Quantization For Efficient Representations [0.0]
We introduce learnable lattice vector quantization and demonstrate its effectiveness for learning discrete representations.
Our method, termed LL-VQ-VAE, replaces the vector quantization layer in VQ-VAE with lattice-based discretization.
Compared to VQ-VAE, our method obtains lower reconstruction errors under the same training conditions, trains in a fraction of the time, and uses a constant number of parameters.
arXiv Detail & Related papers (2023-10-13T20:03:18Z)
- Finite Scalar Quantization: VQ-VAE Made Simple [26.351016719675766]
We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ).
By appropriately choosing the number of dimensions and the values each dimension can take, we obtain the same codebook size as in VQ.
We employ FSQ with MaskGIT for image generation, and with UViM for depth estimation, colorization, and panoptic segmentation (see the FSQ sketch after this list).
arXiv Detail & Related papers (2023-09-27T09:13:40Z)
- Online Clustered Codebook [100.1650001618827]
We present a simple alternative method for online codebook learning, Clustering VQ-VAE (CVQ-VAE).
Our approach selects encoded features as anchors to update the "dead" codevectors, while optimizing the codevectors that are alive via the original loss.
Our CVQ-VAE can be easily integrated into the existing models with just a few lines of code.
arXiv Detail & Related papers (2023-07-27T18:31:04Z)
- Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint.
We impose a discrete distribution over sequences of codewords and learn a deterministic decoder that transports this distribution to the data distribution.
We develop further theory to connect it with the clustering viewpoint of the Wasserstein (WS) distance, allowing us to have a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z)
- Efficient measure for the expressivity of variational quantum algorithms [72.59790225766777]
We exploit an advanced tool from statistical learning theory, the covering number, to study the expressivity of variational quantum algorithms (VQAs).
We first exhibit how the expressivity of VQAs with an arbitrary ansatz is upper bounded by the number of quantum gates and the measurement observable.
We then explore the expressivity of VQAs on near-term quantum chips, where the system noise is considered.
arXiv Detail & Related papers (2021-04-20T13:51:08Z)
- MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering [58.30291671877342]
We present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input.
MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement.
arXiv Detail & Related papers (2020-09-18T00:22:54Z)
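
Since FSQ is simple enough to state in a few lines, here is a minimal PyTorch sketch of the idea referenced in the FSQ entry above. The function name `fsq` and the simplified bounding function (the official implementation shifts the tanh input rather than its output) are this sketch's own choices, not the paper's.

```python
import torch

def fsq(z, levels):
    """Finite scalar quantization: a minimal sketch.

    Each latent dimension is bounded and rounded to one of
    levels[i] integer values with a straight-through gradient,
    so the implicit codebook (the product of the per-dimension
    levels) needs no learned codevectors, commitment loss, or
    EMA updates.
    """
    levels = torch.tensor(levels, dtype=z.dtype, device=z.device)
    half = (levels - 1) / 2.0
    # Shift even-level dimensions by 0.5 so rounding yields exactly
    # levels[i] distinct integers (e.g. 8 levels -> {-4, ..., 3}).
    offset = (levels % 2 == 0).to(z.dtype) * 0.5
    bounded = torch.tanh(z) * half - offset
    quantized = torch.round(bounded)
    # Straight-through estimator: rounded values in the forward
    # pass, identity gradient in the backward pass.
    return bounded + (quantized - bounded).detach()

# levels=[8, 5, 5, 5] gives an implicit codebook of 8*5*5*5 = 1000 codes.
z = torch.randn(4, 4, requires_grad=True)
z_q = fsq(z, levels=[8, 5, 5, 5])
```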
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.