HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
- URL: http://arxiv.org/abs/2401.00365v2
- Date: Thu, 28 Mar 2024 06:38:55 GMT
- Title: HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
- Authors: Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Wei-Hsiang Liao, Yuki Mitsufuji
- Abstract summary: We propose a novel framework to learn hierarchical discrete representations on the basis of the variational Bayes framework, called the hierarchically quantized variational autoencoder (HQ-VAE).
HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE).
Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance.
- Score: 18.57499609338579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity reconstructions. However, such hierarchical extensions of VQ-VAE often suffer from the codebook/layer collapse issue, where the codebook is not efficiently used to express the data, and hence degrades reconstruction accuracy. To mitigate this problem, we propose a novel unified framework to stochastically learn hierarchical discrete representation on the basis of the variational Bayes framework, called hierarchically quantized variational autoencoder (HQ-VAE). HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE), and provides them with a Bayesian training scheme. Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance. We also validated HQ-VAE in terms of its applicability to a different modality with an audio dataset.
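As background for the residual-quantization scheme that RQ-VAE performs and HQ-VAE generalizes, here is a minimal NumPy sketch; the function names, shapes, and codebook sizes are illustrative assumptions, not the paper's implementation. Each layer's codebook quantizes the residual left unexplained by the layers above it.

```python
import numpy as np

def nearest_code(z, codebook):
    """Index of the codebook row closest to z in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(codebook - z, axis=1)))

def residual_quantize(z, codebooks):
    """Quantize z layer by layer: each codebook encodes the residual
    left unexplained by the layers above it (the RQ-VAE scheme)."""
    indices, z_hat = [], np.zeros_like(z)
    for codebook in codebooks:
        k = nearest_code(z - z_hat, codebook)  # quantize the residual
        indices.append(k)
        z_hat = z_hat + codebook[k]
    return indices, z_hat

# Toy run: a 4-dim latent and three hypothetical 8-entry codebooks.
rng = np.random.default_rng(0)
z = rng.normal(size=4)
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]
indices, z_hat = residual_quantize(z, codebooks)
print(indices, np.linalg.norm(z - z_hat))  # error shrinks with depth
```

Codebook/layer collapse, the failure mode HQ-VAE targets, shows up in this picture when the argmin keeps selecting the same few codebook rows, leaving the remaining entries unused.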
Related papers
- RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder [3.7906296809297393]
We introduce the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses the challenge of adapting to varying compression rates with two novel codebook representation methods.
Our experiments demonstrate that RAQ-VAE achieves effective reconstruction performance across multiple rates, often outperforming conventional fixed-rate VQ-VAE models.
This work enhances the adaptability and performance of VQ-VAEs, with broad applications in data reconstruction, generation, and computer vision tasks.
arXiv Detail & Related papers (2024-05-23T06:32:42Z)
- HyperVQ: MLR-based Vector Quantization in Hyperbolic Space [56.4245885674567]
We study the use of hyperbolic spaces for vector quantization (HyperVQ).
We show that HyperVQ performs comparably in reconstruction and generative tasks while outperforming VQ in discriminative tasks and learning a highly disentangled latent space.
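HyperVQ itself quantizes via multinomial logistic regression in hyperbolic space; as a simpler toy illustration of discretization in a hyperbolic latent space (not the paper's method), the sketch below assigns a latent to the codevector nearest under the Poincaré-ball distance.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points in the open unit (Poincare) ball."""
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / den)

def hyperbolic_nearest_code(z, codebook):
    """Assign z to the code closest in hyperbolic, not Euclidean, distance."""
    return int(np.argmin([poincare_distance(z, c) for c in codebook]))

rng = np.random.default_rng(0)
codebook = rng.uniform(-0.3, 0.3, size=(8, 4))  # points inside the unit ball
z = rng.uniform(-0.3, 0.3, size=4)
print(hyperbolic_nearest_code(z, codebook))
```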
arXiv Detail & Related papers (2024-03-18T03:17:08Z)
- LL-VQ-VAE: Learnable Lattice Vector-Quantization For Efficient Representations [0.0]
We introduce learnable lattice vector quantization and demonstrate its effectiveness for learning discrete representations.
Our method, termed LL-VQ-VAE, replaces the vector quantization layer in VQ-VAE with lattice-based discretization.
Compared to VQ-VAE, our method obtains lower reconstruction errors under the same training conditions, trains in a fraction of the time, and uses a constant number of parameters.
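As a rough illustration of lattice-based discretization, the toy sketch below snaps a latent to a fixed scaled integer lattice; in LL-VQ-VAE the lattice parameters are learned rather than fixed, so treat the scale values here as hypothetical.

```python
import numpy as np

def lattice_quantize(z, scale):
    """Snap z to the nearest point of a scaled integer lattice.
    No nearest-neighbour search over a codebook is needed, and the
    parameter count (one scale per dimension here) stays constant
    regardless of how many codes the lattice can represent."""
    return np.round(z / scale) * scale

z = np.array([0.37, -1.92, 2.05])
scale = np.array([0.25, 0.25, 0.5])  # hypothetical; learned in LL-VQ-VAE
print(lattice_quantize(z, scale))    # [ 0.25 -2.    2.  ]
```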
arXiv Detail & Related papers (2023-10-13T20:03:18Z)
- Online Clustered Codebook [100.1650001618827]
We present a simple alternative method for online codebook learning, Clustering VQ-VAE (CVQ-VAE).
Our approach selects encoded features as anchors to update the "dead" codevectors, while optimising the codebooks which are alive via the original loss.
Our CVQ-VAE can be easily integrated into the existing models with just a few lines of code.
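A minimal sketch of the revival idea, under the assumption of a running usage statistic per code; the threshold, shapes, and function name are illustrative, not CVQ-VAE's actual update rule.

```python
import numpy as np

def revive_dead_codes(codebook, usage, features, threshold=1e-3, rng=None):
    """Re-anchor rarely used ("dead") codevectors onto encoded features;
    codes with healthy usage keep their learned values.  `usage` is a
    running average of how often each code is selected."""
    if rng is None:
        rng = np.random.default_rng()
    dead = usage < threshold
    if dead.any():
        anchors = features[rng.choice(len(features), size=int(dead.sum()))]
        codebook[dead] = anchors
    return codebook

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))                    # 8 codes, 4 dims
usage = np.array([.3, 0., .2, 0., .1, .4, 0., .05])   # three dead codes
features = rng.normal(size=(64, 4))                   # encoder outputs
codebook = revive_dead_codes(codebook, usage, features, rng=rng)
```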
arXiv Detail & Related papers (2023-07-27T18:31:04Z)
- Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint.
We endow the latent space with discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over codeword sequences to the data distribution.
We develop further theories to connect it with the clustering viewpoint of the Wasserstein (WS) distance, allowing us to have a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z)
- Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation [19.92324010429006]
We propose a multi-layer variational autoencoder method, which we call HR-VQVAE, that learns hierarchical discrete representations of the data.
We evaluate our method on the tasks of image reconstruction and generation.
arXiv Detail & Related papers (2022-08-09T06:04:25Z)
- SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization [13.075574481614478]
One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook.
We propose a new training scheme, the stochastically quantized VAE (SQ-VAE), that extends the standard VAE via novel stochastic dequantization and quantization.
Our experiments show that SQ-VAE improves codebook utilization without using common heuristics.
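A toy sketch of temperature-controlled stochastic quantization in the spirit of SQ-VAE; the softmax-over-distances form and the annealing schedule below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def stochastic_quantize(z, codebook, temperature, rng):
    """Sample a code index with probability proportional to
    exp(-||z - c_k||^2 / temperature); as temperature -> 0 this
    collapses to deterministic nearest-neighbour VQ."""
    logits = -np.sum((codebook - z) ** 2, axis=1) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(codebook), p=probs))

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
z = rng.normal(size=4)
for t in (5.0, 0.5, 0.01):  # an illustrative annealing schedule
    print(t, [stochastic_quantize(z, codebook, t, rng) for _ in range(5)])
```

As the temperature is annealed toward zero, the sampling distribution sharpens onto the nearest code, recovering deterministic VQ, while the earlier stochastic phase spreads updates across the whole codebook.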
arXiv Detail & Related papers (2022-05-16T09:49:37Z)
- Hierarchical Variational Autoencoder for Visual Counterfactuals [79.86967775454316]
Conditional Variational Autoencoders (VAEs) are gathering significant attention as an Explainable Artificial Intelligence (XAI) tool.
In this paper we show how relaxing the effect of the posterior leads to successful counterfactuals.
We introduce VAEX, a Hierarchical VAE designed for this approach that can visually audit a classifier in applications.
arXiv Detail & Related papers (2021-02-01T14:07:11Z)
- Learning from Lexical Perturbations for Consistent Visual Question Answering [78.21912474223926]
Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations.
We propose a novel approach to address this issue based on modular networks, which creates two questions related by linguistic perturbations.
We also present VQA Perturbed Pairings (VQA P2), a new, low-cost benchmark and augmentation pipeline to create controllable linguistic variations.
arXiv Detail & Related papers (2020-11-26T17:38:03Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Hierarchical Quantized Autoencoders [3.9146761527401432]
We motivate the use of a hierarchy of Vector Quantized Variational Autoencoders (VQ-VAEs) to attain high factors of compression.
We show that a combination of quantization and hierarchical latent structure aids likelihood-based image compression.
Our resulting scheme produces a Markovian series of latent variables that reconstruct images of high-perceptual quality.
arXiv Detail & Related papers (2020-02-19T11:26:34Z)