RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder
- URL: http://arxiv.org/abs/2405.14222v1
- Date: Thu, 23 May 2024 06:32:42 GMT
- Title: RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder
- Authors: Jiwan Seo, Joonhyuk Kang
- Abstract summary: We introduce the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses the need to retrain a VQ-VAE whenever its codebook must be adjusted, using two novel codebook representation methods.
Our experiments demonstrate that RAQ-VAE achieves effective reconstruction performance across multiple rates, often outperforming conventional fixed-rate VQ-VAE models.
This work enhances the adaptability and performance of VQ-VAEs, with broad applications in data reconstruction, generation, and computer vision tasks.
- Score: 3.7906296809297393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector Quantized Variational AutoEncoder (VQ-VAE) is an established technique in machine learning for learning discrete representations across various modalities. However, its scalability and applicability are limited by the need to retrain the model to adjust the codebook for different data or model scales. We introduce the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses this challenge with two novel codebook representation methods: a model-based approach using a clustering-based technique on an existing well-trained VQ-VAE model, and a data-driven approach utilizing a sequence-to-sequence (Seq2Seq) model for variable-rate codebook generation. Our experiments demonstrate that RAQ-VAE achieves effective reconstruction performance across multiple rates, often outperforming conventional fixed-rate VQ-VAE models. This work enhances the adaptability and performance of VQ-VAEs, with broad applications in data reconstruction, generation, and computer vision tasks.
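The model-based approach in the abstract can be illustrated with a minimal sketch. It assumes plain k-means (Lloyd's algorithm) as the clustering step and a toy random codebook; the abstract does not specify the exact clustering technique, so this is not the authors' implementation, and the names `quantize`, `reduce_codebook`, and `bits_per_code` are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector`."""
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def reduce_codebook(codebook, new_size, iters=20, seed=0):
    """Shrink a trained codebook to `new_size` entries with plain k-means.

    Assumption: k-means stands in for the clustering step; the paper's
    actual procedure may differ.
    """
    rng = random.Random(seed)
    # Initialise centroids from randomly chosen existing codes.
    centroids = [list(c) for c in rng.sample(codebook, new_size)]
    for _ in range(iters):
        groups = [[] for _ in range(new_size)]
        for code in codebook:
            groups[quantize(code, centroids)].append(code)
        for k, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[k] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids


def bits_per_code(codebook_size):
    """Rate in bits needed to index one entry of a codebook of this size."""
    return math.log2(codebook_size)


# Toy stand-in for a trained codebook: 64 codes of dimension 4.
rng = random.Random(1)
base = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(64)]
small = reduce_codebook(base, 16)
print(len(small), bits_per_code(len(base)), bits_per_code(len(small)))
```

In this sketch the encoder and decoder are untouched; only the codebook changes, so one trained model can serve a lower rate (fewer bits per latent index) at the cost of coarser quantization.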
Related papers
- Gaussian Mixture Vector Quantization with Aggregated Categorical Posterior [5.862123282894087]
We introduce the Vector Quantized Variational Autoencoder (VQ-VAE)
VQ-VAE is a type of variational autoencoder using discrete embedding as latent.
We show that GM-VQ improves codebook utilization and reduces information loss without relying on handcrafted heuristics.
arXiv Detail & Related papers (2024-10-14T05:58:11Z)
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution [82.38677987249348]
We present the Qwen2-VL Series, which redefines the conventional predetermined-resolution approach in visual processing.
Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which enables the model to dynamically process images of varying resolutions into different numbers of visual tokens.
The model also integrates Multimodal Rotary Position Embedding (M-RoPE), facilitating the effective fusion of positional information across text, images, and videos.
arXiv Detail & Related papers (2024-09-18T17:59:32Z)
- Balance of Number of Embedding and their Dimensions in Vector Quantization [11.577770138594436]
This study examines the balance between the codebook sizes and dimensions of embeddings in the Vector Quantized Variational Autoencoder (VQ-VAE) architecture.
We propose a novel adaptive dynamic quantization approach, underpinned by the Gumbel-Softmax mechanism.
arXiv Detail & Related papers (2024-07-06T03:07:31Z)
- HyperVQ: MLR-based Vector Quantization in Hyperbolic Space [56.4245885674567]
We study the use of hyperbolic spaces for vector quantization (HyperVQ).
We show that HyperVQ performs comparably in reconstruction and generative tasks while outperforming VQ in discriminative tasks and learning a highly disentangled latent space.
arXiv Detail & Related papers (2024-03-18T03:17:08Z)
- HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes [18.57499609338579]
We propose a novel framework, called the hierarchically quantized variational autoencoder (HQ-VAE), for learning hierarchical discrete representations on the basis of variational Bayes.
HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE).
Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance.
arXiv Detail & Related papers (2023-12-31T01:39:38Z)
- LL-VQ-VAE: Learnable Lattice Vector-Quantization For Efficient Representations [0.0]
We introduce learnable lattice vector quantization and demonstrate its effectiveness for learning discrete representations.
Our method, termed LL-VQ-VAE, replaces the vector quantization layer in VQ-VAE with lattice-based discretization.
Compared to VQ-VAE, our method obtains lower reconstruction errors under the same training conditions, trains in a fraction of the time, and with a constant number of parameters.
arXiv Detail & Related papers (2023-10-13T20:03:18Z)
- Learning Answer Generation using Supervision from Automatic Question Answering Evaluators [98.9267570170737]
We propose a novel training paradigm for GenQA using supervision from automatic QA evaluation models (GAVA)
We evaluate our proposed methods on two academic and one industrial dataset, obtaining a significant improvement in answering accuracy over the previous state of the art.
arXiv Detail & Related papers (2023-05-24T16:57:04Z)
- CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
- SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization [13.075574481614478]
One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook.
We propose a new training scheme that extends the standard VAE via novel dequantization and quantization.
Our experiments show that SQ-VAE improves codebook utilization without using common heuristics.
arXiv Detail & Related papers (2022-05-16T09:49:37Z)
- DiscreTalk: Text-to-Speech as a Machine Translation Problem [52.33785857500754]
This paper proposes a new end-to-end text-to-speech (E2E-TTS) model based on neural machine translation (NMT).
The proposed model consists of two components: a non-autoregressive vector quantized variational autoencoder (VQ-VAE) model and an autoregressive Transformer-NMT model.
arXiv Detail & Related papers (2020-05-12T02:45:09Z)
- FLAT: Few-Shot Learning via Autoencoding Transformation Regularizers [67.46036826589467]
We present a novel regularization mechanism by learning the change of feature representations induced by a distribution of transformations without using the labels of data examples.
It can reduce the risk of overfitting to base categories by inspecting the transformation-augmented variations at the encoded feature level.
Experimental results show superior performance over current state-of-the-art methods in the literature.
arXiv Detail & Related papers (2019-12-29T15:26:28Z)