Related papers: RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder

RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder

URL: http://arxiv.org/abs/2405.14222v1
Date: Thu, 23 May 2024 06:32:42 GMT
Title: RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder
Authors: Jiwan Seo, Joonhyuk Kang,
Abstract summary: We introduce the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses the challenge with two novel codebook representation methods. Our experiments demonstrate that RAQ-VAE achieves effective reconstruction performance across multiple rates, often outperforming conventional fixed-rate VQ-VAE models. This work enhances the adaptability and performance of VQ-VAEs, with broad applications in data reconstruction, generation, and computer vision tasks.
Score: 3.7906296809297393
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vector Quantized Variational AutoEncoder (VQ-VAE) is an established technique in machine learning for learning discrete representations across various modalities. However, its scalability and applicability are limited by the need to retrain the model to adjust the codebook for different data or model scales. We introduce the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses this challenge with two novel codebook representation methods: a model-based approach using a clustering-based technique on an existing well-trained VQ-VAE model, and a data-driven approach utilizing a sequence-to-sequence (Seq2Seq) model for variable-rate codebook generation. Our experiments demonstrate that RAQ-VAE achieves effective reconstruction performance across multiple rates, often outperforming conventional fixed-rate VQ-VAE models. This work enhances the adaptability and performance of VQ-VAEs, with broad applications in data reconstruction, generation, and computer vision tasks.

Related papers

Adaptable Embeddings Network (AEN) [49.1574468325115]
We introduce Adaptable Embeddings Networks (AEN), a novel dual-encoder architecture using Kernel Density Estimation (KDE) AEN allows for runtime adaptation of classification criteria without retraining and is non-autoregressive. The architecture's ability to preprocess and cache condition embeddings makes it ideal for edge computing applications and real-time monitoring systems.
arXiv Detail & Related papers (2024-11-21T02:15:52Z)
Gaussian Mixture Vector Quantization with Aggregated Categorical Posterior [5.862123282894087]
We introduce the Vector Quantized Variational Autoencoder (VQ-VAE) VQ-VAE is a type of variational autoencoder using discrete embedding as latent. We show that GM-VQ improves codebook utilization and reduces information loss without relying on handcrafteds.
arXiv Detail & Related papers (2024-10-14T05:58:11Z)
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution [82.38677987249348]
We present the Qwen2-VL Series, which redefines the conventional predetermined-resolution approach in visual processing. Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which enables the model to dynamically process images of varying resolutions into different numbers of visual tokens. The model also integrates Multimodal Rotary Position Embedding (M-RoPE), facilitating the effective fusion of positional information across text, images, and videos.
arXiv Detail & Related papers (2024-09-18T17:59:32Z)
VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization [101.41553763861381]
This paper explores the potential of normalizing flows in multi-class anomaly detection. We empower the flow models to distinguish different concepts of multi-class normal data in an unsupervised manner, resulting in a novel flow-based unified method, named VQ-Flow. The proposed VQ-Flow advances the state-of-the-art in multi-class anomaly detection within a unified training scheme, yielding the Det./Loc. AUROC of 99.5%/98.3% on MVTec AD.
arXiv Detail & Related papers (2024-09-02T05:01:41Z)
Balance of Number of Embedding and their Dimensions in Vector Quantization [11.577770138594436]
This study examines the balance between the codebook sizes and dimensions of embeddings in the Vector Quantized Variational Autoencoder (VQ-VAE) architecture. We propose a novel adaptive dynamic quantization approach, underpinned by the Gumbel-Softmax mechanism.
arXiv Detail & Related papers (2024-07-06T03:07:31Z)
HyperVQ: MLR-based Vector Quantization in Hyperbolic Space [56.4245885674567]
We study the use of hyperbolic spaces for vector quantization (HyperVQ) We show that hyperVQ performs comparably in reconstruction and generative tasks while outperforming VQ in discriminative tasks and learning a highly disentangled latent space.
arXiv Detail & Related papers (2024-03-18T03:17:08Z)
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes [18.57499609338579]
We propose a novel framework to learn hierarchical discrete representation on the basis of the variational Bayes framework, called hierarchically quantized variational autoencoder (HQ-VAE) HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE) Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance.
arXiv Detail & Related papers (2023-12-31T01:39:38Z)
LL-VQ-VAE: Learnable Lattice Vector-Quantization For Efficient Representations [0.0]
We introduce learnable lattice vector quantization and demonstrate its effectiveness for learning discrete representations. Our method, termed LL-VQ-VAE, replaces the vector quantization layer in VQ-VAE with lattice-based discretization. Compared to VQ-VAE, our method obtains lower reconstruction errors under the same training conditions, trains in a fraction of the time, and with a constant number of parameters.
arXiv Detail & Related papers (2023-10-13T20:03:18Z)
Soft Convex Quantization: Revisiting Vector Quantization with Convex Optimization [40.1651740183975]
We propose Soft Convex Quantization (SCQ) as a direct substitute for Vector Quantization (VQ) SCQ works like a differentiable convex optimization (DCO) layer. We demonstrate its efficacy on the CIFAR-10, GTSRB and LSUN datasets.
arXiv Detail & Related papers (2023-10-04T17:45:14Z)
Online Clustered Codebook [100.1650001618827]
We present a simple alternative method for online codebook learning, Clustering VQ-VAE (CVQ-VAE) Our approach selects encoded features as anchors to update the dead'' codevectors, while optimising the codebooks which are alive via the original loss. Our CVQ-VAE can be easily integrated into the existing models with just a few lines of code.
arXiv Detail & Related papers (2023-07-27T18:31:04Z)
An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context. In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning. Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
Learning Answer Generation using Supervision from Automatic Question Answering Evaluators [98.9267570170737]
We propose a novel training paradigm for GenQA using supervision from automatic QA evaluation models (GAVA) We evaluate our proposed methods on two academic and one industrial dataset, obtaining a significant improvement in answering accuracy over the previous state of the art.
arXiv Detail & Related papers (2023-05-24T16:57:04Z)
CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization [13.075574481614478]
One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook. We propose a new training scheme that extends the standard VAE via novel dequantization and quantization. Our experiments show that SQ-VAE improves codebook utilization without using commons.
arXiv Detail & Related papers (2022-05-16T09:49:37Z)
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering [49.36818290978525]
Recompositions of existing visual concepts can generate unseen compositions in the training set. We propose a graph generative modeling-based training scheme (X-GGM) to handle the problem implicitly. The baseline VQA model trained with the X-GGM scheme achieves state-of-the-art OOD performance on two standard VQA OOD benchmarks.
arXiv Detail & Related papers (2021-07-24T10:17:48Z)
Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts. Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
FLAT: Few-Shot Learning via Autoencoding Transformation Regularizers [67.46036826589467]
We present a novel regularization mechanism by learning the change of feature representations induced by a distribution of transformations without using the labels of data examples. It could minimize the risk of overfitting into base categories by inspecting the transformation-augmented variations at the encoded feature level. Experiment results show the superior performances to the current state-of-the-art methods in literature.
arXiv Detail & Related papers (2019-12-29T15:26:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.