Efficient-VDVAE: Less is more
- URL: http://arxiv.org/abs/2203.13751v1
- Date: Fri, 25 Mar 2022 16:29:46 GMT
- Title: Efficient-VDVAE: Less is more
- Authors: Louay Hazami, Rayhane Mama, Ragavan Thurairatnam
- Abstract summary: We present modifications to the Very Deep VAE to make it converge up to $2.6\times$ faster.
Our models achieve comparable or better negative log-likelihood performance than current state-of-the-art models.
We empirically demonstrate that roughly $3\%$ of the hierarchical VAE's latent space dimensions are sufficient to encode most of the image information.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical VAEs have emerged in recent years as a reliable option for
maximum likelihood estimation. However, instability issues and demanding
computational requirements have hindered research progress in the area. We
present simple modifications to the Very Deep VAE to make it converge up to
$2.6\times$ faster, save up to $20\times$ in memory load and improve stability
during training. Despite these changes, our models achieve comparable or better
negative log-likelihood performance than current state-of-the-art models on all
$7$ commonly used image datasets we evaluated on. We also make an argument
against using 5-bit benchmarks as a way to measure hierarchical VAE's
performance due to undesirable biases caused by the 5-bit quantization.
Additionally, we empirically demonstrate that roughly $3\%$ of the hierarchical
VAE's latent space dimensions are sufficient to encode most of the image
information without loss of performance, opening the door to efficiently
leveraging the hierarchical VAEs' latent space in downstream tasks (see the
sketch below). We release
our source code and models at https://github.com/Rayhane-mamah/Efficient-VDVAE .
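As a rough illustration of how such "active" dimensions can be identified, the sketch below averages the per-dimension KL term of a diagonal-Gaussian posterior over a dataset and counts the dimensions whose mean KL clears a small threshold. The function names and the threshold value are illustrative assumptions, not taken from the Efficient-VDVAE codebase.

```python
import torch

def gaussian_kl_per_dim(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Per-dimension KL(N(mu, sigma^2) || N(0, 1)) for a diagonal-Gaussian posterior."""
    return 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var)

def active_dimension_fraction(kl_per_dim: torch.Tensor, threshold: float = 0.01) -> float:
    """Fraction of latent dimensions whose dataset-averaged KL exceeds a threshold.

    kl_per_dim: (num_images, num_latent_dims) per-dimension KL terms collected
    while encoding a dataset; dimensions with near-zero mean KL carry almost
    no information about the input.
    """
    mean_kl = kl_per_dim.mean(dim=0)                    # average KL per dimension
    return (mean_kl > threshold).float().mean().item()  # share of "active" dims
```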
Related papers
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models [58.5019443418822]
Diffusion models have been proven highly effective at generating high-quality images.
As these models grow larger, they require significantly more memory and suffer from higher latency.
In this work, we aim to accelerate diffusion models by quantizing their weights and activations to 4 bits (see the sketch below).
arXiv Detail & Related papers (2024-11-07T18:59:58Z)
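A minimal sketch of the low-rank-plus-quantization idea, assuming a plain SVD split and a symmetric uniform 4-bit grid (SVDQuant's actual decomposition, scaling, and kernels are more sophisticated):

```python
import numpy as np

def lowrank_plus_4bit(W, rank=32, n_bits=4):
    """Split a weight matrix into a low-rank branch plus a 4-bit residual."""
    # Low-rank branch absorbs the dominant (outlier-heavy) structure of W.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    # Quantize only the smaller-magnitude residual on a symmetric uniform grid.
    R = W - L
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(R).max() / qmax
    R_q = np.clip(np.round(R / scale), -qmax, qmax)
    return L, R_q, scale  # reconstruct with W ≈ L + R_q * scale
```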
- Balancing Performance and Efficiency in Zero-shot Robotic Navigation [1.6574413179773757]
We present an optimization study of Vision-Language Frontier Maps applied to the Object Goal Navigation task in robotics.
Our work evaluates the efficiency and performance of various vision-language models, object detectors, segmentation models, and Visual Question Answering modules.
arXiv Detail & Related papers (2024-06-05T07:31:05Z)
- An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models [65.37846460916042]
We find that attention computation over visual tokens is extremely inefficient in the deep layers of popular LVLMs.
We introduce FastV, a versatile plug-and-play method designed to optimize computational efficiency (see the sketch below).
arXiv Detail & Related papers (2024-03-11T14:35:32Z)
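A rough sketch of the pruning idea, assuming visual tokens are ranked by the attention they receive and only the top fraction is kept past a given layer; the function and its arguments are illustrative, not FastV's actual interface:

```python
import torch

def prune_visual_tokens(hidden, attn, vis_start, vis_end, keep_ratio=0.5):
    """Keep the non-visual tokens plus the most-attended visual tokens.

    hidden: (batch, seq, dim) hidden states entering the next layer
    attn:   (batch, heads, seq, seq) attention weights from the current layer
    """
    batch, seq, dim = hidden.shape
    # mean attention each position receives, averaged over heads and queries
    received = attn.mean(dim=1).mean(dim=1)                # (batch, seq)
    k = max(1, int(keep_ratio * (vis_end - vis_start)))
    top_vis = received[:, vis_start:vis_end].topk(k, dim=-1).indices + vis_start
    keep = torch.cat([
        torch.arange(vis_start).expand(batch, -1),         # tokens before the image
        top_vis.sort(dim=-1).values,                       # surviving visual tokens
        torch.arange(vis_end, seq).expand(batch, -1),      # text tokens after the image
    ], dim=-1)
    return torch.gather(hidden, 1, keep.unsqueeze(-1).expand(-1, -1, dim))
```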
- HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference [68.59839755875252]
HiRE comprises two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator.
We demonstrate that on a one-billion-parameter model, HiRE applied to both the softmax and feedforward layers achieves almost matching pretraining and downstream accuracy, and speeds up inference latency by $1.47\times$ on a single TPUv5e device (see the sketch below).
arXiv Detail & Related papers (2024-02-14T18:04:36Z)
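A minimal sketch of HiRE's predict-then-verify pattern for a single input vector, where `W_small` stands in for the paper's compression scheme (e.g., a low-rank or low-precision copy of `W`); the DA-TOP-$k$ multi-device operator is omitted:

```python
import torch

def predict_then_verify_topk(x, W, W_small, k, overshoot=4):
    """Approximate top-k of logits x @ W via a cheap high-recall candidate pass.

    x: (d,) input vector; W: (d, n) full weights;
    W_small: (d, n) cheap approximation of W used only to pick candidates.
    """
    n = W.shape[1]
    approx = x @ W_small              # low-fidelity scores for all n outputs
    m = min(k * overshoot, n)         # oversample candidates for high recall
    cand = approx.topk(m).indices     # candidate set from the cheap pass
    exact = x @ W[:, cand]            # full computation restricted to the subset
    top = exact.topk(k)
    return cand[top.indices], top.values  # indices into the original n outputs
```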
- DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
At the same throughput, DeepCache achieves comparable or even marginally improved results when used with DDIM or PLMS (see the sketch below).
arXiv Detail & Related papers (2023-12-01T17:01:06Z)
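A schematic sketch of the caching idea, assuming the U-Net can be split into a slowly-changing deep path and a cheap shallow path; the split and the call signatures are hypothetical, not DeepCache's interface:

```python
def denoise_with_feature_cache(deep_path, shallow_path, x, timesteps, refresh_every=5):
    """Reuse slowly-changing deep U-Net features across adjacent denoising steps.

    deep_path / shallow_path: hypothetical split halves of a U-Net. The deep
    (high-level) features are nearly identical between neighbouring steps, so
    they are recomputed only every `refresh_every` steps, while the cheap
    shallow path runs at every step.
    """
    cached = None
    for i, t in enumerate(timesteps):
        if cached is None or i % refresh_every == 0:
            cached = deep_path(x, t)        # expensive pass, done sparsely
        x = shallow_path(x, t, cached)      # cheap pass, done every step
    return x
```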
- Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers [20.23085795744602]
We propose Adaptive Sparsity Level (PALS) to automatically seek a decent balance between loss and sparsity.
PALS draws inspiration from sparse-training and during-training methods.
It introduces a novel "expand" mechanism for training sparse neural networks, allowing the model to dynamically shrink, expand, or remain stable to find a proper sparsity level (see the sketch below).
arXiv Detail & Related papers (2023-05-28T06:57:27Z)
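A toy sketch of the shrink/expand/hold decision, with made-up thresholds and step sizes; PALS's actual criterion for balancing loss against sparsity differs:

```python
def update_sparsity_level(sparsity, loss, prev_loss, step=0.05, lo=0.1, hi=0.95):
    """One shrink/expand/hold decision on the global sparsity level.

    If the loss is still improving, the network can afford to be sparser
    (shrink); if it degraded, grow the network back (expand); otherwise hold.
    """
    if loss < 0.999 * prev_loss:
        return min(hi, sparsity + step)   # shrink: prune more weights
    if loss > 1.001 * prev_loss:
        return max(lo, sparsity - step)   # expand: restore pruned capacity
    return sparsity                       # remain stable
```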
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance than existing ones (see the sketch below).
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
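A minimal sketch of plain column-row sampling (CRS), the estimator family WTA-CRS refines; the winner-take-all variant additionally keeps the highest-probability outer products deterministically, which this sketch omits:

```python
import numpy as np

def crs_matmul(A, B, s, rng=None):
    """Unbiased column-row sampling estimate of A @ B using s of n outer products."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = A.shape[1]
    # sample index k with probability proportional to ||A[:, k]|| * ||B[k, :]||
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(n, size=s, replace=True, p=p)
    # dividing each sampled term by (s * p[k]) makes the estimator unbiased
    return sum(np.outer(A[:, k], B[k, :]) / (s * p[k]) for k in idx)
```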
- Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images [9.667538864515285]
We present a hierarchical VAE that, for the first time, generates samples quickly while outperforming the PixelCNN in log-likelihood on all natural image benchmarks.
In theory, VAEs can represent autoregressive models, as well as faster, better models if they exist, when made sufficiently deep.
arXiv Detail & Related papers (2020-11-20T21:35:31Z)
- NVAE: A Deep Hierarchical Variational Autoencoder [102.29977384039805]
We propose a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization.
We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models.
To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as $256\times256$ pixels (see the sketch below).
arXiv Detail & Related papers (2020-07-08T04:56:56Z)
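Both VDVAE and NVAE are built around the same top-down hierarchy that Efficient-VDVAE modifies. A heavily simplified single-level sketch with hypothetical layer shapes (real implementations use residual blocks, and NVAE uses depth-wise separable convolutions):

```python
import torch
import torch.nn as nn

class TopDownBlock(nn.Module):
    """One level of a top-down hierarchical VAE, heavily simplified.

    The prior p(z_i | z_<i) conditions on the running top-down state; the
    posterior q(z_i | z_<i, x) additionally sees same-resolution encoder features.
    """
    def __init__(self, dim, zdim):
        super().__init__()
        self.prior = nn.Conv2d(dim, 2 * zdim, 1)          # -> (mu_p, logvar_p)
        self.posterior = nn.Conv2d(2 * dim, 2 * zdim, 1)  # -> (mu_q, logvar_q)
        self.merge = nn.Conv2d(dim + zdim, dim, 1)        # fold z back into the state

    def forward(self, state, enc_feat):
        mu_p, lv_p = self.prior(state).chunk(2, dim=1)
        mu_q, lv_q = self.posterior(torch.cat([state, enc_feat], dim=1)).chunk(2, dim=1)
        z = mu_q + torch.randn_like(mu_q) * (0.5 * lv_q).exp()  # reparameterization
        # KL between the two diagonal Gaussians, per latent dimension
        kl = 0.5 * (lv_p - lv_q + (lv_q.exp() + (mu_q - mu_p) ** 2) / lv_p.exp() - 1.0)
        return self.merge(torch.cat([state, z], dim=1)), kl
```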