Efficient Mixture Learning in Black-Box Variational Inference
- URL: http://arxiv.org/abs/2406.07083v1
- Date: Tue, 11 Jun 2024 09:16:43 GMT
- Title: Efficient Mixture Learning in Black-Box Variational Inference
- Authors: Alexandra Hotti, Oskar Kviman, Ricky Molén, Víctor Elvira, Jens Lagergren
- Abstract summary: We introduce the novel Multiple Importance Sampling Variational Autoencoder (MISVAE).
We construct two new estimators of the ELBO for mixtures in BBVI, enabling a substantial reduction in inference time.
Experimenting with MISVAE, we achieve state-of-the-art results on MNIST.
- Score: 50.21722672770176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixture variational distributions in black-box variational inference (BBVI) have demonstrated impressive results in challenging density estimation tasks. However, scaling the number of mixture components currently leads to a linear increase in the number of learnable parameters and a quadratic increase in inference time due to the evaluation of the evidence lower bound (ELBO). Our two key contributions address these limitations. First, we introduce the novel Multiple Importance Sampling Variational Autoencoder (MISVAE), which amortizes the mapping from input to mixture-parameter space using one-hot encodings. With MISVAE, each additional mixture component incurs a negligible increase in network parameters. Second, we construct two new estimators of the ELBO for mixtures in BBVI, enabling a substantial reduction in inference time with marginal, or even positive, impact on performance. Together, our contributions enable scalability to hundreds of mixture components and provide superior estimation performance in shorter time, with fewer network parameters, compared to previous mixture VAEs. Experimenting with MISVAE, we achieve state-of-the-art results on MNIST. Furthermore, we empirically validate our estimators in other BBVI settings, including Bayesian phylogenetic inference, where we improve inference times for the state-of-the-art mixture model on eight data sets.
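The one-hot amortization idea lends itself to a short sketch. Below is a minimal, illustrative PyTorch encoder as we read the abstract: all S components share one network that takes (x, one_hot(s)) as input, so each extra component only adds one input column to the first layer. All names and sizes are our assumptions, not code from the paper.

```python
import torch
import torch.nn as nn

class MixtureEncoder(nn.Module):
    """Shared encoder for an S-component mixture posterior.

    One network maps (x, one_hot(s)) -> (mu_s, log_var_s) for every
    component s, so all components share weights and each additional
    component only adds one input column (`hidden` weights) up front.
    """

    def __init__(self, x_dim: int, z_dim: int, n_components: int, hidden: int = 256):
        super().__init__()
        self.n_components = n_components
        self.net = nn.Sequential(
            nn.Linear(x_dim + n_components, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim),  # mean and log-variance, stacked
        )

    def forward(self, x: torch.Tensor):
        # Evaluate all S components for every input in the batch.
        batch = x.shape[0]
        one_hot = torch.eye(self.n_components, device=x.device).expand(batch, -1, -1)
        x_rep = x.unsqueeze(1).expand(-1, self.n_components, -1)
        out = self.net(torch.cat([x_rep, one_hot], dim=-1))  # (batch, S, 2*z_dim)
        mu, log_var = out.chunk(2, dim=-1)
        return mu, log_var
```

On this reading, growing from 10 to 100 components adds only 90 * hidden first-layer weights, consistent with the abstract's claim that each extra component costs almost nothing in parameters.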
Related papers
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study how predictable model performance is, as a function of the mixture proportions, in closed functional forms.
We propose the nested use of scaling laws for training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B-parameter model trained for 100B tokens on RedPajama.
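As a toy illustration of fitting such a law (the exact functional form is that paper's contribution; the exponential form and every number below are our assumptions), one can regress observed validation losses on mixture proportions and use the fit to rank untried mixtures:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical mixing law: loss = c + k * exp(t . r) over three data domains.
def mixing_law(r, c, k, t1, t2, t3):
    return c + k * np.exp(r @ np.array([t1, t2, t3]))

r_obs = np.array([[0.6, 0.3, 0.1],   # mixture proportions tried in pilot runs
                  [0.3, 0.5, 0.2],
                  [0.2, 0.2, 0.6],
                  [0.4, 0.4, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.5, 0.1, 0.4]])
loss_obs = np.array([2.9, 3.1, 3.4, 3.0, 3.2, 3.1])  # made-up validation losses

params, _ = curve_fit(mixing_law, r_obs, loss_obs,
                      p0=[2.5, 0.5, 0.2, 0.4, 0.8], maxfev=20000)
# Rank an untried mixture by its predicted loss before any large-scale run.
print(mixing_law(np.array([[0.5, 0.3, 0.2]]), *params))
```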
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel model for semisupervised/library-based unmixing, solved via nonconvex optimization.
We demonstrate the efficacy of the proposed alternating optimization methods for sparse semisupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z)
- AMPLIFY: Attention-based Mixup for Performance Improvement and Label Smoothing in Transformer [2.3072402651280517]
AMPLIFY uses the Transformer's own attention mechanism to reduce the influence of noise and aberrant values in the original samples on the prediction results.
Experimental results show that, at a lower computational cost, AMPLIFY outperforms other Mixup methods on text classification tasks.
arXiv Detail & Related papers (2023-09-22T08:02:45Z)
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
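To make concrete the quadratic ELBO cost that the MISVAE estimators above are designed to cut: a naive one-sample estimate of the mixture ELBO, in the spirit of MISELBO for an equally weighted mixture of diagonal Gaussians, scores each component's sample under every component. The sketch below is ours; `log_joint` stands in for the model's log p(x, z).

```python
import math
import torch
from torch.distributions import Normal

def naive_mixture_elbo(mu, log_var, log_joint):
    """One-sample ELBO estimate for an equally weighted Gaussian mixture.

    mu, log_var: (S, z_dim) parameters of the S diagonal components.
    log_joint:   callable mapping z of shape (S, z_dim) to log p(x, z), (S,).

    The cost is O(S^2): each component's sample is scored under all S
    components to evaluate the mixture density log q(z).
    """
    S = mu.shape[0]
    std = (0.5 * log_var).exp()
    z = Normal(mu, std).rsample()  # one sample per component, (S, z_dim)
    # log q_j(z_s) for every (sample s, component j) pair: shape (S, S).
    log_q_all = Normal(mu.unsqueeze(0), std.unsqueeze(0)).log_prob(z.unsqueeze(1)).sum(-1)
    log_q_mix = torch.logsumexp(log_q_all, dim=1) - math.log(S)
    return (log_joint(z) - log_q_mix).mean()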
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out-of-Distribution Robustness [94.69774317059122]
We show that the effectiveness of the well-celebrated Mixup can be further improved if, instead of using it as the sole learning objective, it is utilized as an additional regularizer alongside the standard cross-entropy loss.
This simple change not only provides much improved accuracy but also significantly improves the quality of the predictive uncertainty estimation of Mixup.
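The recipe is simple enough to sketch: keep the standard cross-entropy on the clean batch and add the mixup loss purely as a regularizer. A minimal PyTorch version, with hyperparameter values chosen for illustration only:

```python
import torch
import torch.nn.functional as F

def regmixup_loss(model, x, y, alpha=20.0, eta=1.0):
    """Clean cross-entropy plus a mixup cross-entropy regularizer."""
    clean_loss = F.cross_entropy(model(x), y)

    # Mixup term: interpolate the batch with a shuffled copy of itself.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    logits = model(lam * x + (1.0 - lam) * x[perm])
    mix_loss = lam * F.cross_entropy(logits, y) + (1.0 - lam) * F.cross_entropy(logits, y[perm])

    return clean_loss + eta * mix_loss
```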
arXiv Detail & Related papers (2022-06-29T09:44:33Z)
- A Statistics and Deep Learning Hybrid Method for Multivariate Time Series Forecasting and Mortality Modeling [0.0]
Exponential Smoothing Recurrent Neural Network (ES-RNN) is a hybrid between a statistical forecasting model and a recurrent neural network variant.
ES-RNN achieves a 9.4% improvement in absolute error in the Makridakis-4 Forecasting Competition.
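A compact sketch of the hybrid idea, assuming the usual division of labor: exponential smoothing tracks the per-series level, and an RNN (here an arbitrary callable stub) forecasts the normalized series. The smoothing constant and interface are our illustrative choices; the full model also handles seasonality.

```python
import numpy as np

def es_rnn_forecast(y, rnn, alpha_smooth=0.3):
    """Forecast the next value of a univariate series y.

    Exponential smoothing tracks the series level; `rnn` (any callable
    taking the normalized history) predicts the next normalized value,
    which is then re-scaled by the current level.
    """
    level = y[0]
    normalized = []
    for t in range(1, len(y)):
        level = alpha_smooth * y[t] + (1 - alpha_smooth) * level  # ES level update
        normalized.append(y[t] / level)                           # level-free input
    return rnn(np.array(normalized)) * level
```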
arXiv Detail & Related papers (2021-12-16T04:44:19Z)
- Noisy Feature Mixup [42.056684988818766]
We introduce Noisy Feature Mixup (NFM), an inexpensive yet effective method for data augmentation.
NFM includes mixup and manifold mixup as special cases, but it has additional advantages, including better smoothing of decision boundaries.
We show that residual networks and vision transformers trained with NFM have favorable trade-offs between predictive accuracy on clean data and robustness with respect to various types of data perturbation.
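A minimal sketch of the NFM recipe as summarized above: mixup on (possibly hidden-layer) features followed by additive and multiplicative Gaussian noise. Noise scales and names are illustrative, not the paper's settings.

```python
import torch

def noisy_feature_mixup(h, y_onehot, alpha=1.0, add_std=0.1, mult_std=0.1):
    """Mixup on features h (input or a hidden layer) plus injected noise."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(h.size(0), device=h.device)
    h_mix = lam * h + (1.0 - lam) * h[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    # Additive and multiplicative Gaussian noise on the mixed features.
    h_noisy = (1.0 + mult_std * torch.randn_like(h_mix)) * h_mix \
              + add_std * torch.randn_like(h_mix)
    return h_noisy, y_mix
```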
arXiv Detail & Related papers (2021-10-05T17:13:51Z)
- Bayesian Neural Networks With Maximum Mean Discrepancy Regularization [13.97417198693205]
We show that our BNNs achieve higher accuracy on multiple benchmarks, including several image classification tasks.
We also provide a new formulation for estimating the uncertainty of a given prediction, showing that it performs more robustly against adversarial attacks.
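As we read this entry, the KL-style complexity term of a variational BNN is replaced by a maximum mean discrepancy between posterior and prior weight samples. A self-contained RBF-kernel MMD estimator, with kernel, bandwidth, and usage entirely our own illustrative choices:

```python
import torch

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of MMD^2 between sample sets x and y (RBF kernel)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2.0 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

# Sketch of its place in a variational-BNN objective:
#   w_post  = q.rsample((n,))      # weight samples from the approximate posterior
#   w_prior = prior.sample((n,))   # weight samples from the prior
#   loss = nll + reg_weight * rbf_mmd2(w_post, w_prior)
```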
arXiv Detail & Related papers (2020-03-02T14:54:48Z)