Learning with MISELBO: The Mixture Cookbook
- URL: http://arxiv.org/abs/2209.15514v1
- Date: Fri, 30 Sep 2022 15:01:35 GMT
- Title: Learning with MISELBO: The Mixture Cookbook
- Authors: Oskar Kviman, Ricky Molén, Alexandra Hotti, Semih Kurt, Víctor Elvira and Jens Lagergren
- Abstract summary: We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
- Score: 62.75516608080322
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Mixture models in variational inference (VI) are an active field of research.
Recent works have established their connection to multiple importance sampling
(MIS) through the MISELBO and advanced the use of ensemble approximations for
large-scale problems. However, as we show here, learning the ensemble
components independently can lead to suboptimal diversity. Hence, we study the
effect of instead using MISELBO as an objective function for learning mixtures,
and we propose the first ever mixture of variational approximations for a
normalizing flow-based hierarchical variational autoencoder (VAE) with
VampPrior and a PixelCNN decoder network. Two major insights led to the
construction of this novel composite model. First, mixture models have the
potential to be off-the-shelf tools for practitioners to obtain more flexible
posterior approximations in VAEs. Therefore, we make them more accessible by
demonstrating how to apply them to four popular architectures. Second, the
mixture components cooperate in order to cover the target distribution while
trying to maximize their diversity when MISELBO is the objective function. We
explain this cooperative behavior by drawing a novel connection between VI and
adaptive importance sampling. Finally, we demonstrate the superiority of the
Mixture VAEs' learned feature representations on both image and single-cell
transcriptome data, and obtain state-of-the-art results among VAE architectures
in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
Code available here: https://github.com/Lagergren-Lab/MixtureVAEs.
Related papers
- An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution.
We conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture.
We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z) - MixupE: Understanding and Improving Mixup from Directional Derivative
Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z) - Continual Learning with Optimal Transport based Mixture Model [17.398605698033656]
We propose an online mixture model learning approach based on properties of optimal transport theory (OT-MM).
Our proposed method can significantly outperform the current state-of-the-art baselines.
arXiv Detail & Related papers (2022-11-30T06:40:29Z) - A Fair Experimental Comparison of Neural Network Architectures for
Latent Representations of Multi-Omics for Drug Response Prediction [7.690774882108066]
We train and optimize multi-omics integration methods under equal conditions.
We devised a novel method, Omics Stacking, that combines the advantages of intermediate and late integration.
Experiments were conducted on a public drug response data set with multiple omics data.
arXiv Detail & Related papers (2022-08-31T12:46:08Z) - Gaussian Mixture Variational Autoencoder with Contrastive Learning for
Multi-Label Classification [27.043136219527767]
We propose a novel contrastive learning boosted multi-label prediction model.
By using contrastive learning in the supervised setting, we can exploit label information effectively.
We show that the learnt embeddings provide insights into the interpretation of label-label interactions.
arXiv Detail & Related papers (2021-12-02T04:23:34Z) - Lifelong Mixture of Variational Autoencoders [15.350366047108103]
We propose an end-to-end lifelong learning mixture of experts.
The experts in the mixture system are jointly trained by maximizing a mixture of individual component evidence lower bounds.
The model can learn new tasks fast when these are similar to those previously learnt.
arXiv Detail & Related papers (2021-07-09T22:07:39Z) - MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks [97.08677678499075]
We introduce MixMo, a new framework for learning multi-input multi-output deep subnetworks.
We show that binary mixing in features - particularly with patches from CutMix - enhances results by making subnetworks stronger and more diverse.
In addition to being easy to implement and adding no cost at inference, our models outperform much costlier data augmented deep ensembles.
arXiv Detail & Related papers (2021-03-10T15:31:02Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z) - VMLoc: Variational Fusion For Learning-Based Multimodal Camera
Localization [46.607930208613574]
We propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space.
Unlike previous multimodal variational works that directly adapt the objective function of the vanilla variational auto-encoder, we show how camera localization can be accurately estimated.
arXiv Detail & Related papers (2020-03-12T14:52:10Z)