Lifelong Mixture of Variational Autoencoders
- URL: http://arxiv.org/abs/2107.04694v1
- Date: Fri, 9 Jul 2021 22:07:39 GMT
- Title: Lifelong Mixture of Variational Autoencoders
- Authors: Fei Ye and Adrian G. Bors
- Abstract summary: We propose an end-to-end lifelong learning mixture of experts.
The experts in the mixture system are jointly trained by maximizing a mixture of individual component evidence lower bounds.
The model can learn new tasks quickly when they are similar to those previously learnt.
- Score: 15.350366047108103
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we propose an end-to-end lifelong learning mixture of experts.
Each expert is implemented by a Variational Autoencoder (VAE). The experts in
the mixture system are jointly trained by maximizing a mixture of individual
component evidence lower bounds (MELBO) on the log-likelihood of the given
training samples. The mixing coefficients in the mixture control the
contribution of each expert to the goal representation. These coefficients are
sampled from a Dirichlet distribution whose parameters are determined through
non-parametric estimation during lifelong learning. The model can learn new
tasks quickly when they are similar to those previously learnt. The proposed Lifelong Mixture of
VAE (L-MVAE) expands its architecture with new components when learning a
completely new task. After training, our model automatically determines the
relevant expert to use when fed new data samples. This mechanism benefits both
memory efficiency and computational cost, since only one expert is used during
inference. The L-MVAE inference model can interpolate in the joint latent space
across the data domains associated with different tasks and is shown to be
effective for learning disentangled representations.
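As an illustrative sketch (not the authors' implementation, and with hypothetical names throughout), the mixture-of-ELBOs objective can be pictured as a convex combination of per-expert ELBOs weighted by Dirichlet-sampled mixing coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

def melbo(elbos, alpha, rng):
    """Weight per-expert ELBO values by mixing coefficients drawn from a
    Dirichlet distribution; returns the mixture objective and the weights.
    (Toy simplification: real per-expert ELBOs come from trained VAEs.)"""
    pi = rng.dirichlet(alpha)           # mixing coefficients on the simplex
    return float(np.dot(pi, elbos)), pi

# toy per-expert ELBO values for K = 3 VAE experts on one batch
elbos = np.array([-120.0, -95.5, -150.2])
alpha = np.ones(3)                      # symmetric Dirichlet parameters
objective, pi = melbo(elbos, alpha, rng)
```

Because the coefficients lie on the probability simplex, the objective is always a weighted average of the individual bounds, never exceeding the best single expert's ELBO.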
Related papers
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study the predictability of model performance with respect to mixture proportions, expressed in functional form.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens in RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
- Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
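A minimal sketch of cluster-conditional routing, assuming a nearest-centroid rule (MoCE's actual gates are learned; all names below are hypothetical):

```python
import numpy as np

def cluster_conditional_gate(features, centroids):
    """Route each sample to the expert whose cluster centroid is nearest.
    (Assumed routing rule for illustration only.)"""
    # squared Euclidean distance from every sample to every centroid
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)  # expert index per sample

feats = np.array([[0.0, 0.0], [5.0, 5.0], [0.2, -0.1]])
cents = np.array([[0.0, 0.0], [5.0, 5.0]])
routes = cluster_conditional_gate(feats, cents)  # → array([0, 1, 0])
```

Each expert then sees only samples from its own cluster, so it trains on semantically related images.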
arXiv Detail & Related papers (2024-02-08T03:46:32Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is modularized into groups of experts (a 'Squad').
We optimize this matching process during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z)
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process [15.350366047108103]
Recent research efforts in lifelong learning propose to grow a mixture of models to adapt to an increasing number of tasks.
We perform a theoretical analysis of lifelong learning models by deriving risk bounds based on the discrepancy distance between the probabilistic representations of data.
Inspired by the theoretical analysis, we introduce a new lifelong learning approach, namely the Lifelong Infinite Mixture (LIMix) model.
arXiv Detail & Related papers (2021-08-25T21:06:20Z)
- Federated Mixture of Experts [94.25278695272874]
FedMix is a framework that allows us to train an ensemble of specialized models.
We show that users with similar data characteristics select the same members and therefore share statistical strength.
arXiv Detail & Related papers (2021-07-14T14:15:24Z)
- Automatic Differentiation Variational Inference with Mixtures [4.995383193706478]
We show how stratified sampling may be used to enable mixture distributions as the approximate posterior.
We derive a new lower bound on the evidence, analogous to the importance weighted autoencoder (IWAE).
arXiv Detail & Related papers (2020-03-03T18:12:42Z)
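As a toy 1-D sketch of stratified sampling over a mixture variational posterior (illustrative only; the helper names are hypothetical and this is not the paper's estimator), the ELBO of a Gaussian mixture q can be estimated stratum by stratum, sampling each component separately and weighting its contribution:

```python
import numpy as np

rng = np.random.default_rng(1)

def norm_logpdf(z, mu, sigma):
    """Log density of a 1-D Gaussian."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((z - mu) / sigma) ** 2

def stratified_elbo(mus, sigmas, w, target_logpdf, n, rng):
    """Estimate E_q[log p(z) - log q(z)] for mixture q(z) = sum_k w_k q_k(z):
    draw n samples from each component k and weight that stratum's average
    log-ratio by w_k. (Toy 1-D version for illustration.)"""
    total = 0.0
    for k, (mu, sig) in enumerate(zip(mus, sigmas)):
        z = rng.normal(mu, sig, size=n)
        # log q(z) as a log-sum-exp over all mixture components
        log_q = np.logaddexp.reduce(
            [np.log(w[j]) + norm_logpdf(z, mus[j], sigmas[j])
             for j in range(len(w))], axis=0)
        total += w[k] * np.mean(target_logpdf(z) - log_q)
    return total

# sanity check: if q matches the target exactly, log p(z) - log q(z) = 0
mus, sigmas, w = np.array([0.0]), np.array([1.0]), np.array([1.0])
elbo = stratified_elbo(mus, sigmas, w,
                       lambda z: norm_logpdf(z, 0.0, 1.0), 2000, rng)
```

Stratification removes the variance that would come from sampling the component index itself, which is what makes mixture approximate posteriors tractable here.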
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.