Lifelong Mixture of Variational Autoencoders
- URL: http://arxiv.org/abs/2107.04694v1
- Date: Fri, 9 Jul 2021 22:07:39 GMT
- Title: Lifelong Mixture of Variational Autoencoders
- Authors: Fei Ye and Adrian G. Bors
- Abstract summary: We propose an end-to-end lifelong learning mixture of experts.
The experts in the mixture system are jointly trained by maximizing a mixture of individual component evidence lower bounds.
The model can learn new tasks quickly when these are similar to those previously learnt.
- Score: 15.350366047108103
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we propose an end-to-end lifelong learning mixture of experts.
Each expert is implemented by a Variational Autoencoder (VAE). The experts in
the mixture system are jointly trained by maximizing a mixture of individual
component evidence lower bounds (MELBO) on the log-likelihood of the given
training samples. The mixing coefficients in the mixture control the
contributions of each expert in the goal representation. These are sampled from
a Dirichlet distribution whose parameters are determined through non-parametric
estimation during lifelong learning. The model can learn new tasks quickly when
these are similar to those previously learnt. The proposed Lifelong Mixture of
VAEs (L-MVAE) expands its architecture with new components when learning a
completely new task. After the training, our model can automatically determine
the relevant expert to be used when fed with new data samples. This mechanism
benefits both memory efficiency and computational cost, as only
one expert is used during inference. The L-MVAE inference model is able to
perform interpolation in the joint latent space across the data domains
associated with different tasks and is shown to be efficient for disentangled
representation learning.
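Reading the abstract literally, the training objective is a weighted sum of per-expert ELBOs, with the weights drawn from a Dirichlet prior, and inference routes each sample to a single expert. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the network sizes, the fixed `dirichlet_alpha` (the paper estimates the Dirichlet parameters non-parametrically during lifelong learning), the single Dirichlet draw per batch, and the ELBO-based expert selection rule are all illustrative assumptions.

```python
# Sketch of a mixture-of-VAE objective in the spirit of L-MVAE (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class VAEExpert(nn.Module):
    """One mixture component: a small fully connected VAE."""

    def __init__(self, x_dim=784, z_dim=20, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu, self.logvar = nn.Linear(h_dim, z_dim), nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def elbo(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = -F.binary_cross_entropy_with_logits(self.dec(z), x,
                                                    reduction="none").sum(-1)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1)
        return recon - kl  # per-sample ELBO


class MixtureOfVAEs(nn.Module):
    def __init__(self, n_experts=3, dirichlet_alpha=1.0, **kw):
        super().__init__()
        self.experts = nn.ModuleList(VAEExpert(**kw) for _ in range(n_experts))
        self.alpha = torch.full((n_experts,), dirichlet_alpha)  # fixed here; learned non-parametrically in the paper

    def melbo(self, x):
        # Mixing coefficients sampled from a Dirichlet prior (one draw per batch in this sketch).
        pi = torch.distributions.Dirichlet(self.alpha).sample()
        elbos = torch.stack([e.elbo(x) for e in self.experts], dim=-1)  # [batch, n_experts]
        return (pi * elbos).sum(-1).mean()  # weighted mixture of component ELBOs

    @torch.no_grad()
    def select_expert(self, x):
        # At inference, pick the single expert that best explains the data (highest ELBO).
        elbos = torch.stack([e.elbo(x) for e in self.experts], dim=-1)
        return elbos.mean(0).argmax().item()


model = MixtureOfVAEs()
x = torch.rand(8, 784)            # dummy batch of inputs in [0, 1]
loss = -model.melbo(x)            # maximize MELBO = minimize its negative
loss.backward()
print("selected expert:", model.select_expert(x))
```

Routing each new sample to a single expert at inference, as in `select_expert` above, is what the abstract's memory and compute claim refers to: only one VAE needs to be evaluated per sample.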
Related papers
- Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging [36.0133566024214]
Upcycling Instruction Tuning (UpIT) is a data-efficient approach for tuning a dense pre-trained model into a MoE instruction model.
To ensure each specialized expert in the MoE model works as expected, we select a small amount of seed data at which each expert excels, in order to pre-optimize the router.
arXiv Detail & Related papers (2024-10-02T14:48:22Z)
- Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection [63.96018203905272]
We propose to reduce the sampling cost by pruning a pretrained diffusion model into a mixture of efficient experts.
We demonstrate the effectiveness of our method, DiffPruning, across several datasets.
arXiv Detail & Related papers (2024-09-23T21:27:26Z)
- Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency.
We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures.
The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
arXiv Detail & Related papers (2024-07-12T17:25:02Z)
- Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
arXiv Detail & Related papers (2024-02-08T03:46:32Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process [15.350366047108103]
Recent research efforts in lifelong learning propose to grow a mixture of models to adapt to an increasing number of tasks.
We perform a theoretical analysis of lifelong learning models by deriving risk bounds based on the discrepancy distance between the probabilistic representations of the data.
Inspired by the theoretical analysis, we introduce a new lifelong learning approach, namely the Lifelong Infinite Mixture (LIMix) model.
arXiv Detail & Related papers (2021-08-25T21:06:20Z)
- Federated Mixture of Experts [94.25278695272874]
FedMix is a framework that allows us to train an ensemble of specialized models.
We show that users with similar data characteristics select the same members and therefore share statistical strength.
arXiv Detail & Related papers (2021-07-14T14:15:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.