Lifelong Mixture of Variational Autoencoders
- URL: http://arxiv.org/abs/2107.04694v1
- Date: Fri, 9 Jul 2021 22:07:39 GMT
- Title: Lifelong Mixture of Variational Autoencoders
- Authors: Fei Ye and Adrian G. Bors
- Abstract summary: We propose an end-to-end lifelong learning mixture of experts.
The experts in the mixture system are jointly trained by maximizing a mixture of individual component evidence lower bounds.
The model can learn new tasks quickly when they are similar to those previously learnt.
- Score: 15.350366047108103
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we propose an end-to-end lifelong learning mixture of experts.
Each expert is implemented by a Variational Autoencoder (VAE). The experts in
the mixture system are jointly trained by maximizing a mixture of individual
component evidence lower bounds (MELBO) on the log-likelihood of the given
training samples. The mixing coefficients in the mixture control the
contribution of each expert to the goal representation. These coefficients are
sampled from a Dirichlet distribution whose parameters are determined through
non-parametric estimation during lifelong learning. The model can learn new
tasks quickly when they are similar to those previously learnt. The proposed Lifelong Mixture of
VAE (L-MVAE) expands its architecture with new components when learning a
completely new task. After training, our model automatically determines the
relevant expert to use when fed new data samples. This mechanism benefits both
memory efficiency and computational cost, since only one expert is used during
inference. The L-MVAE inference model can interpolate in the joint latent space
across the data domains associated with different tasks and is shown to be
effective for learning disentangled representations.
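As an illustrative sketch (not the authors' implementation, and with hypothetical names throughout), the mixture-of-ELBOs objective can be pictured as a convex combination of per-expert ELBOs weighted by Dirichlet-sampled mixing coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

def melbo(elbos, alpha, rng):
    """Weight per-expert ELBO values by mixing coefficients drawn from a
    Dirichlet distribution; returns the mixture objective and the weights.
    (Toy simplification: real per-expert ELBOs come from trained VAEs.)"""
    pi = rng.dirichlet(alpha)           # mixing coefficients on the simplex
    return float(np.dot(pi, elbos)), pi

# toy per-expert ELBO values for K = 3 VAE experts on one batch
elbos = np.array([-120.0, -95.5, -150.2])
alpha = np.ones(3)                      # symmetric Dirichlet parameters
objective, pi = melbo(elbos, alpha, rng)
```

Because the coefficients lie on the probability simplex, the objective is always a weighted average of the individual bounds, never exceeding the best single expert's ELBO.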
Related papers
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study the predictability of model performance with respect to mixture proportions, expressed in functional form.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens in RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
- Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
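A minimal sketch of cluster-conditional routing, assuming a nearest-centroid rule (MoCE's actual gates are learned; all names below are hypothetical):

```python
import numpy as np

def cluster_conditional_gate(features, centroids):
    """Route each sample to the expert whose cluster centroid is nearest.
    (Assumed routing rule for illustration only.)"""
    # squared Euclidean distance from every sample to every centroid
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)  # expert index per sample

feats = np.array([[0.0, 0.0], [5.0, 5.0], [0.2, -0.1]])
cents = np.array([[0.0, 0.0], [5.0, 5.0]])
routes = cluster_conditional_gate(feats, cents)  # → array([0, 1, 0])
```

Each expert then sees only samples from its own cluster, so it trains on semantically related images.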
arXiv Detail & Related papers (2024-02-08T03:46:32Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is modularized into groups of experts (a 'Squad').
We optimize this matching process during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z)
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process [15.350366047108103]
Recent research efforts in lifelong learning propose to grow a mixture of models to adapt to an increasing number of tasks.
We perform a theoretical analysis of lifelong learning models by deriving risk bounds based on the discrepancy distance between the probabilistic representations of data.
Inspired by the theoretical analysis, we introduce a new lifelong learning approach, namely the Lifelong Infinite Mixture (LIMix) model.
arXiv Detail & Related papers (2021-08-25T21:06:20Z)
- Federated Mixture of Experts [94.25278695272874]
FedMix is a framework that allows us to train an ensemble of specialized models.
We show that users with similar data characteristics select the same members and therefore share statistical strength.
arXiv Detail & Related papers (2021-07-14T14:15:24Z)
- Automatic Differentiation Variational Inference with Mixtures [4.995383193706478]
We show how stratified sampling may be used to enable mixture distributions as the approximate posterior.
We derive a new lower bound on the evidence, analogous to the importance weighted autoencoder (IWAE).
arXiv Detail & Related papers (2020-03-03T18:12:42Z)
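As a toy 1-D sketch of stratified sampling over a mixture variational posterior (illustrative only; the helper names are hypothetical and this is not the paper's estimator), the ELBO of a Gaussian mixture q can be estimated stratum by stratum, sampling each component separately and weighting its contribution:

```python
import numpy as np

rng = np.random.default_rng(1)

def norm_logpdf(z, mu, sigma):
    """Log density of a 1-D Gaussian."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((z - mu) / sigma) ** 2

def stratified_elbo(mus, sigmas, w, target_logpdf, n, rng):
    """Estimate E_q[log p(z) - log q(z)] for mixture q(z) = sum_k w_k q_k(z):
    draw n samples from each component k and weight that stratum's average
    log-ratio by w_k. (Toy 1-D version for illustration.)"""
    total = 0.0
    for k, (mu, sig) in enumerate(zip(mus, sigmas)):
        z = rng.normal(mu, sig, size=n)
        # log q(z) as a log-sum-exp over all mixture components
        log_q = np.logaddexp.reduce(
            [np.log(w[j]) + norm_logpdf(z, mus[j], sigmas[j])
             for j in range(len(w))], axis=0)
        total += w[k] * np.mean(target_logpdf(z) - log_q)
    return total

# sanity check: if q matches the target exactly, log p(z) - log q(z) = 0
mus, sigmas, w = np.array([0.0]), np.array([1.0]), np.array([1.0])
elbo = stratified_elbo(mus, sigmas, w,
                       lambda z: norm_logpdf(z, 0.0, 1.0), 2000, rng)
```

Stratification removes the variance that would come from sampling the component index itself, which is what makes mixture approximate posteriors tractable here.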
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.