Distributed Learning of Mixtures of Experts
- URL: http://arxiv.org/abs/2312.09877v1
- Date: Fri, 15 Dec 2023 15:26:13 GMT
- Title: Distributed Learning of Mixtures of Experts
- Authors: Fa\"icel Chamroukhi, Nhat Thien Pham
- Abstract summary: We deal with datasets that are either distributed by nature or potentially large for which distributing the computations is usually a standard way to proceed.
We propose a distributed learning approach for mixtures of experts (MoE) models with an aggregation strategy to construct a reduction estimator from local estimators fitted parallelly to distributed subsets of the data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In modern machine learning problems we deal with datasets that are either
distributed by nature or potentially large for which distributing the
computations is usually a standard way to proceed, since centralized algorithms
are in general ineffective. We propose a distributed learning approach for
mixtures of experts (MoE) models with an aggregation strategy to construct a
reduction estimator from local estimators fitted parallelly to distributed
subsets of the data. The aggregation is based on an optimal minimization of an
expected transportation divergence between the large MoE composed of local
estimators and the unknown desired MoE model. We show that the provided
reduction estimator is consistent as soon as the local estimators to be
aggregated are consistent, and its construction is performed by a proposed
majorization-minimization (MM) algorithm that is computationally effective. We
study the statistical and numerical properties for the proposed reduction
estimator on experiments that demonstrate its performance compared to namely
the global estimator constructed in a centralized way from the full dataset.
For some situations, the computation time is more than ten times faster, for a
comparable performance. Our source codes are publicly available on Github.
Related papers
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Distributionally Robust Machine Learning with Multi-source Data [6.383451076043423]
We introduce a group distributionally robust prediction model to optimize an adversarial reward about explained variance with respect to a class of target distributions.
Compared to classical empirical risk minimization, the proposed robust prediction model improves the prediction accuracy for target populations with distribution shifts.
We demonstrate the performance of our proposed group distributionally robust method on simulated and real data with random forests and neural networks as base-learning algorithms.
arXiv Detail & Related papers (2023-09-05T13:19:40Z) - Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data.
Proposed aggregation algorithms are extensively analyzed from a theoretical, and an experimental prospective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - Distributed Nonparametric Estimation under Communication Constraints [0.0]
We provide a general framework for understanding the behavior of distributed estimation under communication constraints.
We derive minimax lower and matching upper bounds in the distributed regression, density estimation, classification, Poisson regression and volatility estimation models.
arXiv Detail & Related papers (2022-04-21T19:04:50Z) - Learning to Estimate Without Bias [57.82628598276623]
Gauss theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimation (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints.
A second motivation to BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent [7.6860514640178]
We propose a novel zeroth-order optimization algorithm for distributed reinforcement learning.
It allows each agent to estimate its local gradient by cost evaluation independently, without use of any consensus protocol.
arXiv Detail & Related papers (2021-07-26T18:11:07Z) - Clustered Federated Learning via Generalized Total Variation
Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as total variation minimization (GTV)
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
arXiv Detail & Related papers (2021-05-26T18:07:19Z) - Distributed Learning of Finite Gaussian Mixtures [21.652015112462]
We study split-and-conquer approaches for the distributed learning of finite Gaussian mixtures.
New estimator is shown to be consistent and retains root-n consistency under some general conditions.
Experiments based on simulated and real-world data show that the proposed split-and-conquer approach has comparable statistical performance with the global estimator.
arXiv Detail & Related papers (2020-10-20T16:17:47Z) - Distributionally Robust Local Non-parametric Conditional Estimation [22.423052432220235]
We propose a new distributionally robust estimator that generates non-parametric local estimates.
We show that despite being generally intractable, the local estimator can be efficiently found via convex optimization.
Experiments with synthetic and MNIST datasets show the competitive performance of this new class of estimators.
arXiv Detail & Related papers (2020-10-12T00:11:17Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.