On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
- URL: http://arxiv.org/abs/2409.13931v2
- Date: Tue, 1 Oct 2024 21:18:07 GMT
- Title: On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
- Authors: Dongyang Fan, Bettina Messmer, Martin Jaggi,
- Abstract summary: We propose a novel $textbfCo$llaborative learning approach with a $textbfMi$xture of $textbfG$eneralists and $textbfS$pecialists (CoMiGS)
Our approach distinguishes generalists and specialists by aggregating certain experts across end users while keeping others localized to specialize in user-specific datasets.
- Score: 33.68104398807581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate learning with private and scarce local data, federated learning has become a standard approach, though it introduces challenges related to system and data heterogeneity among end users. As a solution, we propose a novel $\textbf{Co}$llaborative learning approach with a $\textbf{Mi}$xture of $\textbf{G}$eneralists and $\textbf{S}$pecialists (CoMiGS), being the first to effectively address both. Our approach distinguishes generalists and specialists by aggregating certain experts across end users while keeping others localized to specialize in user-specific datasets. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, where the router is updated using a separate validation set that represents the target distribution. CoMiGS effectively balances collaboration and personalization, as demonstrated by its superior performance in scenarios with high data heterogeneity across multiple datasets. By design, our approach accommodates users' varying computational resources through different numbers of specialists. By decoupling resource abundance from data quantity, CoMiGS remains robust against overfitting-due to the generalists' regularizing effect-while adapting to local data through specialist expertise.
Related papers
- Personalized Federated Learning for Cross-view Geo-localization [49.40531019551957]
We propose a methodology combining Federated Learning (FL) with Cross-view Image Geo-localization (CVGL) techniques.
Our method implements a coarse-to-fine approach, where clients share only the coarse feature extractors while keeping fine-grained features specific to local environments.
Results demonstrate that our federated CVGL method achieves performance close to centralized training while maintaining data privacy.
arXiv Detail & Related papers (2024-11-07T13:25:52Z) - Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach [49.63614966954833]
Federated Collaborative Filtering (FedCF) is an emerging field focused on developing a new recommendation framework with preserving privacy.
Existing FedCF methods typically combine distributed Collaborative Filtering (CF) algorithms with privacy-preserving mechanisms, and then preserve personalized information into a user embedding vector.
This paper proposes a novel personalized FedCF method by preserving users' personalized information into a latent variable and a neural model simultaneously.
arXiv Detail & Related papers (2024-08-16T05:49:14Z) - Personalized Federated Learning via Amortized Bayesian Meta-Learning [21.126405589760367]
We introduce a new perspective on personalized federated learning through Amortized Bayesian Meta-Learning.
Specifically, we propose a novel algorithm called emphFedABML, which employs hierarchical variational inference across clients.
Our theoretical analysis provides an upper bound on the average generalization error and guarantees the generalization performance on unseen data.
arXiv Detail & Related papers (2023-07-05T11:58:58Z) - FedJETs: Efficient Just-In-Time Personalization with Federated Mixture
of Experts [48.78037006856208]
FedJETs is a novel solution by using a Mixture-of-Experts (MoE) framework within a Federated Learning (FL) setup.
Our method leverages the diversity of the clients to train specialized experts on different subsets of classes, and a gating function to route the input to the most relevant expert(s)
Our approach can improve accuracy up to 18% in state of the art FL settings, while maintaining competitive zero-shot performance.
arXiv Detail & Related papers (2023-06-14T15:47:52Z) - Straggler-Resilient Personalized Federated Learning [55.54344312542944]
Federated learning allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions.
We develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles.
Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client.
arXiv Detail & Related papers (2022-06-05T01:14:46Z) - Personalization Improves Privacy-Accuracy Tradeoffs in Federated
Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy.
We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z) - Differentially Private Federated Learning on Heterogeneous Data [10.431137628048356]
Federated Learning (FL) is a paradigm for large-scale distributed learning.
It faces two key challenges: (i) efficient training from highly heterogeneous user data, and (ii) protecting the privacy of participating users.
We propose a novel FL approach to tackle these two challenges together by incorporating Differential Privacy (DP) constraints.
arXiv Detail & Related papers (2021-11-17T18:23:49Z) - Linear Speedup in Personalized Collaborative Learning [69.45124829480106]
Personalization in federated learning can improve the accuracy of a model for a user by trading off the model's bias.
We formalize the personalized collaborative learning problem as optimization of a user's objective.
We explore conditions under which we can optimally trade-off their bias for a reduction in variance.
arXiv Detail & Related papers (2021-11-10T22:12:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.