FedYolo: Augmenting Federated Learning with Pretrained Transformers
- URL: http://arxiv.org/abs/2307.04905v1
- Date: Mon, 10 Jul 2023 21:08:52 GMT
- Title: FedYolo: Augmenting Federated Learning with Pretrained Transformers
- Authors: Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K.
Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak
- Abstract summary: In this work, we investigate pretrained transformers (PTF) to achieve on-device learning goals.
We show that larger scale shrinks the accuracy gaps between alternative approaches and improves robustness.
Finally, modularity enables clients to solve multiple unrelated tasks simultaneously using a single PTF.
- Score: 61.56476056444933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growth and diversity of machine learning applications motivate a
rethinking of learning with mobile and edge devices. How can we address diverse
client goals and learn with scarce heterogeneous data? While federated learning
aims to address these issues, it has challenges hindering a unified solution.
Large transformer models have been shown to work across a variety of tasks
achieving remarkable few-shot adaptation. This raises the question: Can clients
use a single general-purpose model, rather than custom models for each task,
while obeying device and network constraints? In this work, we investigate
pretrained transformers (PTF) to achieve these on-device learning goals and
thoroughly explore the roles of model size and modularity, where the latter
refers to adaptation through modules such as prompts or adapters. Focusing on
federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy
gaps between alternative approaches and improves heterogeneity robustness.
Scale allows clients to run more local SGD epochs, which can significantly
reduce the number of communication rounds. At the extreme, clients can achieve
respectable accuracy locally, highlighting the potential of fully-local
learning. (2) Modularity, by design, enables $>$100$\times$ less communication
in bits. Surprisingly, it also boosts the generalization capability of local
adaptation methods and the robustness of smaller PTFs. Finally, it enables
clients to solve multiple unrelated tasks simultaneously using a single PTF,
whereas full updates are prone to catastrophic forgetting. These insights on
scale and modularity motivate a new federated learning approach we call "You
Only Load Once" (FedYolo): The clients load a full PTF model once and all
future updates are accomplished through communication-efficient modules with
limited catastrophic forgetting, where each task is assigned to its own module.
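To make the modular-update idea concrete, below is a minimal sketch (assumed PyTorch pseudocode, not the authors' released implementation) of one FedYolo-style round: each client loads a frozen pretrained backbone once, trains only a small per-task adapter and classification head locally, and the server aggregates just those module parameters. The toy backbone stands in for a PTF, and names such as Adapter, local_update, and fedavg_modules are illustrative assumptions. As a rough size check, a ViT-Base backbone has roughly 86M parameters while a bottleneck adapter plus head is typically well under 1M, which is consistent with the $>$100$\times$ communication savings in bits claimed above.

```python
# Illustrative sketch of modular federated updates with a frozen backbone.
# Not the authors' code; the backbone below is a toy stand-in for a PTF.
import copy
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Small bottleneck module; only these parameters are communicated."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual adaptation


class ModularClient(nn.Module):
    """Frozen shared backbone plus a trainable adapter and head for one task."""
    def __init__(self, backbone: nn.Module, dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # loaded once, never updated
            p.requires_grad_(False)
        self.adapter = Adapter(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        return self.head(self.adapter(self.backbone(x)))


def local_update(client, loader, epochs=1, lr=1e-3):
    """Local SGD on module parameters only; returns just the module weights."""
    opt = torch.optim.SGD(
        [p for p in client.parameters() if p.requires_grad], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(client(x), y).backward()
            opt.step()
    # Only adapter + head weights are uploaded (a tiny fraction of the PTF).
    return {k: v.detach().clone() for k, v in client.state_dict().items()
            if not k.startswith("backbone")}


def fedavg_modules(module_states):
    """Server-side averaging of the communicated module parameters."""
    avg = copy.deepcopy(module_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in module_states]).mean(dim=0)
    return avg


if __name__ == "__main__":
    dim, num_classes = 32, 4
    backbone = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # PTF stand-in
    clients = [ModularClient(backbone, dim, num_classes) for _ in range(3)]
    # Toy local datasets, one batch per client.
    loaders = [[(torch.randn(8, dim), torch.randint(0, num_classes, (8,)))]
               for _ in clients]
    states = [local_update(c, l) for c, l in zip(clients, loaders)]
    global_modules = fedavg_modules(states)
    for c in clients:
        c.load_state_dict(global_modules, strict=False)  # broadcast modules only
```

In this scheme each new task would get its own adapter/head pair while the same frozen backbone is reused, which is what limits interference (catastrophic forgetting) between unrelated tasks.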
Related papers
- Collaborative and Efficient Personalization with Mixtures of Adaptors [5.195669033269619]
We propose a parameter-efficient framework to tackle multi-task learning problems.
We call our framework Federated Low-Rank Adaptive Learning (FLoRAL).
We show promising experimental results on synthetic datasets and real-world federated multi-task problems.
arXiv Detail & Related papers (2024-10-04T15:11:15Z) - SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models [71.78800549517298]
Continual learning (CL) ability is vital for deploying large language models (LLMs) in the dynamic world.
Existing methods devise a learning module to acquire task-specific knowledge with a parameter-efficient tuning (PET) block, and a selection module to pick out the corresponding block for the testing input.
We propose a novel Shared Attention Framework (SAPT) to align PET learning and selection via the Shared Attentive Learning & Selection module.
arXiv Detail & Related papers (2024-01-16T11:45:03Z) - FedBone: Towards Large-Scale Federated Multi-Task Learning [13.835972363413884]
In real-world applications, visual and natural language tasks typically require large-scale models to extract high-level abstract features.
Existing heterogeneous federated multi-task learning (HFML) methods disregard the impact of gradient conflicts on multi-task optimization.
We propose an innovative framework called FedBone, which enables the construction of large-scale models with better generalization.
arXiv Detail & Related papers (2023-06-30T08:19:38Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - Meta Knowledge Condensation for Federated Learning [65.20774786251683]
Existing federated learning paradigms usually extensively exchange distributed models at a central server to achieve a more powerful model.
This incurs a severe communication burden between the server and multiple clients, especially when data distributions are heterogeneous.
Unlike existing paradigms, we introduce an alternative perspective that significantly decreases the communication cost in federated learning.
arXiv Detail & Related papers (2022-09-29T15:07:37Z) - No One Left Behind: Inclusive Federated Learning over Heterogeneous
Devices [79.16481453598266]
We propose InclusiveFL, a client-inclusive federated learning method to handle heterogeneous device capabilities.
The core idea of InclusiveFL is to assign models of different sizes to clients with different computing capabilities.
We also propose an effective method to share knowledge among multiple local models of different sizes.
arXiv Detail & Related papers (2022-02-16T13:03:27Z) - Comfetch: Federated Learning of Large Networks on Constrained Clients
via Sketching [28.990067638230254]
Federated learning (FL) is a popular paradigm for private and collaborative model training on the edge.
We propose a novel algorithm, Comfetch, which allows clients to train large networks using sketched representations of the global neural network.
arXiv Detail & Related papers (2021-09-17T04:48:42Z) - Federated Few-Shot Learning with Adversarial Learning [30.905239262227]
We propose a federated few-shot learning framework to learn a classification model that can classify unseen data classes with only a few labeled samples.
We show our approach outperforms baselines by more than 10% on vision tasks and 5% on language tasks.
arXiv Detail & Related papers (2021-04-01T09:44:57Z) - UPDeT: Universal Multi-agent Reinforcement Learning via Policy
Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.