Related papers: Worldwide Federated Training of Language Models

URL: http://arxiv.org/abs/2405.14446v2
Date: Mon, 27 May 2024 10:59:22 GMT
Title: Worldwide Federated Training of Language Models
Authors: Alex Iacob, Lorenzo Sani, Bill Marino, Preslav Aleksandrov, William F. Shen, Nicholas Donald Lane
Abstract summary: We propose a Worldwide Federated Language Model Training (WorldLM) system based on federations of federations. We show that WorldLM outperforms standard federations by up to $1.91\times$, approaches the personalized performance of fully local models, and maintains these advantages under privacy-enhancing techniques.
Score: 4.259910812836157
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The reliance of language model training on massive amounts of computation and vast datasets scraped from potentially low-quality, copyrighted, or sensitive data has come into question practically, legally, and ethically. Federated learning provides a plausible alternative by enabling previously untapped data to be voluntarily gathered from collaborating organizations. However, when scaled globally, federated learning requires collaboration across heterogeneous legal, security, and privacy regimes while accounting for the inherent locality of language data; this further exacerbates the established challenge of federated statistical heterogeneity. We propose a Worldwide Federated Language Model Training (WorldLM) system based on federations of federations, where each federation has the autonomy to account for factors such as its industry, operating jurisdiction, or competitive environment. WorldLM enables such autonomy in the presence of statistical heterogeneity via partial model localization by allowing sub-federations to attentively aggregate key layers from their constituents. Furthermore, it can adaptively share information across federations via residual layer embeddings. Evaluations of language modeling on naturally heterogeneous datasets show that WorldLM outperforms standard federations by up to $1.91\times$, approaches the personalized performance of fully local models, and maintains these advantages under privacy-enhancing techniques.

Related papers

Learning Critically: Selective Self Distillation in Federated Learning on Non-IID Data [17.624808621195978]
We propose a Selective Self-Distillation method for Federated learning (FedSSD). FedSSD imposes adaptive constraints on the local updates by self-distilling the global model's knowledge. It achieves better generalization and robustness in fewer communication rounds than other state-of-the-art FL methods.
arXiv Detail & Related papers (2025-04-20T18:06:55Z)
DAMe: Personalized Federated Social Event Detection with Dual Aggregation Mechanism [55.45581907514175]
This paper proposes a personalized federated learning framework with a dual aggregation mechanism for social event detection, namely DAMe. We introduce a global aggregation strategy to provide clients with maximum external knowledge of their preferences. In addition, we incorporate a global-local event-centric constraint to prevent local overfitting and "client-drift".
arXiv Detail & Related papers (2024-09-01T04:56:41Z)
Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data [10.64629029156029]
We introduce an innovative personalized Federated Learning framework, Multi-level Personalized Federated Learning (MuPFL). MuPFL integrates three pivotal modules: Biased Activation Value Dropout (BAVD), Adaptive Cluster-based Model Update (ACMU), and Prior Knowledge-assisted Fine-tuning (PKCF). Experiments on diverse real-world datasets show that MuPFL consistently outperforms state-of-the-art baselines, even under extreme non-i.i.d. and long-tail conditions.
arXiv Detail & Related papers (2024-05-10T11:52:53Z)
FedLoGe: Joint Local and Generic Federated Learning under Long-tailed Data [46.29190753993415]
Federated Long-Tailed Learning (Fed-LT) is a paradigm wherein data collected from decentralized local clients manifests a globally prevalent long-tailed distribution. This paper introduces an approach termed Federated Local and Generic Model Training in Fed-LT (FedLoGe), which enhances both local and generic model performance.
arXiv Detail & Related papers (2024-01-17T05:04:33Z)
Fed-QSSL: A Framework for Personalized Federated Learning under Bitwidth and Data Heterogeneity [14.313847382199059]
Fed-QSSL is a federated quantization-based self-supervised learning scheme designed to address heterogeneity in FL systems. Fed-QSSL deploys de-quantization, weighted aggregation, and re-quantization, ultimately creating models personalized to both the data distribution and the specific infrastructure of each client's device.
arXiv Detail & Related papers (2023-12-20T19:11:19Z)
Tunable Soft Prompts are Messengers in Federated Learning [55.924749085481544]
Federated learning (FL) enables multiple participants to collaboratively train machine learning models using decentralized data sources. The lack of model privacy protection in FL has become a non-negligible challenge. We propose a novel FL training approach that accomplishes information exchange among participants via tunable soft prompts.
arXiv Detail & Related papers (2023-11-12T11:01:10Z)
Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning [60.058083574671834]
This paper presents FCCL+, a novel federated correlation and similarity learning method with non-target distillation. To address heterogeneity, we leverage irrelevant unlabeled public data for communication. To address catastrophic forgetting in the local updating stage, FCCL+ introduces Federated Non Target Distillation.
arXiv Detail & Related papers (2023-09-28T09:32:27Z)
FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry. We propose FedDM to build the global training objective from multiple local surrogate functions. In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z)
Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraints. We propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG). FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z)
Preservation of the Global Knowledge by Not-True Self Knowledge Distillation in Federated Learning [8.474470736998136]
In Federated Learning (FL), a strong global model is collaboratively learned by aggregating the clients' locally trained models. We observe that fitting to a biased local distribution shifts the features on the global distribution and results in forgetting of global knowledge. We propose a simple yet effective framework, Federated Local Self-Distillation (FedLSD), which utilizes the global knowledge on locally available data.
arXiv Detail & Related papers (2021-06-06T11:51:47Z)
FedH2L: Federated Learning with Model and Statistical Heterogeneity [75.61234545520611]
Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy. We introduce FedH2L, which is both agnostic to the model architecture and robust to different data distributions across participants. In contrast to approaches sharing parameters or gradients, FedH2L relies on mutual distillation, exchanging only posteriors on a shared seed set between participants in a decentralized manner.
arXiv Detail & Related papers (2021-01-27T10:10:18Z)
Federated Learning of a Mixture of Global and Local Models [10.279748604797911]
We propose a new optimization formulation for training federated learning models. In particular, we are the first to i) show that local steps can improve communication for problems with heterogeneous data, and ii) point out that personalization yields reduced communication complexity.
arXiv Detail & Related papers (2020-02-10T09:17:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.