Training Production Language Models without Memorizing User Data
- URL: http://arxiv.org/abs/2009.10031v1
- Date: Mon, 21 Sep 2020 17:12:33 GMT
- Title: Training Production Language Models without Memorizing User Data
- Authors: Swaroop Ramaswamy, Om Thakkar, Rajiv Mathews, Galen Andrew, H. Brendan McMahan, Françoise Beaufays
- Abstract summary: This paper presents the first consumer-scale next-word prediction (NWP) model trained with Federated Learning (FL).
We demonstrate the deployment of a differentially private mechanism for the training of a production neural network in FL.
- Score: 7.004279935788177
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents the first consumer-scale next-word prediction (NWP) model
trained with Federated Learning (FL) while leveraging the Differentially
Private Federated Averaging (DP-FedAvg) technique. There has been prior work on
building practical FL infrastructure, including work demonstrating the
feasibility of training language models on mobile devices using such
infrastructure. It has also been shown (in simulations on a public corpus) that
it is possible to train NWP models with user-level differential privacy using
the DP-FedAvg algorithm. Nevertheless, training production-quality NWP models
with DP-FedAvg in a real-world production environment on a heterogeneous fleet
of mobile phones requires addressing numerous challenges. For instance, the
coordinating central server has to keep track of the devices available at the
start of each round and sample devices uniformly at random from them, while
ensuring secrecy of the sample, etc. Unlike all prior privacy-focused FL
work of which we are aware, for the first time we demonstrate the deployment of
a differentially private mechanism for the training of a production neural
network in FL, as well as the instrumentation of the production training
infrastructure to perform an end-to-end empirical measurement of unintended
memorization.
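
As context for the DP-FedAvg technique named in the abstract, the sketch below illustrates its per-round structure: sample a set of clients, collect each client's model delta, clip the delta to bound its L2 norm, average the clipped deltas, and add Gaussian noise scaled to the clipping bound. This is a minimal sketch assuming NumPy-array model weights and a fixed sample size; the `local_update` callback and the parameter defaults are hypothetical and are not the production implementation described in the paper.

```python
import numpy as np

def dp_fedavg_round(global_weights, sampled_clients, local_update,
                    clip_norm=1.0, noise_multiplier=1.0):
    """One round of DP-FedAvg (simplified sketch, not the production system).

    `local_update(client, weights)` is a hypothetical callback that runs
    local training on one client's data and returns updated weights as a
    NumPy array with the same shape as `global_weights`.
    """
    n = len(sampled_clients)
    clipped_deltas = []
    for client in sampled_clients:
        local_weights = local_update(client, global_weights)
        delta = local_weights - global_weights
        # Clip each client's update so its L2 norm is at most `clip_norm`,
        # bounding any single user's influence on the aggregate.
        scale = min(1.0, clip_norm / (np.linalg.norm(delta) + 1e-12))
        clipped_deltas.append(delta * scale)
    # Average the clipped updates and add Gaussian noise calibrated to the
    # clipping bound; this noisy aggregation is what yields the privacy guarantee.
    avg_delta = np.mean(clipped_deltas, axis=0)
    noise = np.random.normal(
        loc=0.0,
        scale=noise_multiplier * clip_norm / n,
        size=global_weights.shape,
    )
    return global_weights + avg_delta + noise
```

Because clipping bounds each user's total contribution to a round, the added noise provides user-level (rather than example-level) differential privacy, which is the guarantee targeted by DP-FedAvg.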
Related papers
- Mitigating Noise Detriment in Differentially Private Federated Learning with Model Pre-training [27.1846697092374]
Pre-training exploits public datasets to pre-train an advanced machine learning model.
We are the first to explore how model pre-training can mitigate noise detriment in differentially private federated learning.
arXiv Detail & Related papers (2024-08-18T13:48:10Z)
- A Survey on Efficient Federated Learning Methods for Foundation Model Training [62.473245910234304]
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients.
In the wake of Foundation Models (FM), the reality is different for many deep learning applications.
We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications.
arXiv Detail & Related papers (2024-01-09T10:22:23Z)
- Tunable Soft Prompts are Messengers in Federated Learning [55.924749085481544]
Federated learning (FL) enables multiple participants to collaboratively train machine learning models using decentralized data sources.
The lack of model privacy protection in FL has become a challenge that cannot be neglected.
We propose a novel FL training approach that accomplishes information exchange among participants via tunable soft prompts.
arXiv Detail & Related papers (2023-11-12T11:01:10Z)
- Can Public Large Language Models Help Private Cross-device Federated Learning? [58.05449579773249]
We study (differentially) private federated learning (FL) of language models.
Public data has been used to improve privacy-utility trade-offs for both large and small language models.
We propose a novel distribution matching algorithm with theoretical grounding to sample public data close to private data distribution.
arXiv Detail & Related papers (2023-05-20T07:55:58Z)
- Federated Nearest Neighbor Machine Translation [66.8765098651988]
In this paper, we propose a novel federated nearest neighbor (FedNN) machine translation framework.
FedNN leverages one-round memorization-based interaction to share knowledge across different clients.
Experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg.
arXiv Detail & Related papers (2023-02-23T18:04:07Z)
- Test-Time Robust Personalization for Federated Learning [5.553167334488855]
Federated Learning (FL) is a machine learning paradigm where many clients collaboratively learn a shared global model with decentralized training data.
Personalized FL additionally adapts the global model to different clients, achieving promising results on consistent local training and test distributions.
We propose Federated Test-time Head Ensemble plus tuning (FedTHE+), which personalizes FL models with robustness to various test-time distribution shifts.
arXiv Detail & Related papers (2022-05-22T20:08:14Z)
- Differentially private federated deep learning for multi-site medical image segmentation [56.30543374146002]
Collaborative machine learning techniques such as federated learning (FL) enable the training of models on effectively larger datasets without data transfer.
Recent initiatives have demonstrated that segmentation models trained with FL can achieve performance similar to locally trained models.
However, FL is not a fully privacy-preserving technique and privacy-centred attacks can disclose confidential patient data.
arXiv Detail & Related papers (2021-07-06T12:57:32Z)
- FLaaS: Federated Learning as a Service [3.128267020893596]
We present Federated Learning as a Service (FLaaS), a system enabling different scenarios of collaborative model building among third-party applications.
As a proof of concept, we implement it on a mobile phone setting and discuss practical implications of results on simulated and real devices.
We demonstrate FLaaS's feasibility in building unique or joint FL models across applications for image object detection in a few hours, across 100 devices.
arXiv Detail & Related papers (2020-11-18T15:56:22Z)
- End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features [17.407912171579852]
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP).
We introduce a modular End-to-End (E2E) SLU transformer network based architecture which allows the use of self-supervised pre-trained acoustic features.
arXiv Detail & Related papers (2020-11-16T19:30:52Z)
- UVeQFed: Universal Vector Quantization for Federated Learning [179.06583469293386]
Federated learning (FL) is an emerging approach to training learning models without requiring users to share their possibly private labeled data.
In FL, each user trains its copy of the learning model locally. The server then collects the individual updates and aggregates them into a global model.
We show that combining universal vector quantization methods with FL yields a decentralized training system in which the compression of the trained models induces only a minimum distortion.
arXiv Detail & Related papers (2020-06-05T07:10:22Z)
- Pretraining Federated Text Models for Next Word Prediction [0.2219120333734152]
We employ the idea of transfer learning to federated training for next word prediction (NWP).
We compare federated training baselines from randomly initialized models to various combinations of pretraining approaches.
We realize a lift in performance using pretrained embeddings without increasing the number of required training rounds or the memory footprint.
arXiv Detail & Related papers (2020-05-11T01:48:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.