Training Production Language Models without Memorizing User Data
- URL: http://arxiv.org/abs/2009.10031v1
- Date: Mon, 21 Sep 2020 17:12:33 GMT
- Title: Training Production Language Models without Memorizing User Data
- Authors: Swaroop Ramaswamy, Om Thakkar, Rajiv Mathews, Galen Andrew, H. Brendan McMahan, Françoise Beaufays
- Abstract summary: This paper presents the first consumer-scale next-word prediction (NWP) model trained with Federated Learning (FL).
We demonstrate the deployment of a differentially private mechanism for the training of a production neural network in FL.
- Score: 7.004279935788177
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents the first consumer-scale next-word prediction (NWP) model
trained with Federated Learning (FL) while leveraging the Differentially
Private Federated Averaging (DP-FedAvg) technique. There has been prior work on
building practical FL infrastructure, including work demonstrating the
feasibility of training language models on mobile devices using such
infrastructure. It has also been shown (in simulations on a public corpus) that
it is possible to train NWP models with user-level differential privacy using
the DP-FedAvg algorithm. Nevertheless, training production-quality NWP models
with DP-FedAvg in a real-world production environment on a heterogeneous fleet
of mobile phones requires addressing numerous challenges. For instance, the
coordinating central server has to keep track of the devices available at the
start of each round and sample devices uniformly at random from them, while
ensuring secrecy of the sample, etc. Unlike all prior privacy-focused FL
work of which we are aware, for the first time we demonstrate the deployment of
a differentially private mechanism for the training of a production neural
network in FL, as well as the instrumentation of the production training
infrastructure to perform an end-to-end empirical measurement of unintended
memorization.
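
As context for the DP-FedAvg technique named in the abstract, the sketch below illustrates its per-round structure: sample a set of clients, collect each client's model delta, clip the delta to bound its L2 norm, average the clipped deltas, and add Gaussian noise scaled to the clipping bound. This is a minimal sketch assuming NumPy-array model weights and a fixed sample size; the `local_update` callback and the parameter defaults are hypothetical and are not the production implementation described in the paper.

```python
import numpy as np

def dp_fedavg_round(global_weights, sampled_clients, local_update,
                    clip_norm=1.0, noise_multiplier=1.0):
    """One round of DP-FedAvg (simplified sketch, not the production system).

    `local_update(client, weights)` is a hypothetical callback that runs
    local training on one client's data and returns updated weights as a
    NumPy array with the same shape as `global_weights`.
    """
    n = len(sampled_clients)
    clipped_deltas = []
    for client in sampled_clients:
        local_weights = local_update(client, global_weights)
        delta = local_weights - global_weights
        # Clip each client's update so its L2 norm is at most `clip_norm`,
        # bounding any single user's influence on the aggregate.
        scale = min(1.0, clip_norm / (np.linalg.norm(delta) + 1e-12))
        clipped_deltas.append(delta * scale)
    # Average the clipped updates and add Gaussian noise calibrated to the
    # clipping bound; this noisy aggregation is what yields the privacy guarantee.
    avg_delta = np.mean(clipped_deltas, axis=0)
    noise = np.random.normal(
        loc=0.0,
        scale=noise_multiplier * clip_norm / n,
        size=global_weights.shape,
    )
    return global_weights + avg_delta + noise
```

Because clipping bounds each user's total contribution to a round, the added noise provides user-level (rather than example-level) differential privacy, which is the guarantee targeted by DP-FedAvg.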
Related papers
- Mitigating Noise Detriment in Differentially Private Federated Learning with Model Pre-training [27.1846697092374]
Pre-training exploits public datasets to pre-train an advanced machine learning model.
We are the first to explore how model pre-training can mitigate noise detriment in differentially private federated learning.
arXiv Detail & Related papers (2024-08-18T13:48:10Z)
- A Survey on Efficient Federated Learning Methods for Foundation Model Training [62.473245910234304]
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients.
In the wake of Foundation Models (FM), the reality is different for many deep learning applications.
We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications.
arXiv Detail & Related papers (2024-01-09T10:22:23Z)
- Tunable Soft Prompts are Messengers in Federated Learning [55.924749085481544]
Federated learning (FL) enables multiple participants to collaboratively train machine learning models using decentralized data sources.
The lack of model privacy protection in FL has become a challenge that cannot be neglected.
We propose a novel FL training approach that accomplishes information exchange among participants via tunable soft prompts.
arXiv Detail & Related papers (2023-11-12T11:01:10Z)
- Can Public Large Language Models Help Private Cross-device Federated Learning? [58.05449579773249]
We study (differentially) private federated learning (FL) of language models.
Public data has been used to improve privacy-utility trade-offs for both large and small language models.
We propose a novel distribution matching algorithm with theoretical grounding to sample public data close to private data distribution.
arXiv Detail & Related papers (2023-05-20T07:55:58Z)
- Federated Nearest Neighbor Machine Translation [66.8765098651988]
In this paper, we propose a novel federated nearest neighbor (FedNN) machine translation framework.
FedNN leverages one-round memorization-based interaction to share knowledge across different clients.
Experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg.
arXiv Detail & Related papers (2023-02-23T18:04:07Z)
- Test-Time Robust Personalization for Federated Learning [5.553167334488855]
Federated Learning (FL) is a machine learning paradigm where many clients collaboratively learn a shared global model with decentralized training data.
Personalized FL additionally adapts the global model to different clients, achieving promising results on consistent local training and test distributions.
We propose Federated Test-time Head Ensemble plus tuning (FedTHE+), which personalizes FL models with robustness to various test-time distribution shifts.
arXiv Detail & Related papers (2022-05-22T20:08:14Z)
- Differentially private federated deep learning for multi-site medical image segmentation [56.30543374146002]
Collaborative machine learning techniques such as federated learning (FL) enable the training of models on effectively larger datasets without data transfer.
Recent initiatives have demonstrated that segmentation models trained with FL can achieve performance similar to locally trained models.
However, FL is not a fully privacy-preserving technique and privacy-centred attacks can disclose confidential patient data.
arXiv Detail & Related papers (2021-07-06T12:57:32Z)
- FLaaS: Federated Learning as a Service [3.128267020893596]
We present Federated Learning as a Service (FLaaS), a system enabling different scenarios of collaborative model building among third-party applications.
As a proof of concept, we implement it on a mobile phone setting and discuss practical implications of results on simulated and real devices.
We demonstrate FLaaS's feasibility in building unique or joint FL models across applications for image object detection in a few hours, across 100 devices.
arXiv Detail & Related papers (2020-11-18T15:56:22Z)
- End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features [17.407912171579852]
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP).
We introduce a modular End-to-End (E2E) SLU transformer network based architecture which allows the use of self-supervised pre-trained acoustic features.
arXiv Detail & Related papers (2020-11-16T19:30:52Z)
- UVeQFed: Universal Vector Quantization for Federated Learning [179.06583469293386]
Federated learning (FL) is an emerging approach to training learning models without requiring users to share their possibly private labeled data.
In FL, each user trains its copy of the learning model locally. The server then collects the individual updates and aggregates them into a global model.
We show that combining universal vector quantization methods with FL yields a decentralized training system in which the compression of the trained models induces only a minimum distortion.
arXiv Detail & Related papers (2020-06-05T07:10:22Z)
- Pretraining Federated Text Models for Next Word Prediction [0.2219120333734152]
We employ the idea of transfer learning to federated training for next word prediction (NWP).
We compare federated training baselines from randomly initialized models to various combinations of pretraining approaches.
We realize a lift in performance using pretrained embeddings without increasing the number of required training rounds or the memory footprint.
arXiv Detail & Related papers (2020-05-11T01:48:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.