Pretraining Federated Text Models for Next Word Prediction
- URL: http://arxiv.org/abs/2005.04828v3
- Date: Mon, 17 Aug 2020 21:51:46 GMT
- Title: Pretraining Federated Text Models for Next Word Prediction
- Authors: Joel Stremmel and Arjun Singh
- Abstract summary: We employ the idea of transfer learning for federated training of next word prediction (NWP) models.
We compare federated training baselines from randomly initialized models to various combinations of pretraining approaches.
We realize lift in performance using pretrained embeddings without exacerbating the number of required training rounds or memory footprint.
- Score: 0.2219120333734152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning is a decentralized approach for training models on
distributed devices, by summarizing local changes and sending aggregate
parameters from local models to the cloud rather than the data itself. In this
research we employ the idea of transfer learning to federated training for next
word prediction (NWP) and conduct a number of experiments demonstrating
enhancements to current baselines for which federated NWP models have been
successful. Specifically, we compare federated training baselines from randomly
initialized models to various combinations of pretraining approaches including
pretrained word embeddings and whole model pretraining followed by federated
fine tuning for NWP on a dataset of Stack Overflow posts. We realize lift in
performance using pretrained embeddings without exacerbating the number of
required training rounds or memory footprint. We also observe notable
differences using centrally pretrained networks, especially depending on the
datasets used. Our research offers effective, yet inexpensive, improvements to
federated NWP and paves the way for more rigorous experimentation of transfer
learning techniques for federated learning.
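The recipe described in the abstract, initializing the embedding layer (or the whole network) from a pretrained checkpoint and then fine-tuning with federated averaging for NWP, can be sketched as follows. This is a minimal PyTorch illustration of the general idea, not the authors' implementation (which targets the federated Stack Overflow dataset); the architecture, client data format, and helper names are illustrative assumptions.
```python
# Minimal FedAvg sketch: initialize the embedding layer of an NWP model from
# pretrained word vectors, then fine-tune with simulated federated averaging.
# Model sizes, client data, and hyperparameters are placeholders.
import copy
import torch
import torch.nn as nn

class NWPModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=96, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                     # next-token logits per position

def load_pretrained_embeddings(model, matrix):
    # matrix: (vocab_size, embed_dim) array of pretrained vectors (e.g. GloVe).
    model.embed.weight.data.copy_(torch.as_tensor(matrix, dtype=torch.float32))

def client_update(global_model, batches, epochs=1, lr=0.1):
    # Local SGD on one client's next-word-prediction batches.
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for tokens, targets in batches:
            opt.zero_grad()
            logits = model(tokens)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
            loss.backward()
            opt.step()
    # Weight this client's update by its number of training tokens.
    return model.state_dict(), sum(t.numel() for t, _ in batches)

def fedavg_round(global_model, sampled_clients):
    # Weighted average of client model parameters (FedAvg).
    updates, weights = zip(*(client_update(global_model, c) for c in sampled_clients))
    total = float(sum(weights))
    avg = {k: sum(w / total * u[k].float() for u, w in zip(updates, weights))
           for k in updates[0]}
    global_model.load_state_dict(avg)
    return global_model
```
In this sketch, whole-model pretraining would amount to loading a centrally trained state dict into the server model before the first round, whereas the embedding-only variant touches just the embedding matrix.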
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling the changes introduced by post-training at the logits level with a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
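A rough sketch of the logits-level idea summarized above is given below: a small value network emits a per-vocabulary logit offset that is added to a frozen pre-trained model's logits at inference time. The names, architecture, and integration point are illustrative assumptions, not the paper's exact procedure.
```python
# Hedged sketch: a "value network" models post-training changes at the logits
# level and is added to a frozen base model's output at inference time.
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    # Maps the token context to a per-vocabulary logit adjustment.
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        return self.proj(self.embed(tokens).mean(dim=1))   # (batch, vocab)

@torch.no_grad()
def combined_next_token_logits(base_model, value_net, tokens):
    # Assumes base_model returns (batch, seq_len, vocab) logits and stays frozen;
    # the value network supplies the post-training "delta" for the last position.
    base_logits = base_model(tokens)[:, -1, :]
    return base_logits + value_net(tokens)
```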
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - Statistical Foundations of Prior-Data Fitted Networks [0.7614628596146599]
Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning.
This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior.
arXiv Detail & Related papers (2023-05-18T16:34:21Z) - Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning [18.138078314019737]
We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
arXiv Detail & Related papers (2022-10-14T20:25:35Z) - Certified Robustness in Federated Learning [54.03574895808258]
We study the interplay between federated training, personalization, and certified robustness.
We find that the simple federated averaging technique is effective in building not only more accurate, but also more certifiably-robust models.
arXiv Detail & Related papers (2022-06-06T12:10:53Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
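As a hedged illustration of the Dynamic Blocking idea summarized above (when decoding emits a token that appears in the source, its immediate successor in the source is temporarily blocked so the surface form must diverge), a minimal sketch follows; the block probability and masking scheme are assumptions rather than the paper's exact algorithm.
```python
# Hedged sketch of a dynamic-blocking style decoding constraint.
import random

def dynamically_blocked_ids(source_ids, generated_ids, block_prob=0.5):
    # Returns the set of token ids to forbid at the next decoding step.
    if not generated_ids:
        return set()
    last = generated_ids[-1]
    blocked = set()
    for i, tok in enumerate(source_ids[:-1]):
        # If the last generated token occurs in the source, block its successor.
        if tok == last and random.random() < block_prob:
            blocked.add(source_ids[i + 1])
    return blocked

# During decoding, the caller would set logits for the blocked ids to -inf
# before sampling or taking the argmax for the next token.
```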
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
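The subset-selection step mentioned above can be pictured as greedy forward selection over candidate pre-trained models, adding whichever member most improves validation accuracy of the averaged predictions. This is a generic sketch under that assumption; the paper's actual algorithm may differ in detail.
```python
# Hedged sketch: greedy forward selection of ensemble members by validation accuracy.
import numpy as np

def greedy_ensemble(val_probs, val_labels, max_members=5):
    # val_probs: list of (n_examples, n_classes) arrays, one per candidate model.
    chosen, best_acc = [], -1.0
    for _ in range(max_members):
        best_candidate = None
        for i in range(len(val_probs)):
            if i in chosen:
                continue
            mean = np.mean([val_probs[j] for j in chosen + [i]], axis=0)
            acc = float((mean.argmax(axis=1) == val_labels).mean())
            if acc > best_acc:
                best_acc, best_candidate = acc, i
        if best_candidate is None:       # no remaining candidate improves accuracy
            break
        chosen.append(best_candidate)
    return chosen, best_acc
```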
arXiv Detail & Related papers (2020-10-14T07:59:00Z) - The Lottery Ticket Hypothesis for Pre-trained BERT Networks [137.99328302234338]
In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training.
In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy.
We combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models.
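The core operation behind such lottery-ticket searches is magnitude pruning: zero out the smallest weights and keep a binary mask so the surviving subnetwork can be rewound and retrained in isolation. The per-tensor thresholding below is an illustrative choice, not the paper's exact pruning schedule.
```python
# Hedged sketch of one magnitude-pruning step producing lottery-ticket style masks.
import torch

def magnitude_prune_masks(model, sparsity=0.5):
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:              # skip biases / LayerNorm parameters
            continue
        k = max(1, int(sparsity * param.numel()))
        threshold = param.abs().flatten().kthvalue(k).values
        masks[name] = (param.abs() > threshold).float()
    return masks

def apply_masks(model, masks):
    # Zero the pruned weights in place; reapply after each training step to keep sparsity.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
```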
arXiv Detail & Related papers (2020-07-23T19:35:39Z) - Reinforced Curriculum Learning on Pre-trained Neural Machine Translation Models [20.976165305749777]
We learn a curriculum for improving a pre-trained NMT model by re-selecting influential data samples from the original training set.
We propose a data selection framework based on Deterministic Actor-Critic, in which a critic network predicts the expected change of model performance.
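The re-selection loop described above can be sketched as a critic scoring candidate samples by predicted performance gain, fine-tuning on the top-scoring ones, and feeding the realized gain back to the critic. The featurization, critic interface, and update rule below are placeholders, not the paper's Deterministic Actor-Critic formulation.
```python
# Hedged sketch of critic-guided data re-selection for fine-tuning a pre-trained model.
import numpy as np

def curriculum_step(model, critic, candidates, featurize, finetune, validate, k=64):
    feats = np.stack([featurize(s) for s in candidates])
    scores = critic.predict(feats)                 # predicted gain per sample (1-D array)
    idx = np.argsort(-scores)[:k]                  # top-k most promising samples
    selected = [candidates[i] for i in idx]
    before = validate(model)
    finetune(model, selected)                      # one fine-tuning pass on the selection
    gain = validate(model) - before                # realized improvement
    critic.update(feats[idx], gain)                # feedback signal for the critic
    return gain
```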
arXiv Detail & Related papers (2020-04-13T03:40:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.