Pretraining Federated Text Models for Next Word Prediction
- URL: http://arxiv.org/abs/2005.04828v3
- Date: Mon, 17 Aug 2020 21:51:46 GMT
- Title: Pretraining Federated Text Models for Next Word Prediction
- Authors: Joel Stremmel and Arjun Singh
- Abstract summary: We employ the idea of transfer learning for federated training of next word prediction (NWP) models.
We compare federated training baselines from randomly initialized models to various combinations of pretraining approaches.
We realize lift in performance using pretrained embeddings without exacerbating the number of required training rounds or memory footprint.
- Score: 0.2219120333734152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning is a decentralized approach for training models on
distributed devices, by summarizing local changes and sending aggregate
parameters from local models to the cloud rather than the data itself. In this
research we employ the idea of transfer learning to federated training for next
word prediction (NWP) and conduct a number of experiments demonstrating
enhancements to current baselines for which federated NWP models have been
successful. Specifically, we compare federated training baselines from randomly
initialized models to various combinations of pretraining approaches including
pretrained word embeddings and whole model pretraining followed by federated
fine tuning for NWP on a dataset of Stack Overflow posts. We realize lift in
performance using pretrained embeddings without exacerbating the number of
required training rounds or memory footprint. We also observe notable
differences using centrally pretrained networks, especially depending on the
datasets used. Our research offers effective, yet inexpensive, improvements to
federated NWP and paves the way for more rigorous experimentation of transfer
learning techniques for federated learning.
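The recipe described in the abstract, initializing the embedding layer (or the whole network) from a pretrained checkpoint and then fine-tuning with federated averaging for NWP, can be sketched as follows. This is a minimal PyTorch illustration of the general idea, not the authors' implementation (which targets the federated Stack Overflow dataset); the architecture, client data format, and helper names are illustrative assumptions.
```python
# Minimal FedAvg sketch: initialize the embedding layer of an NWP model from
# pretrained word vectors, then fine-tune with simulated federated averaging.
# Model sizes, client data, and hyperparameters are placeholders.
import copy
import torch
import torch.nn as nn

class NWPModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=96, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                     # next-token logits per position

def load_pretrained_embeddings(model, matrix):
    # matrix: (vocab_size, embed_dim) array of pretrained vectors (e.g. GloVe).
    model.embed.weight.data.copy_(torch.as_tensor(matrix, dtype=torch.float32))

def client_update(global_model, batches, epochs=1, lr=0.1):
    # Local SGD on one client's next-word-prediction batches.
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for tokens, targets in batches:
            opt.zero_grad()
            logits = model(tokens)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
            loss.backward()
            opt.step()
    # Weight this client's update by its number of training tokens.
    return model.state_dict(), sum(t.numel() for t, _ in batches)

def fedavg_round(global_model, sampled_clients):
    # Weighted average of client model parameters (FedAvg).
    updates, weights = zip(*(client_update(global_model, c) for c in sampled_clients))
    total = float(sum(weights))
    avg = {k: sum(w / total * u[k].float() for u, w in zip(updates, weights))
           for k in updates[0]}
    global_model.load_state_dict(avg)
    return global_model
```
In this sketch, whole-model pretraining would amount to loading a centrally trained state dict into the server model before the first round, whereas the embedding-only variant touches just the embedding matrix.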
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling the changes introduced by post-training at the logits level with a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
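A rough sketch of the logits-level idea summarized above is given below: a small value network emits a per-vocabulary logit offset that is added to a frozen pre-trained model's logits at inference time. The names, architecture, and integration point are illustrative assumptions, not the paper's exact procedure.
```python
# Hedged sketch: a "value network" models post-training changes at the logits
# level and is added to a frozen base model's output at inference time.
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    # Maps the token context to a per-vocabulary logit adjustment.
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        return self.proj(self.embed(tokens).mean(dim=1))   # (batch, vocab)

@torch.no_grad()
def combined_next_token_logits(base_model, value_net, tokens):
    # Assumes base_model returns (batch, seq_len, vocab) logits and stays frozen;
    # the value network supplies the post-training "delta" for the last position.
    base_logits = base_model(tokens)[:, -1, :]
    return base_logits + value_net(tokens)
```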
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - Statistical Foundations of Prior-Data Fitted Networks [0.7614628596146599]
Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning.
This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior.
arXiv Detail & Related papers (2023-05-18T16:34:21Z) - Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning [18.138078314019737]
We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
arXiv Detail & Related papers (2022-10-14T20:25:35Z) - Certified Robustness in Federated Learning [54.03574895808258]
We study the interplay between federated training, personalization, and certified robustness.
We find that the simple federated averaging technique is effective in building not only more accurate, but also more certifiably-robust models.
arXiv Detail & Related papers (2022-06-06T12:10:53Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
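As a hedged illustration of the Dynamic Blocking idea summarized above (when decoding emits a token that appears in the source, its immediate successor in the source is temporarily blocked so the surface form must diverge), a minimal sketch follows; the block probability and masking scheme are assumptions rather than the paper's exact algorithm.
```python
# Hedged sketch of a dynamic-blocking style decoding constraint.
import random

def dynamically_blocked_ids(source_ids, generated_ids, block_prob=0.5):
    # Returns the set of token ids to forbid at the next decoding step.
    if not generated_ids:
        return set()
    last = generated_ids[-1]
    blocked = set()
    for i, tok in enumerate(source_ids[:-1]):
        # If the last generated token occurs in the source, block its successor.
        if tok == last and random.random() < block_prob:
            blocked.add(source_ids[i + 1])
    return blocked

# During decoding, the caller would set logits for the blocked ids to -inf
# before sampling or taking the argmax for the next token.
```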
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
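The subset-selection step mentioned above can be pictured as greedy forward selection over candidate pre-trained models, adding whichever member most improves validation accuracy of the averaged predictions. This is a generic sketch under that assumption; the paper's actual algorithm may differ in detail.
```python
# Hedged sketch: greedy forward selection of ensemble members by validation accuracy.
import numpy as np

def greedy_ensemble(val_probs, val_labels, max_members=5):
    # val_probs: list of (n_examples, n_classes) arrays, one per candidate model.
    chosen, best_acc = [], -1.0
    for _ in range(max_members):
        best_candidate = None
        for i in range(len(val_probs)):
            if i in chosen:
                continue
            mean = np.mean([val_probs[j] for j in chosen + [i]], axis=0)
            acc = float((mean.argmax(axis=1) == val_labels).mean())
            if acc > best_acc:
                best_acc, best_candidate = acc, i
        if best_candidate is None:       # no remaining candidate improves accuracy
            break
        chosen.append(best_candidate)
    return chosen, best_acc
```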
arXiv Detail & Related papers (2020-10-14T07:59:00Z) - The Lottery Ticket Hypothesis for Pre-trained BERT Networks [137.99328302234338]
In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training.
In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy.
We combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models.
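The core operation behind such lottery-ticket searches is magnitude pruning: zero out the smallest weights and keep a binary mask so the surviving subnetwork can be rewound and retrained in isolation. The per-tensor thresholding below is an illustrative choice, not the paper's exact pruning schedule.
```python
# Hedged sketch of one magnitude-pruning step producing lottery-ticket style masks.
import torch

def magnitude_prune_masks(model, sparsity=0.5):
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:              # skip biases / LayerNorm parameters
            continue
        k = max(1, int(sparsity * param.numel()))
        threshold = param.abs().flatten().kthvalue(k).values
        masks[name] = (param.abs() > threshold).float()
    return masks

def apply_masks(model, masks):
    # Zero the pruned weights in place; reapply after each training step to keep sparsity.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
```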
arXiv Detail & Related papers (2020-07-23T19:35:39Z) - Reinforced Curriculum Learning on Pre-trained Neural Machine Translation Models [20.976165305749777]
We learn a curriculum for improving a pre-trained NMT model by re-selecting influential data samples from the original training set.
We propose a data selection framework based on Deterministic Actor-Critic, in which a critic network predicts the expected change of model performance.
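The re-selection loop described above can be sketched as a critic scoring candidate samples by predicted performance gain, fine-tuning on the top-scoring ones, and feeding the realized gain back to the critic. The featurization, critic interface, and update rule below are placeholders, not the paper's Deterministic Actor-Critic formulation.
```python
# Hedged sketch of critic-guided data re-selection for fine-tuning a pre-trained model.
import numpy as np

def curriculum_step(model, critic, candidates, featurize, finetune, validate, k=64):
    feats = np.stack([featurize(s) for s in candidates])
    scores = critic.predict(feats)                 # predicted gain per sample (1-D array)
    idx = np.argsort(-scores)[:k]                  # top-k most promising samples
    selected = [candidates[i] for i in idx]
    before = validate(model)
    finetune(model, selected)                      # one fine-tuning pass on the selection
    gain = validate(model) - before                # realized improvement
    critic.update(feats[idx], gain)                # feedback signal for the critic
    return gain
```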
arXiv Detail & Related papers (2020-04-13T03:40:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.