Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning
- URL: http://arxiv.org/abs/2206.15387v3
- Date: Fri, 24 Mar 2023 19:09:30 GMT
- Title: Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning
- Authors: John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael
Rabbat
- Abstract summary: We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
- Score: 18.138078314019737
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: An oft-cited challenge of federated learning is the presence of
heterogeneity. \emph{Data heterogeneity} refers to the fact that data from
different clients may follow very different distributions. \emph{System
heterogeneity} refers to client devices having different system capabilities. A
considerable number of federated optimization methods address this challenge.
In the literature, empirical evaluations usually start federated training from
random initialization. However, in many practical applications of federated
learning, the server has access to proxy data for the training task that can be
used to pre-train a model before starting federated training. Using four
standard federated learning benchmark datasets, we empirically study the impact
of starting from a pre-trained model in federated learning. Unsurprisingly,
starting from a pre-trained model reduces the training time required to reach a
target error rate and enables the training of more accurate models (up to 40\%)
than is possible when starting from random initialization. Surprisingly, we
also find that starting federated learning from a pre-trained initialization
reduces the effect of both data and system heterogeneity. We recommend future
work proposing and evaluating federated optimization methods to evaluate the
performance when starting from random and pre-trained initializations. This
study raises several questions for further work on understanding the role of
heterogeneity in federated optimization. \footnote{Our code is available at:
\url{https://github.com/facebookresearch/where_to_begin}}
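To make the experimental setup concrete, below is a minimal sketch of the comparison the paper studies: plain FedAvg run once from a random initialization and once from a model pre-trained on server-side proxy data. The model, the synthetic heterogeneous clients, and all hyperparameters are illustrative stand-ins under assumed defaults, not the authors' configuration.

```python
# Hypothetical sketch: FedAvg started from a random vs. a pre-trained initialization.
# The model, synthetic heterogeneous clients, and hyperparameters are illustrative
# stand-ins, not the authors' experimental setup.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

def local_sgd(global_model, data, epochs=1, lr=0.1):
    """One client's local update: start from the global weights, run SGD."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x, y = data
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model.state_dict()

def fedavg(init_state, client_data, rounds=50, clients_per_round=5):
    """Plain FedAvg; only the initialization differs between the two runs below."""
    global_model = make_model()
    global_model.load_state_dict(init_state)
    for _ in range(rounds):
        sampled = torch.randperm(len(client_data))[:clients_per_round].tolist()
        local_states = [local_sgd(global_model, client_data[i]) for i in sampled]
        avg_state = {k: torch.stack([s[k] for s in local_states]).mean(dim=0)
                     for k in local_states[0]}
        global_model.load_state_dict(avg_state)
    return global_model

# Synthetic non-IID clients: each client only sees two of the ten labels.
client_data = []
for c in range(20):
    x = torch.randn(64, 32)
    y = torch.randint(low=(c % 5) * 2, high=(c % 5) * 2 + 2, size=(64,))
    client_data.append((x, y))

random_init = make_model().state_dict()

# Stand-in for pre-training on server-side proxy data (here: pooled synthetic data).
pretrained = make_model()
proxy_x = torch.cat([x for x, _ in client_data])
proxy_y = torch.cat([y for _, y in client_data])
opt = torch.optim.SGD(pretrained.parameters(), lr=0.1)
for _ in range(100):
    opt.zero_grad()
    F.cross_entropy(pretrained(proxy_x), proxy_y).backward()
    opt.step()

model_from_random = fedavg(random_init, client_data)
model_from_pretrained = fedavg(pretrained.state_dict(), client_data)
```

In this sketch the two runs share the same FedAvg loop, clients, and hyperparameters, so any difference in final accuracy or rounds-to-target is attributable to the initialization alone, which mirrors the comparison the paper reports.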
Related papers
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its own learning rate.
We show that this client-specific, auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients (a minimal illustrative sketch appears after this list).
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - Federated Learning for Data Streams [12.856037831335994]
Federated learning (FL) is an effective solution to train machine learning models on the increasing amount of data generated by IoT devices and smartphones.
Most previous work on federated learning assumes that clients operate on static datasets collected before training starts.
We propose a general FL algorithm to learn from data streams through an opportune weighted empirical risk minimization.
arXiv Detail & Related papers (2023-01-04T11:10:48Z) - Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning [18.138078314019737]
We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
arXiv Detail & Related papers (2022-10-14T20:25:35Z) - Lottery Tickets on a Data Diet: Finding Initializations with Sparse
Trainable Networks [40.55816472416984]
A striking observation about iterative magnitude pruning (IMP; Frankle et al.) is that, after just a few hundred steps of dense pre-training, the method can find a sparse sub-network that trains to accuracy comparable to the dense network.
In this work, we seek to understand how this early phase of pre-training leads to a good initialization for IMP, from the perspective of both the data and the network.
We identify novel properties of the loss landscape of dense networks that are predictive of performance.
arXiv Detail & Related papers (2022-06-02T20:04:06Z) - Acceleration of Federated Learning with Alleviated Forgetting in Local
Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep training on the enlarged dataset tractable, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Pretraining Federated Text Models for Next Word Prediction [0.2219120333734152]
We employ the idea of transfer learning in federated training for next word prediction (NWP).
We compare federated training baselines starting from randomly initialized models against various combinations of pretraining approaches.
We realize a lift in performance using pretrained embeddings without increasing the number of required training rounds or the memory footprint.
arXiv Detail & Related papers (2020-05-11T01:48:50Z) - Unshuffling Data for Improved Generalization [65.57124325257409]
Generalization beyond the training distribution is a core challenge in machine learning.
We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization.
arXiv Detail & Related papers (2020-02-27T03:07:41Z)
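As referenced in the FedLALR entry above, here is a minimal illustrative sketch of per-client adaptive local steps: each client runs AMSGrad-style updates and adjusts its own learning rate from its local gradient history before the server averages the results. The specific auto-tuning rule, the quadratic client objectives, and all constants are assumptions for illustration, not FedLALR's actual scheduling or analysis.

```python
# Illustrative sketch of client-specific adaptive local steps in the spirit of FedLALR.
# The auto-tuning rule below (shrinking the client learning rate as its accumulated
# gradient magnitude grows) is an assumption for illustration, not the paper's rule.
import torch

class LocalAMSGrad:
    """AMSGrad-style local optimizer whose learning rate each client adjusts itself."""
    def __init__(self, dim, base_lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m = torch.zeros(dim)
        self.v = torch.zeros(dim)
        self.v_hat = torch.zeros(dim)
        self.base_lr = base_lr
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.grad_norm_sum = 0.0

    def step(self, params, grad):
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        self.v_hat = torch.maximum(self.v_hat, self.v)
        # Client-specific learning rate: decays with this client's own gradient history.
        self.grad_norm_sum += float(grad.norm())
        lr = self.base_lr / (1.0 + 0.1 * self.grad_norm_sum)
        return params - lr * self.m / (self.v_hat.sqrt() + self.eps)

# One round: each client runs several local AMSGrad steps on its own quadratic
# objective, then the server averages the resulting parameters (FedAvg-style).
dim, num_clients, local_steps = 8, 4, 10
global_params = torch.zeros(dim)
targets = [torch.randn(dim) for _ in range(num_clients)]  # heterogeneous local optima

updated = []
for target in targets:
    params = global_params.clone()
    opt = LocalAMSGrad(dim)
    for _ in range(local_steps):
        grad = params - target  # gradient of 0.5 * ||params - target||^2
        params = opt.step(params, grad)
    updated.append(params)

global_params = torch.stack(updated).mean(dim=0)
```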