Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning
- URL: http://arxiv.org/abs/2206.15387v3
- Date: Fri, 24 Mar 2023 19:09:30 GMT
- Title: Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning
- Authors: John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael
Rabbat
- Abstract summary: We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
- Score: 18.138078314019737
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: An oft-cited challenge of federated learning is the presence of
heterogeneity. \emph{Data heterogeneity} refers to the fact that data from
different clients may follow very different distributions. \emph{System
heterogeneity} refers to client devices having different system capabilities. A
considerable number of federated optimization methods address this challenge.
In the literature, empirical evaluations usually start federated training from
random initialization. However, in many practical applications of federated
learning, the server has access to proxy data for the training task that can be
used to pre-train a model before starting federated training. Using four
standard federated learning benchmark datasets, we empirically study the impact
of starting from a pre-trained model in federated learning. Unsurprisingly,
starting from a pre-trained model reduces the training time required to reach a
target error rate and enables the training of more accurate models (up to 40\%)
than is possible when starting from random initialization. Surprisingly, we
also find that starting federated learning from a pre-trained initialization
reduces the effect of both data and system heterogeneity. We recommend future
work proposing and evaluating federated optimization methods to evaluate the
performance when starting from random and pre-trained initializations. This
study raises several questions for further work on understanding the role of
heterogeneity in federated optimization. \footnote{Our code is available at:
\url{https://github.com/facebookresearch/where_to_begin}}
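To make the experimental setup concrete, below is a minimal sketch of the comparison the paper studies: plain FedAvg run once from a random initialization and once from a model pre-trained on server-side proxy data. The model, the synthetic heterogeneous clients, and all hyperparameters are illustrative stand-ins under assumed defaults, not the authors' configuration.

```python
# Hypothetical sketch: FedAvg started from a random vs. a pre-trained initialization.
# The model, synthetic heterogeneous clients, and hyperparameters are illustrative
# stand-ins, not the authors' experimental setup.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

def local_sgd(global_model, data, epochs=1, lr=0.1):
    """One client's local update: start from the global weights, run SGD."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x, y = data
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model.state_dict()

def fedavg(init_state, client_data, rounds=50, clients_per_round=5):
    """Plain FedAvg; only the initialization differs between the two runs below."""
    global_model = make_model()
    global_model.load_state_dict(init_state)
    for _ in range(rounds):
        sampled = torch.randperm(len(client_data))[:clients_per_round].tolist()
        local_states = [local_sgd(global_model, client_data[i]) for i in sampled]
        avg_state = {k: torch.stack([s[k] for s in local_states]).mean(dim=0)
                     for k in local_states[0]}
        global_model.load_state_dict(avg_state)
    return global_model

# Synthetic non-IID clients: each client only sees two of the ten labels.
client_data = []
for c in range(20):
    x = torch.randn(64, 32)
    y = torch.randint(low=(c % 5) * 2, high=(c % 5) * 2 + 2, size=(64,))
    client_data.append((x, y))

random_init = make_model().state_dict()

# Stand-in for pre-training on server-side proxy data (here: pooled synthetic data).
pretrained = make_model()
proxy_x = torch.cat([x for x, _ in client_data])
proxy_y = torch.cat([y for _, y in client_data])
opt = torch.optim.SGD(pretrained.parameters(), lr=0.1)
for _ in range(100):
    opt.zero_grad()
    F.cross_entropy(pretrained(proxy_x), proxy_y).backward()
    opt.step()

model_from_random = fedavg(random_init, client_data)
model_from_pretrained = fedavg(pretrained.state_dict(), client_data)
```

In this sketch the two runs share the same FedAvg loop, clients, and hyperparameters, so any difference in final accuracy or rounds-to-target is attributable to the initialization alone, which mirrors the comparison the paper reports.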
Related papers
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its own learning rate.
We show that this client-specific, auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients (a minimal illustrative sketch appears after this list).
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - Federated Learning for Data Streams [12.856037831335994]
Federated learning (FL) is an effective solution to train machine learning models on the increasing amount of data generated by IoT devices and smartphones.
Most previous work on federated learning assumes that clients operate on static datasets collected before training starts.
We propose a general FL algorithm to learn from data streams through an opportune weighted empirical risk minimization.
arXiv Detail & Related papers (2023-01-04T11:10:48Z) - Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning [18.138078314019737]
We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
arXiv Detail & Related papers (2022-10-14T20:25:35Z) - Lottery Tickets on a Data Diet: Finding Initializations with Sparse
Trainable Networks [40.55816472416984]
A striking observation about iterative magnitude pruning (IMP; Frankle et al.) is that, after just a few hundred steps of dense pre-training, the method can find a sparse sub-network that trains to accuracy comparable to the dense network.
In this work, we seek to understand how this early phase of pre-training leads to a good initialization for IMP, from the perspective of both the data and the network.
We identify novel properties of the loss landscape of dense networks that are predictive of performance.
arXiv Detail & Related papers (2022-06-02T20:04:06Z) - Acceleration of Federated Learning with Alleviated Forgetting in Local
Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep training on the enlarged dataset tractable, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Pretraining Federated Text Models for Next Word Prediction [0.2219120333734152]
We employ the idea of transfer learning in federated training for next word prediction (NWP).
We compare federated training baselines starting from randomly initialized models against various combinations of pretraining approaches.
We realize a lift in performance using pretrained embeddings without increasing the number of required training rounds or the memory footprint.
arXiv Detail & Related papers (2020-05-11T01:48:50Z) - Unshuffling Data for Improved Generalization [65.57124325257409]
Generalization beyond the training distribution is a core challenge in machine learning.
We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization.
arXiv Detail & Related papers (2020-02-27T03:07:41Z)
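As referenced in the FedLALR entry above, here is a minimal illustrative sketch of per-client adaptive local steps: each client runs AMSGrad-style updates and adjusts its own learning rate from its local gradient history before the server averages the results. The specific auto-tuning rule, the quadratic client objectives, and all constants are assumptions for illustration, not FedLALR's actual scheduling or analysis.

```python
# Illustrative sketch of client-specific adaptive local steps in the spirit of FedLALR.
# The auto-tuning rule below (shrinking the client learning rate as its accumulated
# gradient magnitude grows) is an assumption for illustration, not the paper's rule.
import torch

class LocalAMSGrad:
    """AMSGrad-style local optimizer whose learning rate each client adjusts itself."""
    def __init__(self, dim, base_lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m = torch.zeros(dim)
        self.v = torch.zeros(dim)
        self.v_hat = torch.zeros(dim)
        self.base_lr = base_lr
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.grad_norm_sum = 0.0

    def step(self, params, grad):
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        self.v_hat = torch.maximum(self.v_hat, self.v)
        # Client-specific learning rate: decays with this client's own gradient history.
        self.grad_norm_sum += float(grad.norm())
        lr = self.base_lr / (1.0 + 0.1 * self.grad_norm_sum)
        return params - lr * self.m / (self.v_hat.sqrt() + self.eps)

# One round: each client runs several local AMSGrad steps on its own quadratic
# objective, then the server averages the resulting parameters (FedAvg-style).
dim, num_clients, local_steps = 8, 4, 10
global_params = torch.zeros(dim)
targets = [torch.randn(dim) for _ in range(num_clients)]  # heterogeneous local optima

updated = []
for target in targets:
    params = global_params.clone()
    opt = LocalAMSGrad(dim)
    for _ in range(local_steps):
        grad = params - target  # gradient of 0.5 * ||params - target||^2
        params = opt.step(params, grad)
    updated.append(params)

global_params = torch.stack(updated).mean(dim=0)
```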