Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning
- URL: http://arxiv.org/abs/2210.08090v1
- Date: Fri, 14 Oct 2022 20:25:35 GMT
- Title: Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning
- Authors: John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael
Rabbat
- Abstract summary: We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
- Score: 18.138078314019737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An oft-cited challenge of federated learning is the presence of
heterogeneity. Data heterogeneity refers to the fact that data from
different clients may follow very different distributions. System
heterogeneity refers to the fact that client devices have different system
capabilities. A considerable number of federated optimization methods address
this challenge. In the literature, empirical evaluations usually start
federated training from random initialization. However, in many practical
applications of federated learning, the server has access to proxy data for the
training task that can be used to pre-train a model before starting federated
training. We empirically study the impact of starting from a pre-trained model
in federated learning using four standard federated learning benchmark
datasets. Unsurprisingly, starting from a pre-trained model reduces the
training time required to reach a target error rate and enables the training of
more accurate models (up to 40%) than is possible when starting from random
initialization. Surprisingly, we also find that starting federated learning
from a pre-trained initialization reduces the effect of both data and system
heterogeneity. We recommend that future work proposing and evaluating federated
optimization methods evaluate the performance when starting from random and
pre-trained initializations. We also believe this study raises several
questions for further work on understanding the role of heterogeneity in
federated optimization.
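Since the abstract's comparison is just a change of the server's starting point, a minimal FedAvg-style sketch makes the distinction concrete (FedAvg being the canonical federated optimization baseline, not necessarily the exact setup of this paper). The toy architecture, data loaders, and hyperparameters below are illustrative assumptions rather than the paper's benchmarks; the only difference between the two settings is whether `make_model` loads weights pre-trained on server-side proxy data or keeps the random initialization.

```python
# Minimal FedAvg sketch contrasting random vs. pre-trained initialization.
# Model, client data, and pre-training source are illustrative placeholders.
import copy
import torch
import torch.nn as nn

def local_update(global_model, loader, lr=0.01, epochs=1):
    """Run a few epochs of SGD on one client's data and return its weights."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def fedavg(global_model, client_loaders, rounds=10, clients_per_round=4):
    """Plain FedAvg: average the participating clients' weights each round."""
    for _ in range(rounds):
        sampled = torch.randperm(len(client_loaders))[:clients_per_round].tolist()
        states = [local_update(global_model, client_loaders[i]) for i in sampled]
        avg = {k: torch.stack([s[k].float() for s in states]).mean(dim=0)
               for k in states[0]}
        global_model.load_state_dict(avg)
    return global_model

def make_model(pretrained_state=None):
    """Random initialization by default; optionally start from pre-trained weights."""
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                          nn.Linear(128, 10))
    if pretrained_state is not None:
        # e.g. weights pre-trained on server-side proxy data (hypothetical)
        model.load_state_dict(pretrained_state)
    return model

# Usage (hypothetical):
# global_model = make_model()                   # random initialization
# global_model = make_model(pretrained_state)   # pre-trained initialization
# fedavg(global_model, client_loaders)
```

Everything after `make_model` is identical in both settings, which is what lets an experiment of this kind attribute differences in convergence speed and final accuracy to the initialization alone.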
Related papers
- Accurate Forgetting for Heterogeneous Federated Continual Learning [89.08735771893608]
We propose a new concept, accurate forgetting (AF), and develop a novel generative-replay method which selectively utilizes previous knowledge in federated networks.
We employ a probabilistic framework based on a normalizing flow model to quantify the credibility of previous knowledge.
arXiv Detail & Related papers (2025-02-20T02:35:17Z) - Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning [21.440470901377182]
Initializing with pre-trained models is becoming standard practice in machine learning.
We study the class of two-layer convolutional neural networks (CNNs) and provide bounds on the training error convergence and test error of such a network trained with FedAvg.
arXiv Detail & Related papers (2025-02-11T23:53:16Z) - Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning [20.412469498888292]
Federated Learning (FL) enables multiple devices to collaboratively train a shared model.
The selection of participating devices in each training round critically affects both the model performance and training efficiency.
We introduce a novel device selection solution called FedRank, which is an end-to-end, ranking-based approach.
arXiv Detail & Related papers (2024-05-07T08:44:29Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate (a generic sketch of such a client-local adaptive update appears after this list).
We show that our client-specific auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyze a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - Guiding The Last Layer in Federated Learning with Pre-Trained Models [18.382057374270143]
Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data.
We show that fitting a classification head using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals (a minimal NCM sketch appears after this list).
arXiv Detail & Related papers (2023-06-06T18:02:02Z) - When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more they benefit from curriculum learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
arXiv Detail & Related papers (2022-12-24T11:02:35Z) - Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning [18.138078314019737]
We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
arXiv Detail & Related papers (2022-06-30T16:18:21Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training on the enlarged dataset tractable, we apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
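The FedLALR entry above describes client-local adaptive optimization only at a high level. As a generic illustration, and not the paper's actual auto-tuning rule, the sketch below keeps a standard AMSGrad state per client so that each client can run with its own learning rate; the class name and the example schedule in the trailing comment are assumptions made here for illustration.

```python
# Generic client-local AMSGrad step with a client-specific learning rate.
# The per-client schedule shown in the comment is a placeholder, not FedLALR's rule.
import numpy as np

class LocalAMSGrad:
    def __init__(self, dim, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(dim)       # first-moment estimate
        self.v = np.zeros(dim)       # second-moment estimate
        self.v_hat = np.zeros(dim)   # running max of v (the AMSGrad correction)

    def step(self, params, grad):
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        self.v_hat = np.maximum(self.v_hat, self.v)
        return params - self.lr * self.m / (np.sqrt(self.v_hat) + self.eps)

# Each client i keeps its own optimizer state and its own learning rate,
# e.g. scaled by how many local steps it runs per round (an assumption for
# illustration, not the paper's schedule):
# opt_i = LocalAMSGrad(dim=d, lr=base_lr / local_steps_i)
```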
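The "Guiding The Last Layer in Federated Learning with Pre-Trained Models" entry refers to fitting the classification head with Nearest Class Means, i.e. one prototype per class equal to the mean feature vector of that class. A minimal sketch under that reading follows; the frozen backbone, the absence of feature normalization, and the function names are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def fit_ncm_head(features, labels, num_classes):
    """Nearest Class Means head: one prototype per class = mean feature vector.

    features: (n, d) array of (frozen) backbone features
    labels:   (n,) integer class labels
    Returns a (num_classes, d) matrix of class prototypes.
    """
    d = features.shape[1]
    prototypes = np.zeros((num_classes, d))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():  # leave absent classes at zero
            prototypes[c] = features[mask].mean(axis=0)
    return prototypes

def ncm_predict(features, prototypes):
    """Assign each sample to the nearest class prototype (Euclidean distance)."""
    dists = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)
```

In a federated setting these means can in principle be assembled exactly from per-client class-wise feature sums and counts, which is one plausible reading of why the fit is described as exact and cheap.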
This list is automatically generated from the titles and abstracts of the papers in this site.