Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning
- URL: http://arxiv.org/abs/2210.08090v1
- Date: Fri, 14 Oct 2022 20:25:35 GMT
- Title: Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning
- Authors: John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael
Rabbat
- Abstract summary: We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
- Score: 18.138078314019737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An oft-cited challenge of federated learning is the presence of
heterogeneity. \emph{Data heterogeneity} refers to the fact that data from
different clients may follow very different distributions. \emph{System
heterogeneity} refers to the fact that client devices have different system
capabilities. A considerable number of federated optimization methods address
this challenge. In the literature, empirical evaluations usually start
federated training from random initialization. However, in many practical
applications of federated learning, the server has access to proxy data for the
training task that can be used to pre-train a model before starting federated
training. We empirically study the impact of starting from a pre-trained model
in federated learning using four standard federated learning benchmark
datasets. Unsurprisingly, starting from a pre-trained model reduces the
training time required to reach a target error rate and enables the training of
more accurate models (up to 40\%) than is possible when starting from random
initialization. Surprisingly, we also find that starting federated learning
from a pre-trained initialization reduces the effect of both data and system
heterogeneity. We recommend that future work proposing and evaluating federated
optimization methods evaluate the performance when starting from random and
pre-trained initializations. We also believe this study raises several
questions for further work on understanding the role of heterogeneity in
federated optimization.
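As an illustration of the setup studied here, below is a minimal FedAvg-style sketch in which the only difference between two runs is whether the server starts federated training from a randomly initialized model or from a model pre-trained on proxy data. The helpers `build_model`, `pretrain_on_proxy_data`, and `client_loaders` are hypothetical placeholders, not part of the paper's code.

```python
# Minimal FedAvg sketch contrasting random vs. pre-trained server initialization.
# `build_model`, `pretrain_on_proxy_data`, and `client_loaders` are hypothetical
# placeholders for a benchmark's model, server-side proxy data, and client partitions.
import copy
import random

import torch


def local_update(global_model, loader, lr=0.01, epochs=1):
    """Run a few epochs of SGD on one client's data and return the resulting weights."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()


def fedavg(global_model, client_loaders, rounds=100, clients_per_round=10):
    """Standard FedAvg: sample clients, train locally, average their weights on the server."""
    for _ in range(rounds):
        sampled = random.sample(range(len(client_loaders)), clients_per_round)
        states = [local_update(global_model, client_loaders[i]) for i in sampled]
        avg = {
            k: torch.stack([s[k].float() for s in states]).mean(0).to(states[0][k].dtype)
            for k in states[0]
        }
        global_model.load_state_dict(avg)
    return global_model


# The comparison studied in the paper: identical federated training, different starting point.
# random_start = fedavg(build_model(), client_loaders)
# pretrained_start = fedavg(pretrain_on_proxy_data(build_model()), client_loaders)
```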
Related papers
- Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning [20.412469498888292]
Federated Learning (FL) enables multiple devices to collaboratively train a shared model.
The selection of participating devices in each training round critically affects both the model performance and training efficiency.
We introduce a novel device selection solution called FedRank, which is an end-to-end, ranking-based approach.
arXiv Detail & Related papers (2024-05-07T08:44:29Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this non-IID data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
- Guiding The Last Layer in Federated Learning with Pre-Trained Models [18.382057374270143]
Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data.
We show that fitting a classification head using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals (an illustrative sketch follows this list).
arXiv Detail & Related papers (2023-06-06T18:02:02Z)
- When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more the clients benefit from curriculum learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
arXiv Detail & Related papers (2022-12-24T11:02:35Z)
- Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning [18.138078314019737]
We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
arXiv Detail & Related papers (2022-06-30T16:18:21Z)
- Acceleration of Federated Learning with Alleviated Forgetting in Local Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
- FedAUX: Leveraging Unlabeled Auxiliary Data in Federated Learning [14.10627556244287]
Federated Distillation (FD) is a popular novel algorithmic paradigm for Federated Learning.
We propose FedAUX, which drastically improves performance by deriving maximum utility from the unlabeled auxiliary data.
Experiments on large-scale convolutional neural networks and transformer models demonstrate that the training performance of FedAUX exceeds SOTA FL baseline methods.
arXiv Detail & Related papers (2021-02-04T09:53:53Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep training on this enlarged dataset tractable, we further propose a dataset distillation strategy that compresses the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
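For the NCM head mentioned in "Guiding The Last Layer in Federated Learning with Pre-Trained Models" above, the following is an illustrative sketch (not the authors' implementation), assuming a frozen pre-trained feature extractor whose per-client outputs `features` and `labels` are already available.

```python
# Illustrative NCM (Nearest Class Means) classifier head, assuming a frozen pre-trained
# feature extractor; `features` (N x D) and `labels` (N) are placeholders for one
# client's extracted features and targets.
import torch
import torch.nn.functional as F


def fit_ncm_head(features: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Return one L2-normalized mean feature per class, shape [num_classes, feature_dim]."""
    dim = features.shape[1]
    means = torch.zeros(num_classes, dim)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():  # classes absent on this client keep a zero prototype
            means[c] = features[mask].mean(dim=0)
    return F.normalize(means, dim=1)


def ncm_predict(features: torch.Tensor, class_means: torch.Tensor) -> torch.Tensor:
    """Classify by cosine similarity to the class means; no gradient-based training needed."""
    return (F.normalize(features, dim=1) @ class_means.T).argmax(dim=1)
```

Because a class mean decomposes into a per-class feature sum and a count, a server could aggregate these statistics exactly across clients without any gradient-based training, which is consistent with the summary's claim that the head can be fit exactly and cheaply; the paper's full procedure is not reproduced here.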