Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning
- URL: http://arxiv.org/abs/2502.08024v1
- Date: Tue, 11 Feb 2025 23:53:16 GMT
- Title: Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning
- Authors: Divyansh Jhunjhunwala, Pranay Sharma, Zheng Xu, Gauri Joshi,
- Abstract summary: Initializing with pre-trained models is becoming standard practice in machine learning.
We study the class of two-layer convolutional neural networks (CNNs) and provide bounds on the training error convergence and test error of such a network trained with FedAvg.
- Score: 21.440470901377182
- License:
- Abstract: Initializing with pre-trained models when learning on downstream tasks is becoming standard practice in machine learning. Several recent works explore the benefits of pre-trained initialization in a federated learning (FL) setting, where the downstream training is performed at the edge clients with heterogeneous data distribution. These works show that starting from a pre-trained model can substantially reduce the adverse impact of data heterogeneity on the test performance of a model trained in a federated setting, with no changes to the standard FedAvg training algorithm. In this work, we provide a deeper theoretical understanding of this phenomenon. To do so, we study the class of two-layer convolutional neural networks (CNNs) and provide bounds on the training error convergence and test error of such a network trained with FedAvg. We introduce the notion of aligned and misaligned filters at initialization and show that the data heterogeneity only affects learning on misaligned filters. Starting with a pre-trained model typically results in fewer misaligned filters at initialization, thus producing a lower test error even when the model is trained in a federated setting with data heterogeneity. Experiments in synthetic settings and practical FL training on CNNs verify our theoretical findings.
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Understanding and Mitigating the Label Noise in Pre-training on
Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - Statistical Foundations of Prior-Data Fitted Networks [0.7614628596146599]
Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning.
This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior.
arXiv Detail & Related papers (2023-05-18T16:34:21Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
We then exploit higher-order statistics only later during training.
We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning [18.138078314019737]
We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
arXiv Detail & Related papers (2022-10-14T20:25:35Z) - Efficient Augmentation for Imbalanced Deep Learning [8.38844520504124]
We study a convolutional neural network's internal representation of imbalanced image data.
We measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes.
This insight enables us to design an efficient three-phase CNN training framework for imbalanced data.
arXiv Detail & Related papers (2022-07-13T09:43:17Z) - Where to Begin? On the Impact of Pre-Training and Initialization in
Federated Learning [18.138078314019737]
We study the impact of starting from a pre-trained model in federated learning.
Starting from a pre-trained model reduces the training time required to reach a target error rate.
arXiv Detail & Related papers (2022-06-30T16:18:21Z) - CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z) - Benign Overfitting without Linearity: Neural Network Classifiers Trained
by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained to generalize by gradient descent.
We show that neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error.
In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z) - Learning from Failure: Training Debiased Classifier from Biased
Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.