More Communication Does Not Result in Smaller Generalization Error in
Federated Learning
- URL: http://arxiv.org/abs/2304.12216v2
- Date: Thu, 11 May 2023 17:13:59 GMT
- Title: More Communication Does Not Result in Smaller Generalization Error in
Federated Learning
- Authors: Romain Chor, Milad Sefidgaran and Abdellatif Zaidi
- Abstract summary: We study the generalization error of statistical learning models in a Federated Learning setting.
We consider multiple (say $R \in \mathbb{N}^*$) rounds of model aggregation and study the effect of $R$ on the generalization error of the final aggregated model.
- Score: 9.00236182523638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the generalization error of statistical learning models in a
Federated Learning (FL) setting. Specifically, there are $K$ devices or
clients, each holding its own independent dataset of size $n$. Individual
models, learned locally via Stochastic Gradient Descent, are aggregated
(averaged) by a central server into a global model and then sent back to the
devices. We consider multiple (say $R \in \mathbb N^*$) rounds of model
aggregation and study the effect of $R$ on the generalization error of the
final aggregated model. We establish an upper bound on the generalization error
that accounts explicitly for the effect of $R$ (in addition to the number of
participating devices $K$ and dataset size $n$). It is observed that, for fixed
$(n, K)$, the bound increases with $R$, suggesting that the generalization of
such learning algorithms is negatively affected by more frequent communication
with the parameter server. Combined with the fact that the empirical risk,
however, generally decreases for larger values of $R$, this indicates that $R$
might be a parameter to optimize to reduce the population risk of FL
algorithms. The results of this paper, which extend straightforwardly to the
heterogeneous data setting, are also illustrated through numerical examples.
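As a rough illustration of the training loop described in the abstract, here is a minimal federated-averaging sketch in plain NumPy. The function names (`local_sgd`, `fed_avg`), the quadratic local loss, and the toy data are assumptions made purely for illustration; they are not the paper's actual experimental setup.

```python
# Minimal federated-averaging sketch: K clients, local SGD, R aggregation rounds.
# All names, the squared loss, and the toy data are illustrative assumptions.
import numpy as np

def local_sgd(w_init, X, y, lr=0.01, epochs=1, seed=None):
    """One client's local update: plain SGD on the squared loss 0.5*(x.w - y)^2."""
    rng = np.random.default_rng(seed)
    w = w_init.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

def fed_avg(client_data, R, dim, epochs_per_round=1, seed=0):
    """R rounds: broadcast the global model, run local SGD on each of the K
    clients, then average (aggregate) the returned models at the server."""
    rng = np.random.default_rng(seed)
    w_global = 0.01 * rng.normal(size=dim)
    for r in range(R):
        local_models = [
            local_sgd(w_global, X, y, epochs=epochs_per_round, seed=1000 * r + k)
            for k, (X, y) in enumerate(client_data)
        ]
        w_global = np.mean(local_models, axis=0)  # server-side aggregation
    return w_global

# Toy homogeneous setup: K clients with n samples each from a common ground truth.
K, n, dim = 10, 50, 5
rng = np.random.default_rng(1)
w_star = rng.normal(size=dim)
client_data = []
for _ in range(K):
    X = rng.normal(size=(n, dim))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    client_data.append((X, y))

w_few_rounds = fed_avg(client_data, R=1, dim=dim)
w_many_rounds = fed_avg(client_data, R=50, dim=dim)
```

Comparing `w_few_rounds` and `w_many_rounds` on held-out data in such a toy setup is one way to probe the trade-off the paper analyzes: larger $R$ tends to lower the empirical risk, while the bound indicates that the generalization term can grow with $R$.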
Related papers
- Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization [19.261178173399784]
Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data.
We quantify the amount of spurious correlations $C$ learned via linear regression, in terms of the data covariance and the strength $\lambda$ of the ridge regularization (a toy ridge sketch illustrating this dependence appears after this list).
arXiv Detail & Related papers (2025-02-03T13:38:42Z)
- Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$, with $M$ the number of parameters and $N$ the data size.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z)
- Asymptotics of Random Feature Regression Beyond the Linear Scaling Regime [22.666759017118796]
Recent advances in machine learning have been achieved by using overparametrized models trained until near interpolation of the training data.
How do model complexity and generalization depend on the number of parameters $p$?
In particular, random feature ridge regression (RFRR) exhibits an intuitive trade-off between approximation and generalization power.
arXiv Detail & Related papers (2024-03-13T00:59:25Z)
- From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition [64.59093444558549]
We propose a simple, easy-to-implement, two-step training pipeline that we call From Fake to Real (FFR).
By training on real and synthetic data separately, FFR does not expose the model to the statistical differences between real and synthetic data.
Our experiments show that FFR improves worst-group accuracy over the state-of-the-art by up to 20% across three datasets.
arXiv Detail & Related papers (2023-08-08T19:52:28Z)
- Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often! [15.730667464815548]
We study the evolution of the generalization error with the number of communication rounds $R$ between $K$ clients and a parameter server.
We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that account explicitly for the effect of the number of rounds $R$.
We show that the generalization bound of federated support vector machines (FSVM) increases with $R$, suggesting that more frequent communication with the parameter server diminishes the generalization power.
arXiv Detail & Related papers (2023-06-09T12:53:24Z)
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% across four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z)
- Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning [9.00236182523638]
In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms.
The bounds depend on the compressibility of each client's algorithm while keeping the other clients' algorithms uncompressed.
arXiv Detail & Related papers (2022-06-06T13:21:52Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Improving Robustness and Generality of NLP Models Using Disentangled Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z)
- Inner Ensemble Networks: Average Ensemble as an Effective Regularizer [20.33062212014075]
Inner Ensemble Networks (IENs) reduce the variance within the neural network itself without an increase in the model complexity.
IENs utilize ensemble parameters during the training phase to reduce the network variance.
arXiv Detail & Related papers (2020-06-15T11:56:11Z)
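The toy ridge sketch referenced in the spurious-correlations item above: a minimal illustration (the data-generating process, names, and values of $\lambda$ are assumptions, not that paper's setup) of how the weight a ridge regressor places on a non-predictive but correlated feature changes with the regularization strength $\lambda$.

```python
# Hypothetical toy example: a "spurious" feature is correlated with the label
# only through the core feature; ridge regression at different lambda values
# puts different amounts of weight on it.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
core = rng.normal(size=n)                          # truly predictive feature
spurious = 0.8 * core + 0.6 * rng.normal(size=n)   # correlated, non-predictive
y = 2.0 * core + 0.5 * rng.normal(size=n)          # label depends only on `core`
X = np.column_stack([core, spurious])

def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for lam in [0.0, 10.0, 1000.0]:
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam:7.1f}  core weight={w[0]:+.3f}  spurious weight={w[1]:+.3f}")
```

In this toy case the unregularized fit puts essentially no weight on the spurious feature, while stronger regularization shifts weight onto it; the paper studies this kind of dependence on the data covariance and on $\lambda$ analytically.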