More Communication Does Not Result in Smaller Generalization Error in
Federated Learning
- URL: http://arxiv.org/abs/2304.12216v2
- Date: Thu, 11 May 2023 17:13:59 GMT
- Title: More Communication Does Not Result in Smaller Generalization Error in
Federated Learning
- Authors: Romain Chor, Milad Sefidgaran and Abdellatif Zaidi
- Abstract summary: We study the generalization error of statistical learning models in a Federated Learning setting.
We consider multiple (say $R \in \mathbb{N}^*$) rounds of model aggregation and study the effect of $R$ on the generalization error of the final aggregated model.
- Score: 9.00236182523638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the generalization error of statistical learning models in a
Federated Learning (FL) setting. Specifically, there are $K$ devices or
clients, each holding its own independent dataset of size $n$. Individual
models, learned locally via Stochastic Gradient Descent, are aggregated
(averaged) by a central server into a global model and then sent back to the
devices. We consider multiple (say $R \in \mathbb N^*$) rounds of model
aggregation and study the effect of $R$ on the generalization error of the
final aggregated model. We establish an upper bound on the generalization error
that accounts explicitly for the effect of $R$ (in addition to the number of
participating devices $K$ and dataset size $n$). It is observed that, for fixed
$(n, K)$, the bound increases with $R$, suggesting that the generalization of
such learning algorithms is negatively affected by more frequent communication
with the parameter server. Combined with the fact that the empirical risk,
however, generally decreases for larger values of $R$, this indicates that $R$
might be a parameter to optimize to reduce the population risk of FL
algorithms. The results of this paper, which extend straightforwardly to the
heterogeneous data setting, are also illustrated through numerical examples.
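As a rough illustration of the training loop described in the abstract, here is a minimal federated-averaging sketch in plain NumPy. The function names (`local_sgd`, `fed_avg`), the quadratic local loss, and the toy data are assumptions made purely for illustration; they are not the paper's actual experimental setup.

```python
# Minimal federated-averaging sketch: K clients, local SGD, R aggregation rounds.
# All names, the squared loss, and the toy data are illustrative assumptions.
import numpy as np

def local_sgd(w_init, X, y, lr=0.01, epochs=1, seed=None):
    """One client's local update: plain SGD on the squared loss 0.5*(x.w - y)^2."""
    rng = np.random.default_rng(seed)
    w = w_init.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

def fed_avg(client_data, R, dim, epochs_per_round=1, seed=0):
    """R rounds: broadcast the global model, run local SGD on each of the K
    clients, then average (aggregate) the returned models at the server."""
    rng = np.random.default_rng(seed)
    w_global = 0.01 * rng.normal(size=dim)
    for r in range(R):
        local_models = [
            local_sgd(w_global, X, y, epochs=epochs_per_round, seed=1000 * r + k)
            for k, (X, y) in enumerate(client_data)
        ]
        w_global = np.mean(local_models, axis=0)  # server-side aggregation
    return w_global

# Toy homogeneous setup: K clients with n samples each from a common ground truth.
K, n, dim = 10, 50, 5
rng = np.random.default_rng(1)
w_star = rng.normal(size=dim)
client_data = []
for _ in range(K):
    X = rng.normal(size=(n, dim))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    client_data.append((X, y))

w_few_rounds = fed_avg(client_data, R=1, dim=dim)
w_many_rounds = fed_avg(client_data, R=50, dim=dim)
```

Comparing `w_few_rounds` and `w_many_rounds` on held-out data in such a toy setup is one way to probe the trade-off the paper analyzes: larger $R$ tends to lower the empirical risk, while the bound indicates that the generalization term can grow with $R$.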
Related papers
- Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization [19.261178173399784]
Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data.
We quantify the amount of spurious correlations $C$ learned via linear regression, in terms of the data covariance and the strength $\lambda$ of the ridge regularization (a toy ridge sketch illustrating this dependence appears after this list).
arXiv Detail & Related papers (2025-02-03T13:38:42Z)
- Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$, with $M$ the number of parameters and $N$ the data size.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z)
- Asymptotics of Random Feature Regression Beyond the Linear Scaling Regime [22.666759017118796]
Recent advances in machine learning have been achieved by using overparametrized models trained until near interpolation of the training data.
How do model complexity and generalization depend on the number of parameters $p$?
In particular, random feature ridge regression (RFRR) exhibits an intuitive trade-off between approximation and generalization power.
arXiv Detail & Related papers (2024-03-13T00:59:25Z)
- From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition [64.59093444558549]
We propose a simple, easy-to-implement, two-step training pipeline that we call From Fake to Real (FFR).
By training on real and synthetic data separately, FFR does not expose the model to the statistical differences between real and synthetic data.
Our experiments show that FFR improves worst-group accuracy over the state-of-the-art by up to 20% across three datasets.
arXiv Detail & Related papers (2023-08-08T19:52:28Z)
- Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often! [15.730667464815548]
We study the evolution of the generalization error with the number of communication rounds $R$ between $K$ clients and a parameter server.
We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that account explicitly for the effect of the number of rounds $R$.
We show that the generalization bound of federated support vector machines (FSVM) increases with $R$, suggesting that more frequent communication with the parameter server diminishes the generalization power.
arXiv Detail & Related papers (2023-06-09T12:53:24Z)
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% across four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z)
- Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning [9.00236182523638]
In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms.
The bounds depend on the compressibility of each client's algorithm while keeping the other clients' algorithms uncompressed.
arXiv Detail & Related papers (2022-06-06T13:21:52Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Improving Robustness and Generality of NLP Models Using Disentangled Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z)
- Inner Ensemble Networks: Average Ensemble as an Effective Regularizer [20.33062212014075]
Inner Ensemble Networks (IENs) reduce the variance within the neural network itself without an increase in the model complexity.
IENs utilize ensemble parameters during the training phase to reduce the network variance.
arXiv Detail & Related papers (2020-06-15T11:56:11Z)
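The toy ridge sketch referenced in the spurious-correlations item above: a minimal illustration (the data-generating process, names, and values of $\lambda$ are assumptions, not that paper's setup) of how the weight a ridge regressor places on a non-predictive but correlated feature changes with the regularization strength $\lambda$.

```python
# Hypothetical toy example: a "spurious" feature is correlated with the label
# only through the core feature; ridge regression at different lambda values
# puts different amounts of weight on it.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
core = rng.normal(size=n)                          # truly predictive feature
spurious = 0.8 * core + 0.6 * rng.normal(size=n)   # correlated, non-predictive
y = 2.0 * core + 0.5 * rng.normal(size=n)          # label depends only on `core`
X = np.column_stack([core, spurious])

def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for lam in [0.0, 10.0, 1000.0]:
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam:7.1f}  core weight={w[0]:+.3f}  spurious weight={w[1]:+.3f}")
```

In this toy case the unregularized fit puts essentially no weight on the spurious feature, while stronger regularization shifts weight onto it; the paper studies this kind of dependence on the data covariance and on $\lambda$ analytically.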