Understanding Generalization of Federated Learning via Stability:
Heterogeneity Matters
- URL: http://arxiv.org/abs/2306.03824v1
- Date: Tue, 6 Jun 2023 16:12:35 GMT
- Title: Understanding Generalization of Federated Learning via Stability:
Heterogeneity Matters
- Authors: Zhenyu Sun, Xiaochun Niu, Ermin Wei
- Abstract summary: Generalization performance is a key metric in evaluating machine learning models when applied to real-world applications.
- Score: 1.4502611532302039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization performance is a key metric in evaluating machine
learning models when applied to real-world applications. Good generalization
indicates that a model can predict unseen data correctly even when trained on a
limited amount of data. Federated learning (FL), which has emerged as a popular
distributed learning framework, allows multiple devices or clients to train a
shared model without violating privacy requirements. While the existing
literature has extensively studied the generalization performance of
centralized machine learning algorithms, comparable analyses in the federated
setting are either absent or rely on very restrictive assumptions on the loss
functions. In this paper, we analyze the generalization performance of
federated learning by means of algorithmic stability, which measures how much
the output model of an algorithm changes when a single training data point is
perturbed. Three widely used algorithms are studied, namely FedAvg, SCAFFOLD,
and FedProx, under both convex and non-convex loss functions. Our analysis
shows that the generalization performance of models trained by these three
algorithms is closely related to the heterogeneity of the clients' datasets as
well as to the convergence behavior of the algorithms. In particular, in the
i.i.d. setting, our results recover the classical results for stochastic
gradient descent (SGD).
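As background for the stability notion used in the abstract, the following is a
minimal, illustrative Python sketch (not the authors' implementation; names such
as fedavg and local_sgd are hypothetical, and only numpy is assumed). It runs
FedAvg-style local SGD on two sets of client datasets that differ in a single
data point and reports how much the output model changes, which is the quantity
that stability-based generalization arguments control.

```python
# Minimal sketch: estimating the perturb-one-point stability of FedAvg on a toy
# least-squares problem. Hypothetical names; not the paper's implementation.
import numpy as np

def local_sgd(w, X, y, lr=0.01, local_steps=10, rng=None):
    """One client's local update: a few SGD steps on squared loss."""
    rng = rng or np.random.default_rng(0)
    w = w.copy()
    for _ in range(local_steps):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i]      # gradient of 0.5*(x_i^T w - y_i)^2
        w -= lr * grad
    return w

def fedavg(clients, rounds=50, lr=0.01, local_steps=10, seed=0):
    """FedAvg: average the clients' locally updated models every round."""
    rng = np.random.default_rng(seed)
    d = clients[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(rounds):
        local_models = [local_sgd(w, X, y, lr, local_steps, rng) for X, y in clients]
        w = np.mean(local_models, axis=0)    # server aggregation (uniform weights)
    return w

# Build client datasets S and a neighboring S' that differ in ONE data point.
rng = np.random.default_rng(42)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(4)]
clients_perturbed = [(X.copy(), y.copy()) for X, y in clients]
clients_perturbed[0][0][0] = rng.normal(size=5)   # replace one sample on client 0
clients_perturbed[0][1][0] = rng.normal()

w_S = fedavg(clients)
w_S_prime = fedavg(clients_perturbed)
# The smaller this gap, the more stable (and, by stability arguments,
# the better-generalizing) the algorithm on this problem instance.
print("model change from perturbing one point:", np.linalg.norm(w_S - w_S_prime))
```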
Related papers
- Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization [22.577751005038543]
Federated Learning (FL) is a distributed learning approach that trains neural networks across multiple devices.
FL often faces challenges due to data heterogeneity, leading to inconsistent local optima among clients.
We introduce the first generalization dynamics analysis framework in federated optimization.
arXiv Detail & Related papers (2024-11-25T11:43:22Z) - On the KL-Divergence-based Robust Satisficing Model [2.425685918104288]
The robust satisficing framework has attracted increasing attention from academia.
We present analytical interpretations, diverse performance guarantees, efficient and stable numerical methods, convergence analysis, and an extension tailored for hierarchical data structures.
We demonstrate the superior performance of our model compared to state-of-the-art benchmarks.
arXiv Detail & Related papers (2024-08-17T10:05:05Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error of overparameterized models that achieve effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme (one standard form of a PAC-Bayes bound is sketched after this list).
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Proof of Swarm Based Ensemble Learning for Federated Learning
Applications [3.2536767864585663]
In federated learning, it is not feasible to apply centralized ensemble learning directly due to privacy concerns.
Most distributed consensus algorithms, such as Byzantine fault tolerance (BFT), do not normally perform well in such applications.
We propose PoSw, a novel distributed consensus algorithm for ensemble learning in a federated setting.
arXiv Detail & Related papers (2022-12-28T13:53:34Z) - FedGen: Generalizable Federated Learning for Sequential Data [8.784435748969806]
In many real-world distributed settings, spurious correlations exist due to biases and data sampling issues.
We present a generalizable federated learning framework called FedGen, which allows clients to identify and distinguish between spurious and invariant features.
We show that FedGen results in models that achieve significantly better generalization and can outperform the accuracy of current federated learning approaches by over 24%.
arXiv Detail & Related papers (2022-11-03T15:48:14Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - DRFLM: Distributionally Robust Federated Learning with Inter-client
Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously (a minimal sketch of the local-mixup step appears after this list).
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Modeling Generalization in Machine Learning: A Methodological and
Computational Study [0.8057006406834467]
We use the concept of the convex hull of the training data in assessing machine learning generalization.
We observe unexpectedly weak associations between the generalization ability of machine learning models and all metrics related to dimensionality.
arXiv Detail & Related papers (2020-06-28T19:06:16Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
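As background for the PAC-Bayesian entry above, the bound below is one standard
(McAllester-style) PAC-Bayes bound for a bounded loss; it is given only as an
illustration of the general form and is not the specific bound derived for the
interpolating regime in that paper.

```latex
% One standard McAllester-style PAC-Bayes bound (illustrative background only).
% For a loss taking values in [0,1], a prior P chosen before seeing the n
% training samples, and any \delta in (0,1), with probability at least
% 1 - \delta over the sample, simultaneously for all posteriors Q:
\[
  \mathbb{E}_{h \sim Q}\bigl[L(h)\bigr]
  \;\le\;
  \mathbb{E}_{h \sim Q}\bigl[\widehat{L}(h)\bigr]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
\]
% where L is the population risk and \widehat{L} is the empirical risk.
```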
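The DRFLM entry above applies mixup locally at each client. The sketch below
shows standard mixup applied to one client's local batch, purely as an
illustration of the mechanism; the helper name local_mixup is hypothetical and
this is not the DRFLM implementation.

```python
# Illustrative sketch of mixup on one client's local batch (hypothetical helper,
# not the DRFLM implementation): convex-combine random pairs of examples.
import numpy as np

def local_mixup(X, y, alpha=0.2, rng=None):
    """Return a mixed batch: x = lam*x_i + (1-lam)*x_j, and likewise for labels."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    perm = rng.permutation(n)                   # partner index j for each example i
    lam = rng.beta(alpha, alpha, size=(n, 1))   # one mixing coefficient per example
    X_mix = lam * X + (1.0 - lam) * X[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]     # assumes y is one-hot or real-valued
    return X_mix, y_mix

# Example: mix a toy batch of 8 examples with 3-d features and one-hot labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
y = np.eye(4)[rng.integers(4, size=8)]
X_mix, y_mix = local_mixup(X, y, alpha=0.4, rng=rng)
print(X_mix.shape, y_mix.shape)  # (8, 3) (8, 4)
```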
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.