Related papers: Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization

Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization

URL: http://arxiv.org/abs/2411.16303v1
Date: Mon, 25 Nov 2024 11:43:22 GMT
Title: Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization
Authors: Dun Zeng, Zheshun Wu, Shiyu Liu, Yu Pan, Xiaoying Tang, Zenglin Xu,
Abstract summary: Federated Learning (FL) is a distributed learning approach that trains neural networks across multiple devices. FL often faces challenges due to data heterogeneity, leading to inconsistent local optima among clients. We introduce the first generalization dynamics analysis framework in federated optimization.
Score: 22.577751005038543
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Federated Learning (FL) is a distributed learning approach that trains neural networks across multiple devices while keeping their local data private. However, FL often faces challenges due to data heterogeneity, leading to inconsistent local optima among clients. These inconsistencies can cause unfavorable convergence behavior and generalization performance degradation. Existing studies mainly describe this issue through \textit{convergence analysis}, focusing on how well a model fits training data, or through \textit{algorithmic stability}, which examines the generalization gap. However, neither approach precisely captures the generalization performance of FL algorithms, especially for neural networks. In this paper, we introduce the first generalization dynamics analysis framework in federated optimization, highlighting the trade-offs between model stability and optimization. Through this framework, we show how the generalization of FL algorithms is affected by the interplay of algorithmic stability and optimization. This framework applies to standard federated optimization and its advanced versions, like server momentum. We find that fast convergence from large local steps or accelerated momentum enlarges stability but obtains better generalization performance. Our insights into these trade-offs can guide the practice of future algorithms for better generalization.

Related papers

Feasible Learning [78.6167929413604]
We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample. Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
arXiv Detail & Related papers (2025-01-24T20:39:38Z)
Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an acceleration Decentralized Federated Learning algorithm called DFedCata. DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase. Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
Aiding Global Convergence in Federated Learning via Local Perturbation and Mutual Similarity Information [6.767885381740953]
Federated learning has emerged as a distributed optimization paradigm. We propose a novel modified framework wherein each client locally performs a perturbed gradient step. We show that our algorithm speeds convergence up to a margin of 30 global rounds compared with FedAvg.
arXiv Detail & Related papers (2024-10-07T23:14:05Z)
A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime. We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification [51.04894019092156]
Federated learning (FL) has been recognized as a rapidly growing area, where the model is trained over clients under the FL orchestration (PS) In this paper, we propose a novel primal sparification algorithm for and guarantee non-smooth FL problems. Its unique insightful properties and its analyses are also presented.
arXiv Detail & Related papers (2023-10-30T14:15:47Z)
FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choices of weight decay and identify that weight decay value appreciably influences the convergence of existing FL algorithms. We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithms.
arXiv Detail & Related papers (2023-10-04T21:11:40Z)
Understanding Generalization of Federated Learning via Stability: Heterogeneity Matters [1.4502611532302039]
Generalization performance is a key metric in evaluating machine learning models when applied to real-world applications. Generalization performance is a key metric in evaluating machine learning models when applied to real-world applications.
arXiv Detail & Related papers (2023-06-06T16:12:35Z)
Entropy-driven Fair and Effective Federated Learning [26.22014904183881]
Federated Learning (FL) enables collaborative model training across distributed devices while preserving data privacy.<n>We propose a novel that leverages Theoretical-based aggregation combined with model and gradient alignments to simultaneously optimize and global model performance.
arXiv Detail & Related papers (2023-01-29T10:02:42Z)
Annealing Optimization for Progressive Learning with Stochastic Approximation [0.0]
We introduce a learning model designed to meet the needs of applications in which computational resources are limited. We develop an online prototype-based learning algorithm that is formulated as an online-free gradient approximation algorithm. The learning model can be viewed as an interpretable and progressively growing competitive neural network model to be used for supervised, unsupervised, and reinforcement learning.
arXiv Detail & Related papers (2022-09-06T21:31:01Z)
On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point. Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping privacy of training data on clients. Recently, many iterations efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, AdaGrad, etc., to federated settings. This work aims to develop novel adaptive optimization methods for FL from the perspective of dynamics of ordinary differential equations (ODEs)
arXiv Detail & Related papers (2022-07-14T22:46:43Z)
Generalized Federated Learning via Sharpness Aware Minimization [22.294290071999736]
We propose a general, effective algorithm, textttFedSAM, based on Sharpness Aware Minimization (SAM) local, and develop a momentum FL algorithm to bridge local and global models. Empirically, our proposed algorithms substantially outperform existing FL studies and significantly decrease the learning deviation.
arXiv Detail & Related papers (2022-06-06T13:54:41Z)
Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge. Existing algorithms face issues with slow convergence and/or robustness of performance. We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z)
On the Convergence of Heterogeneous Federated Learning with Arbitrary Adaptive Online Model Pruning [15.300983585090794]
We present a unifying framework for heterogeneous FL algorithms with em arbitrary adaptive online model pruning. In particular, we prove that under certain sufficient conditions, these algorithms converge to a stationary point of standard FL for general smooth cost functions. We illuminate two key factors impacting convergence: pruning-induced noise and minimum coverage index.
arXiv Detail & Related papers (2022-01-27T20:43:38Z)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
Decentralized Statistical Inference with Unrolled Graph Neural Networks [26.025935320024665]
We propose a learning-based framework, which unrolls decentralized optimization algorithms into graph neural networks (GNNs) By minimizing the recovery error via end-to-end training, this learning-based framework resolves the model mismatch issue. Our convergence analysis reveals that the learned model parameters may accelerate the convergence and reduce the recovery error to a large extent.
arXiv Detail & Related papers (2021-04-04T07:52:34Z)
Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.