Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning
- URL: http://arxiv.org/abs/2206.02604v1
- Date: Mon, 6 Jun 2022 13:21:52 GMT
- Title: Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning
- Authors: Milad Sefidgaran, Romain Chor, Abdellatif Zaidi
- Abstract summary: In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms.
The bounds depend on the compressibility of each client's algorithm while keeping other clients' algorithms uncompressed.
- Score: 9.00236182523638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we use tools from rate-distortion theory to establish new
upper bounds on the generalization error of statistical distributed learning
algorithms. Specifically, there are $K$ clients whose individually chosen
models are aggregated by a central server. The bounds depend on the
compressibility of each client's algorithm while keeping other clients'
algorithms uncompressed, and leverage the fact that small changes in each
local model change the aggregated model by a factor of only $1/K$. Adopting an
approach recently proposed by Sefidgaran et al. and extending it suitably to
the distributed setting enables smaller rate-distortion terms, which are
shown to translate into tighter generalization bounds. The bounds are then
applied to the distributed support vector machines (SVM), suggesting that the
generalization error of the distributed setting decays faster than that of the
centralized one by a factor of $\mathcal{O}(\log(K)/\sqrt{K})$. This finding
is also validated experimentally. A similar conclusion is obtained for a
multiple-round federated learning setup where each client uses stochastic
gradient Langevin dynamics (SGLD).
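To make the key $1/K$ sensitivity concrete, here is a minimal NumPy sketch (illustrative only, not the paper's construction; the client count `K`, model dimension `d`, and plain averaging aggregator are assumptions): perturbing a single client's model moves the server's aggregate by exactly the perturbation divided by $K$.

```python
import numpy as np

# Sketch: K clients each hold a local model; the server aggregates by simple
# averaging. Perturbing one client's model by delta moves the aggregate by
# only delta / K, the sensitivity fact behind the smaller rate-distortion
# terms in the paper's bounds.

rng = np.random.default_rng(0)
K, d = 10, 5                        # illustrative client count and dimension

local_models = [rng.normal(size=d) for _ in range(K)]
aggregate = np.mean(local_models, axis=0)

# Perturb a single client's model by delta.
delta = rng.normal(size=d)
perturbed = list(local_models)
perturbed[0] = perturbed[0] + delta
aggregate_perturbed = np.mean(perturbed, axis=0)

# The aggregate moves by exactly delta / K.
print(np.allclose(aggregate_perturbed - aggregate, delta / K))  # True
```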
Related papers
- Distributed Extra-gradient with Optimal Complexity and Communication Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient.
We propose quantized generalized extra-gradient (Q-GenX), an unbiased and adaptive compression method tailored to solving VIs.
arXiv Detail & Related papers (2023-08-17T21:15:04Z)
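A hedged sketch of the underlying idea: one extra-gradient step where the transmitted operator values pass through an unbiased stochastic quantizer (the fixed-level quantizer and toy bilinear problem are assumptions; Q-GenX's adaptive scheme is not reproduced here).

```python
import numpy as np

# One extra-gradient step for a monotone VI, find z* with <F(z*), z - z*> >= 0,
# with an unbiased random quantizer applied to the operator values a worker
# would transmit.

rng = np.random.default_rng(0)

def quantize(v, levels=16):
    """Unbiased stochastic rounding onto a uniform grid: E[quantize(v)] = v."""
    scale = np.max(np.abs(v)) + 1e-12
    u = v / scale * levels
    low = np.floor(u)
    q = low + (rng.random(v.shape) < (u - low))   # randomized rounding
    return q * scale / levels

def extragradient_step(z, F, lr=0.1):
    """Extra-gradient: extrapolate with F(z), then update with F(z_half)."""
    z_half = z - lr * quantize(F(z))              # extrapolation (lookahead)
    return z - lr * quantize(F(z_half))           # update step

# Toy monotone operator: bilinear saddle point min_x max_y x^T y.
F = lambda z: np.array([z[1], -z[0]])
z = np.array([1.0, 1.0])
for _ in range(2000):
    z = extragradient_step(z, F)
print(z)  # converges to a neighborhood of the solution (0, 0)
```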
- Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning [9.31522898261934]
We investigate the impact of compression on gradient algorithms for machine learning.
We highlight differences in terms of convergence rates between several unbiased compression operators.
We extend our results to the case of federated learning.
arXiv Detail & Related papers (2023-08-02T18:02:00Z)
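As a minimal illustration of the setting (not the paper's exact analysis), the sketch below runs gradient descent on a least-squares problem with rand-k sparsification, a standard unbiased compression operator; the problem sizes and step-size rule are assumptions.

```python
import numpy as np

# Compressed gradient descent for least squares: each gradient passes through
# rand-k sparsification, a standard unbiased compression operator, before the
# model update.

rng = np.random.default_rng(1)

def rand_k(g, k):
    """Keep k random coordinates, rescaled by d/k so that E[rand_k(g)] = g."""
    d = g.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(g)
    out[idx] = g[idx] * (d / k)
    return out

d, n = 20, 200
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star                       # noiseless targets: an exact solution exists

L = np.linalg.norm(A, 2) ** 2 / n    # smoothness constant of the objective
lr = 0.25 / L                        # shrunk by k/d to absorb compression variance

x = np.zeros(d)
for _ in range(5000):
    grad = A.T @ (A @ x - b) / n
    x -= lr * rand_k(grad, k=5)

print(np.linalg.norm(x - x_star))    # small: converges despite compression
```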
- Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards [14.822625665220068]
We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment.
The goal is to minimize the overall regret of the entire system through collaboration.
arXiv Detail & Related papers (2023-06-08T22:37:24Z)
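A simplified sketch of the collaboration mechanism (not the paper's algorithm): each agent runs UCB on its own observations and occasionally averages per-arm estimates with a random neighbor, standing in for the environment-provided random graphs.

```python
import numpy as np

# Each agent runs UCB independently; after each round, random agent pairs
# average their per-arm reward estimates, a crude stand-in for gossip over a
# time-dependent random graph.

rng = np.random.default_rng(2)
n_agents, n_arms, horizon = 4, 3, 2000
arm_means = np.array([0.2, 0.5, 0.8])   # heterogeneous in the paper; fixed here

est = np.zeros((n_agents, n_arms))       # running mean reward per arm
counts = np.ones((n_agents, n_arms))     # one fake pull avoids division by zero

for t in range(1, horizon + 1):
    for a in range(n_agents):
        ucb = est[a] + np.sqrt(2 * np.log(t) / counts[a])
        arm = int(np.argmax(ucb))
        reward = rng.normal(arm_means[arm], 0.1)
        counts[a, arm] += 1
        est[a, arm] += (reward - est[a, arm]) / counts[a, arm]
    # Random communication: with probability 0.2, a pair averages estimates.
    for a in range(n_agents):
        b = rng.integers(n_agents)
        if b != a and rng.random() < 0.2:
            avg = (est[a] + est[b]) / 2
            est[a] = est[b] = avg

print(est.round(2))  # every agent's estimates concentrate near arm_means
```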
- FedHB: Hierarchical Bayesian Federated Learning [11.936836827864095]
We propose a novel hierarchical Bayesian approach to Federated Learning (FL).
Our model reasonably describes the generative process of clients' local data via hierarchical Bayesian modeling.
We show that our block-coordinate FL algorithm converges to an optimum of the objective at the rate of $O(1/\sqrt{t})$.
arXiv Detail & Related papers (2023-05-08T18:21:41Z)
- Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated Learning Framework [82.36466358313025]
We propose a primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and bias of the global model.
Experiments based on (semi-supervised) image classification tasks demonstrate the superiority of FedVRA over existing schemes.
arXiv Detail & Related papers (2022-12-03T03:27:51Z)
- Federated Learning Using Variance Reduced Stochastic Gradient for Probabilistically Activated Agents [0.0]
This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves both variance reduction and a faster convergence rate to an optimal solution, in the setting where each agent has an arbitrary probability of selection in each iteration.
arXiv Detail & Related papers (2022-10-25T22:04:49Z)
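A hedged sketch of one standard way to handle random activation (not necessarily the paper's two-layer scheme): weighting each activated agent's gradient by the inverse of its selection probability keeps the aggregated update unbiased.

```python
import numpy as np

# Each agent i participates in a round independently with probability p[i];
# weighting its gradient by 1/p[i] keeps the aggregated update unbiased
# despite random activation.

rng = np.random.default_rng(3)
n_agents, d = 5, 4
p = np.array([0.9, 0.7, 0.5, 0.3, 0.2])   # per-agent activation probabilities

# Toy quadratic objective per agent: f_i(x) = 0.5 * ||x - c_i||^2.
centers = rng.normal(size=(n_agents, d))
x = np.zeros(d)

for _ in range(3000):
    update = np.zeros(d)
    for i in range(n_agents):
        if rng.random() < p[i]:           # agent i is activated this round
            grad_i = x - centers[i]
            update += grad_i / p[i]       # inverse-probability weighting
    x -= 0.05 * update / n_agents

print(np.linalg.norm(x - centers.mean(axis=0)))  # near the global optimum
```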
- Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers [16.618918548497223]
We propose a new covering technique localized for the trajectories of SGD.
This localization provides an algorithm-specific complexity measured by the covering number.
We derive these results in various contexts and improve the known state-of-the-art label rates.
arXiv Detail & Related papers (2022-09-19T12:11:07Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
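A minimal sketch of the idea: Frank-Wolfe (conditional gradient) steps driven by a SAG-style running gradient average that is refreshed with one sample per iteration (the least-squares objective, l1-ball constraint, and step-size rule are illustrative assumptions).

```python
import numpy as np

# Conditional gradient (Frank-Wolfe) with a SAG-style estimator: one sampled
# component refreshes a running gradient table per iteration.

rng = np.random.default_rng(4)
n, d = 50, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

# Minimize f(x) = (1/2n) * sum_i (a_i^T x - b_i)^2 over the l1 ball ||x||_1 <= 1.
grad_table = np.zeros((n, d))    # last seen gradient of each component
grad_avg = np.zeros(d)           # running average of the table
x = np.zeros(d)

for t in range(1, 5001):
    i = rng.integers(n)                      # one sample per iteration
    g_new = (A[i] @ x - b[i]) * A[i]
    grad_avg += (g_new - grad_table[i]) / n  # SAG update of the average
    grad_table[i] = g_new

    # Linear minimization oracle over the l1 ball: a signed coordinate vertex.
    s = np.zeros(d)
    j = int(np.argmax(np.abs(grad_avg)))
    s[j] = -np.sign(grad_avg[j])

    x += (2.0 / (t + 2)) * (s - x)           # classic Frank-Wolfe step size

print(0.5 * np.mean((A @ x - b) ** 2))       # objective approaches its constrained min
```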
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
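For context, a sketch of the classical binary-DRE instance that such frameworks generalize: logistic regression between samples of $p$ and $q$, whose logit estimates $\log(p/q)$ (the Gaussian data and plain gradient ascent are assumptions).

```python
import numpy as np

# Binary density ratio estimation via logistic regression, one classical
# instance of Bregman-divergence-based DRE: with balanced classes, the
# classifier's logit recovers log(p(x)/q(x)).

rng = np.random.default_rng(5)
n = 5000
xp = rng.normal(0.0, 1.0, size=n)    # samples from p = N(0, 1)
xq = rng.normal(1.0, 1.0, size=n)    # samples from q = N(1, 1)

X = np.concatenate([xp, xq])
y = np.concatenate([np.ones(n), np.zeros(n)])   # label 1 for p, 0 for q
feats = np.stack([X, np.ones_like(X)], axis=1)  # features (x, 1)
w = np.zeros(2)

for _ in range(2000):                # plain gradient ascent on the log-likelihood
    prob = 1.0 / (1.0 + np.exp(-(feats @ w)))
    w += 0.1 * feats.T @ (y - prob) / len(y)

# Estimated log ratio at x0 is the classifier's logit (balanced classes).
x0 = 0.5
log_ratio_hat = np.array([x0, 1.0]) @ w
log_ratio_true = -0.5 * x0**2 + 0.5 * (x0 - 1.0)**2  # exact for these Gaussians
print(log_ratio_hat, log_ratio_true)  # both near 0 at x0 = 0.5
```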
- Faster Non-Convex Federated Learning via Global and Local Momentum [57.52663209739171]
FedGLOMO is the first (first-order) FL algorithm of its kind, combining global (server-side) and local (client-side) momentum.
Our algorithm is provably optimal even with compressed communication between the clients and the server.
arXiv Detail & Related papers (2020-12-07T21:05:31Z)
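A hedged sketch of the global-plus-local momentum idea named in the title (not FedGLOMO's exact variance-reduced update; the quadratic client objectives and momentum/step constants are assumptions).

```python
import numpy as np

# Federated averaging with both local and global momentum: clients run
# momentum SGD locally; the server applies momentum to the averaged delta.

rng = np.random.default_rng(6)
K, d = 8, 6
centers = rng.normal(size=(K, d))    # client i minimizes 0.5 * ||x - c_i||^2

x_global = np.zeros(d)
server_mom = np.zeros(d)

for rnd in range(200):
    deltas = []
    for i in range(K):
        x = x_global.copy()
        local_mom = np.zeros(d)
        for _ in range(5):                  # a few local steps per round
            grad = x - centers[i]
            local_mom = 0.9 * local_mom + grad
            x -= 0.05 * local_mom
        deltas.append(x - x_global)
    avg_delta = np.mean(deltas, axis=0)
    server_mom = 0.9 * server_mom + avg_delta  # global momentum on the update
    x_global += 0.5 * server_mom

print(np.linalg.norm(x_global - centers.mean(axis=0)))  # near the average optimum
```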
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate, on two popular deep learning tasks, the empirical advantages of our methods over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
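A sketch of the standard DP-SGD primitive behind such methods: per-sample gradient clipping plus calibrated Gaussian noise (the clip norm, noise multiplier, and linear-regression task are assumptions; a real deployment would track $(\varepsilon, \delta)$ with a privacy accountant).

```python
import numpy as np

# DP-SGD recipe: clip each per-sample gradient to l2 norm C, average, and add
# Gaussian noise with standard deviation sigma * C / batch per coordinate.

rng = np.random.default_rng(7)
n, d = 256, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
C, sigma, lr, batch = 1.0, 1.0, 0.1, 32

for _ in range(500):
    idx = rng.choice(n, size=batch, replace=False)
    # Per-sample gradients of the squared error, clipped to norm at most C.
    per_sample = 2 * (X[idx] @ w - y[idx])[:, None] * X[idx]
    norms = np.linalg.norm(per_sample, axis=1, keepdims=True)
    clipped = per_sample / np.maximum(1.0, norms / C)
    # Average the clipped gradients and add calibrated Gaussian noise.
    noisy = clipped.mean(axis=0) + rng.normal(0, sigma * C / batch, size=d)
    w -= lr * noisy

print(np.mean((X @ w - y) ** 2))  # loss decreases despite clipping and noise
```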