Server-Side Stepsizes and Sampling Without Replacement Provably Help in
Federated Optimization
- URL: http://arxiv.org/abs/2201.11066v1
- Date: Wed, 26 Jan 2022 17:26:25 GMT
- Title: Server-Side Stepsizes and Sampling Without Replacement Provably Help in
Federated Optimization
- Authors: Grigory Malinovsky, Konstantin Mishchenko and Peter Richtárik
- Abstract summary: We present a theoretical study of server-side optimization in federated learning.
In particular, we prove that whenever the local stepsizes are small, one can take a big leap in the update direction obtained by FedAvg with Random Reshuffling over all clients.
We also prove that the noise of client sampling can be controlled by using a small server-side stepsize.
- Score: 6.935471115003109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a theoretical study of server-side optimization in federated
learning. Our results are the first to show that the widely popular heuristic
of scaling the client updates with an extra parameter is very useful in the
context of Federated Averaging (FedAvg) with local passes over the client data.
Each local pass is performed without replacement using Random Reshuffling,
which is a key reason we can show improved complexities. In particular, we
prove that whenever the local stepsizes are small, and the update direction is
given by FedAvg in conjunction with Random Reshuffling over all clients, one
can take a big leap in the obtained direction and improve rates for convex,
strongly convex, and non-convex objectives. In particular, in the non-convex regime
we get an improvement of the convergence rate from
$\mathcal{O}\left(\varepsilon^{-3}\right)$ to
$\mathcal{O}\left(\varepsilon^{-2}\right)$. This result is new even for Random
Reshuffling performed on a single node. In contrast, if the local stepsizes are
large, we prove that the noise of client sampling can be controlled by using a
small server-side stepsize. To the best of our knowledge, this is the first
time that local steps provably help to overcome the communication bottleneck.
Together, our results on the advantage of large and small server-side stepsizes
give a formal justification for the practice of adaptive server-side
optimization in federated learning. Moreover, we consider a variant of our
algorithm that supports partial client participation, which makes the method
more practical.
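As a rough illustration of the method described in the abstract, here is a minimal sketch of a FedAvg-style loop in which every client performs one Random Reshuffling pass (sampling its data without replacement) and the server rescales the averaged client displacement by an extra server-side stepsize. This is not the authors' implementation; the flat NumPy parameter vector and the grad_fn(params, data, i) interface (gradient of the i-th local loss) are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' code) of FedAvg where each
# client runs one Random Reshuffling epoch (SGD without replacement) and the
# server rescales the averaged client displacement by a server-side stepsize.
import numpy as np


def local_random_reshuffling(params, data, grad_fn, lr_local, rng):
    """One local epoch: visit every local sample exactly once, in a fresh
    random order (i.e., sampling without replacement)."""
    x = params.copy()
    for i in rng.permutation(len(data)):
        x -= lr_local * grad_fn(x, data, i)  # gradient of the i-th local loss
    return x


def fedavg_rr(x0, clients, grad_fn, lr_local, lr_server, rounds, seed=0):
    """FedAvg with Random Reshuffling and an extra server-side stepsize.

    Each round, every client starts from the current server model, performs
    one Random Reshuffling pass over its data and returns its local model.
    The server treats the average displacement as an update direction and
    moves lr_server along it (lr_server = 1 recovers plain averaging)."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(rounds):
        local_models = [
            local_random_reshuffling(x, data, grad_fn, lr_local, rng)
            for data in clients
        ]
        direction = np.mean([xm - x for xm in local_models], axis=0)
        x += lr_server * direction  # "big leap" when lr_server > 1
    return x
```

In this sketch, the abstract's small-local-stepsize regime corresponds to taking lr_server > 1 along the averaged direction, while the large-local-stepsize regime corresponds to choosing lr_server < 1 so that client-sampling noise is damped.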
Related papers
- Enhanced Federated Optimization: Adaptive Unbiased Client Sampling with Reduced Variance [37.646655530394604]
Federated Learning (FL) is a distributed learning paradigm to train a global model across multiple devices without collecting local data.
We present the first adaptive client sampler, K-Vib, employing an independent sampling procedure (a generic unbiased-sampling sketch is given after this list).
K-Vib achieves a linear speed-up on the regret bound $\tilde{\mathcal{O}}\big(N^{\frac{1}{3}}T^{\frac{2}{3}}/K^{\frac{4}{3}}\big)$ within a set communication budget.
arXiv Detail & Related papers (2023-10-04T10:08:01Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specific auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - Towards Instance-adaptive Inference for Federated Learning [80.38701896056828]
Federated learning (FL) is a distributed learning paradigm that enables multiple clients to learn a powerful global model by aggregating local training.
In this paper, we present a novel FL algorithm, i.e., FedIns, to handle intra-client data heterogeneity by enabling instance-adaptive inference in the FL framework.
Our experiments show that our FedIns outperforms state-of-the-art FL algorithms, e.g., a 6.64% improvement against the top-performing method with less than 15% communication cost on Tiny-ImageNet.
arXiv Detail & Related papers (2023-08-11T09:58:47Z) - Locally Adaptive Federated Learning [30.19411641685853]
Federated learning is a paradigm of distributed machine learning in which multiple clients coordinate with a central server to learn a model.
Standard federated optimization methods such as Federated Averaging (FedAvg) ensure generalization among the clients.
We propose locally adaptive federated learning algorithms that leverage the local geometric information of each client's objective function.
arXiv Detail & Related papers (2023-07-12T17:02:32Z) - FedAvg with Fine Tuning: Local Updates Lead to Representation Learning [54.65133770989836]
The Federated Averaging (FedAvg) algorithm alternates between a few local gradient updates at the client nodes and a model averaging update at the server.
We show that the generalizability of FedAvg's output stems from its power in learning the common data representation among the clients' tasks.
We also provide empirical evidence demonstrating FedAvg's representation learning ability in federated image classification with heterogeneous data.
arXiv Detail & Related papers (2022-05-27T00:55:24Z) - On Second-order Optimization Methods for Federated Learning [59.787198516188425]
We evaluate the performance of several second-order distributed methods with local steps in the federated learning setting.
We propose a novel variant that uses second-order local information for updates and a global line search to counteract the resulting local specificity.
arXiv Detail & Related papers (2021-09-06T12:04:08Z) - FedChain: Chained Algorithms for Near-Optimal Communication Cost in
Federated Learning [24.812767482563878]
Federated learning (FL) aims to minimize the communication complexity of training a model over heterogeneous data distributed across many clients.
We propose FedChain, an algorithmic framework that combines the strengths of local methods and global methods to achieve fast convergence in terms of the number of communication rounds R.
arXiv Detail & Related papers (2021-08-16T02:57:06Z) - Efficient Algorithms for Federated Saddle Point Optimization [32.060759921090856]
We consider strongly convex-concave minimax problems in the federated setting, where the communication constraint is the main bottleneck.
Our goal is to design an algorithm that can harness the benefit of similarity in the clients while recovering the Minibatch Mirror-prox performance under arbitrary heterogeneity.
arXiv Detail & Related papers (2021-02-12T02:55:36Z) - A Bayesian Federated Learning Framework with Online Laplace
Approximation [144.7345013348257]
Federated learning allows multiple clients to collaboratively learn a globally shared model.
We propose a novel FL framework that uses online Laplace approximation to approximate posteriors on both the client and server side.
We achieve state-of-the-art results on several benchmarks, clearly demonstrating the advantages of the proposed method.
arXiv Detail & Related papers (2021-02-03T08:36:58Z) - Faster Non-Convex Federated Learning via Global and Local Momentum [57.52663209739171]
FedGLOMO is the first (first-order) FL algorithm.
Our algorithm is provably optimal even with compressed communication between the clients and the server.
arXiv Detail & Related papers (2020-12-07T21:05:31Z)
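Several of the works above, including the partial-participation variant mentioned in the main abstract and the K-Vib sampler, rely on unbiased client sampling. The sketch below shows only the textbook independent-sampling estimator, not any specific method from these papers; the client_update(model, client) interface and the participation probabilities probs are illustrative assumptions.

```python
# Generic unbiased partial participation via independent (Bernoulli) sampling:
# client i joins a round with probability probs[i]; its update is scaled by
# 1 / (n * probs[i]) so the aggregate matches the full-participation average
# in expectation. Illustrative sketch only, not K-Vib or the paper's method.
import numpy as np


def unbiased_partial_aggregate(model, clients, client_update, probs, rng):
    n = len(clients)
    total = np.zeros_like(model, dtype=float)
    for client, p in zip(clients, probs):
        if rng.random() < p:  # this client participates in the current round
            total += client_update(model, client) / (n * p)
    return total  # E[total] equals the mean update over all n clients
```

Adaptive samplers such as K-Vib then tune these probabilities to reduce the variance of the estimate within a communication budget.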