Bias-Variance Reduced Local SGD for Less Heterogeneous Federated
Learning
- URL: http://arxiv.org/abs/2102.03198v1
- Date: Fri, 5 Feb 2021 14:32:28 GMT
- Title: Bias-Variance Reduced Local SGD for Less Heterogeneous Federated
Learning
- Authors: Tomoya Murata, Taiji Suzuki
- Abstract summary: We aim at learning local efficiently in terms of communication and computational complexity.
One of the important learning scenarios in distributed learning is the Federated Learning scenario.
- Score: 46.32232395989181
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning is one of the important learning scenarios in distributed
learning, in which we aim at learning heterogeneous local datasets efficiently
in terms of communication and computational cost. In this paper, we study new
local algorithms called Bias-Variance Reduced Local SGD (BVR-L-SGD) for
nonconvex federated learning. One of the novelties of this paper is in the
analysis of our bias and variance reduced local gradient estimators which fully
utilize small second-order heterogeneity of local objectives and suggests to
randomly pick up one of the local models instead of taking average of them when
workers are synchronized. Under small heterogeneity of local objectives, we
show that our methods achieve smaller communication complexity than both the
previous non-local and local methods for general nonconvex objectives.
Furthermore, we also compare the total execution time, that is the sum of total
communication time and total computational time per worker, and show the
superiority of our methods to the existing methods when the heterogeneity is
small and single communication time is more time consuming than single
stochastic gradient computation. Numerical results are provided to verify our
theoretical findings and give empirical evidence of the superiority of our
algorithms.
Related papers
- Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose the stability-based generalization analytical framework for Distributed-SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z) - Improved Generalization Bounds for Communication Efficient Federated Learning [4.3707341422218215]
This paper focuses on reducing the communication cost of federated learning by exploring generalization bounds and representation learning.
We design a novel Federated Learning with Adaptive Local Steps (FedALS) algorithm based on our generalization bound and representation learning analysis.
arXiv Detail & Related papers (2024-04-17T21:17:48Z) - Distributed Learning of Mixtures of Experts [0.0]
We deal with datasets that are either distributed by nature or potentially large for which distributing the computations is usually a standard way to proceed.
We propose a distributed learning approach for mixtures of experts (MoE) models with an aggregation strategy to construct a reduction estimator from local estimators fitted parallelly to distributed subsets of the data.
arXiv Detail & Related papers (2023-12-15T15:26:13Z) - Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD
for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than non-local methods when the local datasets is smaller than the smoothness local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z) - Decentralized Local Stochastic Extra-Gradient for Variational
Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains with the problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z) - Clustered Federated Learning via Generalized Total Variation
Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as total variation minimization (GTV)
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
arXiv Detail & Related papers (2021-05-26T18:07:19Z) - Local Stochastic Gradient Descent Ascent: Convergence Analysis and
Communication Efficiency [15.04034188283642]
Local SGD is a promising approach to overcome the communication overhead in distributed learning.
We show that local SGDA can provably optimize distributed minimax problems in both homogeneous and heterogeneous data.
arXiv Detail & Related papers (2021-02-25T20:15:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.