EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections
for Federated Learning with Heterogeneous Data
- URL: http://arxiv.org/abs/2302.07155v1
- Date: Tue, 14 Feb 2023 16:05:51 GMT
- Title: EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections
for Federated Learning with Heterogeneous Data
- Authors: Michael Crawshaw, Yajie Bao, Mingrui Liu
- Abstract summary: Gradient clipping is an important technique for deep neural networks with exploding gradients, such as recurrent neural networks.
Recent studies have shown that the loss functions of these networks do not satisfy the conventional smoothness condition, but instead satisfy a relaxed smoothness condition.
At the beginning of each round, EPISODE resamples stochastic gradients from each client and obtains the global averaged gradient, which is used to determine whether to apply gradient clipping for the entire round and to construct local gradient corrections for each client.
- Score: 9.379890125442333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient clipping is an important technique for deep neural networks with
exploding gradients, such as recurrent neural networks. Recent studies have
shown that the loss functions of these networks do not satisfy the conventional
smoothness condition, but instead satisfy a relaxed smoothness condition, i.e.,
the Lipschitz constant of the gradient scales linearly in terms of the gradient
norm. Due to this observation, several gradient clipping algorithms have been
developed for nonconvex and relaxed-smooth functions. However, the existing
algorithms only apply to the single-machine or multiple-machine setting with
homogeneous data across machines. It remains unclear how to design provably
efficient gradient clipping algorithms in the general Federated Learning (FL)
setting with heterogeneous data and limited communication rounds. In this
paper, we design EPISODE, the very first algorithm to solve FL problems with
heterogeneous data in the nonconvex and relaxed smoothness setting. The key
ingredients of the algorithm are two new techniques called episodic
gradient clipping and periodic resampled corrections. At the
beginning of each round, EPISODE resamples stochastic gradients from each
client and obtains the global averaged gradient, which is used to (1) determine
whether to apply gradient clipping for the entire round and (2) construct local
gradient corrections for each client. Notably, our algorithm and analysis
provide a unified framework for both homogeneous and heterogeneous data under
any noise level of the stochastic gradient, and it achieves state-of-the-art
complexity results. In particular, we prove that EPISODE can achieve linear
speedup in the number of machines, and it requires significantly fewer
communication rounds. Experiments on several heterogeneous datasets show the
superior performance of EPISODE over several strong baselines in FL.
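For context, the relaxed smoothness condition referenced in the abstract (the gradient's Lipschitz constant scales linearly with the gradient norm) is commonly formalized as $(L_0, L_1)$-smoothness; the statement below uses the standard formulation and notation, which may differ in detail from the paper's own assumptions.

```latex
% Relaxed (L_0, L_1)-smoothness: the local smoothness constant grows
% linearly with the gradient norm, relaxing the usual bound ||\nabla^2 f(x)|| <= L.
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1\,\|\nabla f(x)\| \qquad \text{for all } x .
\]
```

The sketch below illustrates, in Python, the round structure described in the abstract: stochastic gradients are resampled from each client at the start of a round, their average drives a single clipping decision that holds for the entire round, and per-client corrections are built from the resampled gradients. The function and class names (`episode_round_sketch`, `QuadClient`), the clipping threshold rule, the exact correction form, and the final server averaging step are illustrative assumptions rather than the paper's precise algorithm; see the paper at the URL above for the actual EPISODE updates and guarantees.

```python
import numpy as np

def episode_round_sketch(clients, x, gamma, eta, local_steps, rng):
    """One EPISODE-style communication round (hypothetical sketch).

    clients:     objects exposing stoch_grad(x, rng) -> stochastic gradient at x
    x:           current global model parameters (np.ndarray)
    gamma:       clipping threshold on gradient norms (assumed rule)
    eta:         local learning rate
    local_steps: number of local SGD steps per client in this round
    """
    # (1) Resample one stochastic gradient per client at the start of the round.
    resampled = [c.stoch_grad(x, rng) for c in clients]
    g_global = np.mean(resampled, axis=0)        # global averaged gradient

    # (2) Episodic clipping decision: a single norm test fixes whether
    #     clipping is applied for the *entire* round.
    clip_round = np.linalg.norm(g_global) > gamma

    # (3) Local steps with a correction built from the resampled gradients
    #     (the correction form g - g_i + g_global is illustrative).
    local_models = []
    for c, g_i in zip(clients, resampled):
        x_local = x.copy()
        for _ in range(local_steps):
            g = c.stoch_grad(x_local, rng)
            d = g - g_i + g_global               # periodic resampled correction
            if clip_round:
                d *= gamma / max(np.linalg.norm(d), gamma)   # clip to norm gamma
            x_local -= eta * d
        local_models.append(x_local)

    # (4) Server aggregates the local models into the next global iterate.
    return np.mean(local_models, axis=0)

if __name__ == "__main__":
    class QuadClient:
        """Toy heterogeneous client: f_i(x) = 0.5 * ||x - b_i||^2, noisy gradients."""
        def __init__(self, b):
            self.b = b
        def stoch_grad(self, x, rng):
            return (x - self.b) + 0.01 * rng.standard_normal(x.shape)

    rng = np.random.default_rng(0)
    clients = [QuadClient(rng.standard_normal(5)) for _ in range(4)]
    x = np.zeros(5)
    for _ in range(20):                          # a few communication rounds
        x = episode_round_sketch(clients, x, gamma=1.0, eta=0.1,
                                 local_steps=5, rng=rng)
    print("final global iterate:", x)
```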
Related papers
- Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU
Networks on Nearly-orthogonal Data [66.1211659120882]
The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well.
While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks.
arXiv Detail & Related papers (2023-10-29T08:47:48Z) - Neural Gradient Learning and Optimization for Oriented Point Normal
Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability in local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the variance of the random initialization is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Convergence of First-Order Methods for Constrained Nonconvex
Optimization with Dependent Data [7.513100214864646]
We show a worst-case convergence rate of $\tilde{O}(t^{-1/4})$ and a complexity of $\tilde{O}(\varepsilon^{-4})$ in terms of the Moreau envelope for smooth nonconvex optimization.
We obtain the first online nonnegative matrix factorization algorithms for dependent data, based on projected gradient methods with adaptive step sizes and optimal convergence.
arXiv Detail & Related papers (2022-03-29T17:59:10Z) - Error-Correcting Neural Networks for Two-Dimensional Curvature
Computation in the Level-Set Method [0.0]
We present an error-neural-modeling-based strategy for approximating two-dimensional curvature in the level-set method.
Our main contribution is a redesigned hybrid solver that relies on numerical schemes to enable machine-learning operations on demand.
arXiv Detail & Related papers (2022-01-22T05:14:40Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)