Over-the-Air Computation Aided Federated Learning with the Aggregation
of Normalized Gradient
- URL: http://arxiv.org/abs/2308.09082v2
- Date: Sun, 3 Sep 2023 03:59:53 GMT
- Title: Over-the-Air Computation Aided Federated Learning with the Aggregation
of Normalized Gradient
- Authors: Rongfei Fan, Xuming An, Shiyuan Zuo, and Han Hu
- Abstract summary: Over-the-air computation is a communication-efficient solution for federated learning (FL).
In such a system, an iterative procedure is performed: the local gradient of a private loss function is computed, amplified, and then transmitted by every mobile device.
Most related works select the amplification factor for a worst-case gradient norm; to circumvent this problem, we propose to normalize each local gradient before amplifying it.
- Score: 12.692064367193934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over-the-air computation is a communication-efficient solution for federated
learning (FL). In such a system, an iterative procedure is performed: the local
gradient of a private loss function is computed, amplified, and then transmitted by
every mobile device; the server receives the aggregated gradient all at once,
generates the updated model parameters, and broadcasts them to every mobile device.
Regarding the selection of the amplification factor, most related works assume the
local gradient always attains its maximal norm, although the norm actually fluctuates
over iterations, which may degrade convergence performance. To circumvent this
problem, we propose to normalize each local gradient before amplifying it. When the
loss function is smooth, we prove that the proposed method converges to a stationary
point at a sub-linear rate. When the loss function is smooth and strongly convex, we
prove that it attains the minimal training loss at a linear rate, up to any small
positive tolerance. Moreover, a tradeoff between the convergence rate and the
tolerance is identified. To speed up convergence, optimization problems over the
system parameters are also formulated for the above two cases. Although these
problems are non-convex, optimal solutions are derived with polynomial complexity.
Experimental results show that the proposed method outperforms benchmark methods in
terms of convergence performance.
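
The scheme described above can be illustrated with a short simulation: each device normalizes its local gradient before applying the amplification factor, the analog signals superimpose at the server together with additive channel noise, and the server rescales the received sum to update the global model. Below is a minimal NumPy sketch under simplified assumptions (quadratic local losses, a common amplification factor `eta`, Gaussian channel noise); it is not the paper's exact power-control or system model.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, T = 10, 5, 200                    # devices, model dimension, communication rounds
eta, lr, noise_std = 1.0, 0.1, 0.01     # assumed amplification factor, step size, channel noise

# Illustrative local quadratic losses f_k(w) = 0.5 * ||A_k w - b_k||^2 (not the paper's model).
A = [rng.normal(size=(20, d)) for _ in range(K)]
b = [rng.normal(size=20) for _ in range(K)]

w = np.zeros(d)
for t in range(T):
    signals = []
    for k in range(K):
        g = A[k].T @ (A[k] @ w - b[k])        # local gradient on device k
        g = g / (np.linalg.norm(g) + 1e-12)   # normalize before amplification (the proposed step)
        signals.append(eta * g)               # amplify for analog transmission
    # Over-the-air aggregation: the channel superimposes all signals and adds noise.
    received = np.sum(signals, axis=0) + noise_std * rng.normal(size=d)
    # The server rescales the received sum and takes a model-update step.
    w -= lr * received / (eta * K)
```

With a fixed step size, the iterate typically settles in a neighborhood of the optimum rather than converging exactly, which mirrors the tolerance discussed in the abstract.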
Related papers
- An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levenberg-Marquardt method to minimize the average of smooth but possibly non-convex functions.
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\left( \ln(T) / T^{1 - \frac{1}{\alpha}} \right)$ (a minimal sketch of an over-the-air adaptive update appears after this list).
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time [45.72323731094864]
In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations.
Our study sheds new light on understanding why local methods work well.
arXiv Detail & Related papers (2024-02-06T01:29:35Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators (a minimal interpolation-step sketch appears after this list).
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian (a minimal forward-mode sketch for the Lasso appears after this list).
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost (a minimal ZO gradient-estimator sketch appears after this list).
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Channel-Directed Gradients for Optimization of Convolutional Neural
Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z) - Iterative Pre-Conditioning for Expediting the Gradient-Descent Method:
The Distributed Linear Least-Squares Problem [0.966840768820136]
This paper considers the multi-agent linear least-squares problem in a server-agent network.
The goal for the agents is to compute a linear model that optimally fits the collective data points held by all the agents, without sharing their individual local data points.
We propose an iterative pre-conditioning technique that mitigates the deleterious effect of the conditioning of data points on the rate of convergence of the gradient-descent method.
arXiv Detail & Related papers (2020-08-06T20:01:18Z) - A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to
Balance Communication Overhead, Computational Complexity, and Convergence
Rate [28.167294398293297]
We propose a distributed stochastic gradient descent (SGD) method with low communication load and low computational complexity, while still achieving a fast convergence rate.
To reduce the computational complexity in each iteration, the worker nodes approximate the directional derivatives with zeroth-order gradient estimation.
arXiv Detail & Related papers (2020-03-27T14:02:15Z)
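
To complement the "Adaptive Federated Learning Over the Air" entry above, here is a minimal NumPy sketch of a server-side AdaGrad step applied to a noisily aggregated gradient. The quadratic local losses, the noise level `noise_std`, and the step size `lr` are illustrative assumptions, not that paper's system model.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d, T = 5, 4, 300                     # devices, model dimension, rounds
lr, eps, noise_std = 0.5, 1e-8, 0.01    # assumed AdaGrad step size, stability term, channel noise

# Illustrative local quadratic losses (stand-ins for the devices' private objectives).
A = [rng.normal(size=(30, d)) for _ in range(K)]
b = [rng.normal(size=30) for _ in range(K)]

w = np.zeros(d)
accum = np.zeros(d)                     # AdaGrad accumulator of squared gradients
for t in range(T):
    local_grads = [A[k].T @ (A[k] @ w - b[k]) / len(b[k]) for k in range(K)]
    # Over-the-air aggregation: analog superposition of the local gradients plus noise.
    g = (np.sum(local_grads, axis=0) + noise_std * rng.normal(size=d)) / K
    # Server-side AdaGrad update on the (noisy) aggregated gradient.
    accum += g ** 2
    w -= lr * g / (np.sqrt(accum) + eps)
```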
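
For the "Stable Nonconvex-Nonconcave Training via Linear Interpolation" entry, the sketch below illustrates the averaging idea behind linear interpolation: the iterate moves only a fraction `alpha` of the way toward the output of an inner optimizer, which damps oscillating or divergent base dynamics. The quadratic toy problem, the inner optimizer, and `alpha = 0.5` are assumptions for illustration, not that paper's exact scheme.

```python
import numpy as np

def inner_optimizer(z, grad, lr=0.07, k=5):
    """k plain gradient steps; stands in for any (possibly unstable) base optimizer."""
    for _ in range(k):
        z = z - lr * grad(z)
    return z

def linear_interpolation_step(z, grad, alpha=0.5):
    """Move only a fraction alpha of the way toward the inner optimizer's output."""
    return (1 - alpha) * z + alpha * inner_optimizer(z, grad)

# Toy usage on an ill-conditioned quadratic: repeatedly applying the inner optimizer
# alone diverges here (the step size is too large for the largest curvature), while
# the interpolated iteration converges.
H = np.diag([1.0, 30.0])
grad = lambda z: H @ z
z = np.array([1.0, 1.0])
for _ in range(100):
    z = linear_interpolation_step(z, grad)
print(z)  # close to the minimizer at the origin
```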
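
For the implicit-differentiation entry, the sketch below runs proximal gradient descent on the Lasso while propagating the forward-mode derivative of the iterate with respect to the regularization parameter `lam`, so the Jacobian converges alongside the solution. The data, the iteration count, and the helper name `lasso_value_and_jacobian` are illustrative assumptions.

```python
import numpy as np

def lasso_value_and_jacobian(X, y, lam, n_iter=500):
    """Proximal gradient descent on 1/(2n)*||Xw - y||^2 + lam*||w||_1, jointly
    propagating the forward-mode derivative of the iterate with respect to lam."""
    n, d = X.shape
    L = np.linalg.norm(X, ord=2) ** 2 / n        # Lipschitz constant of the smooth part
    w, J = np.zeros(d), np.zeros(d)              # iterate and its derivative dw/dlam
    for _ in range(n_iter):
        u = w - (X.T @ (X @ w - y)) / (n * L)    # gradient step
        dJ = J - (X.T @ (X @ J)) / (n * L)       # forward-mode derivative of the gradient step
        thr = lam / L
        active = np.abs(u) > thr                 # support kept by the soft-threshold
        w = np.where(active, u - np.sign(u) * thr, 0.0)
        J = np.where(active, dJ - np.sign(u) / L, 0.0)
    return w, J

# Toy usage: the Jacobian can drive a gradient step on a held-out validation criterion.
rng = np.random.default_rng(2)
X, y = rng.normal(size=(50, 10)), rng.normal(size=50)
w_hat, dw_dlam = lasso_value_and_jacobian(X, y, lam=0.1)
```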
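
For the zeroth-order entries ("Zeroth-Order Hybrid Gradient Descent" and the hybrid-order distributed SGD paper), the sketch below shows the basic random-direction finite-difference estimator that such methods build on; the smoothing radius `mu`, the number of directions, and the toy objective are assumptions, and the sketch omits the coordinate importance sampling and the distributed aspects discussed in those papers.

```python
import numpy as np

def zo_gradient_estimate(f, x, rng, num_dirs=20, mu=1e-4):
    """Zeroth-order gradient estimate: average finite differences of f along
    random Gaussian directions; only function evaluations are needed."""
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.normal(size=x.size)
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / num_dirs

# Toy usage: ZO gradient descent on a quadratic without ever evaluating its gradient.
rng = np.random.default_rng(3)
f = lambda x: 0.5 * np.sum(x ** 2)
x = np.ones(5)
for _ in range(200):
    x -= 0.05 * zo_gradient_estimate(f, x, rng)
print(x)  # near the minimizer at the origin
```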