Federated Optimization of Smooth Loss Functions
- URL: http://arxiv.org/abs/2201.01954v2
- Date: Thu, 4 Jan 2024 01:46:17 GMT
- Title: Federated Optimization of Smooth Loss Functions
- Authors: Ali Jadbabaie and Anuran Makur and Devavrat Shah
- Abstract summary: In this work, we study empirical risk minimization (ERM) within a federated learning framework.
We propose the Federated Low Rank Gradient Descent (FedLRGD) algorithm.
Our method solves the ERM problem at the server using inexact gradient descent.
- Score: 35.944772011421264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study empirical risk minimization (ERM) within a federated
learning framework, where a central server minimizes an ERM objective function
using training data that is stored across $m$ clients. In this setting, the
Federated Averaging (FedAve) algorithm is the staple for determining
$\epsilon$-approximate solutions to the ERM problem. Similar to standard
optimization algorithms, the convergence analysis of FedAve only relies on
smoothness of the loss function in the optimization parameter. However, loss
functions are often very smooth in the training data too. To exploit this
additional smoothness, we propose the Federated Low Rank Gradient Descent
(FedLRGD) algorithm. Since smoothness in data induces an approximate low rank
structure on the loss function, our method first performs a few rounds of
communication between the server and clients to learn weights that the server
can use to approximate clients' gradients. Then, our method solves the ERM
problem at the server using inexact gradient descent. To show that FedLRGD can
have superior performance to FedAve, we present a notion of federated oracle
complexity as a counterpart to canonical oracle complexity. Under some
assumptions on the loss function, e.g., strong convexity in parameter,
$\eta$-H\"older smoothness in data, etc., we prove that the federated oracle
complexity of FedLRGD scales like $\phi m(p/\epsilon)^{\Theta(d/\eta)}$ and
that of FedAve scales like $\phi m(p/\epsilon)^{3/4}$ (neglecting sub-dominant
factors), where $\phi\gg 1$ is a "communication-to-computation ratio," $p$ is
the parameter dimension, and $d$ is the data dimension. Then, we show that when
$d$ is small and the loss function is sufficiently smooth in the data, FedLRGD
beats FedAve in federated oracle complexity. Finally, in the course of
analyzing FedLRGD, we also establish a result on low rank approximation of
latent variable models.
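To make the two-phase idea above concrete, below is a minimal NumPy sketch of the low-rank-weights scheme, not the authors' algorithm: the squared loss, the random pivot data points and probe parameters, the plain least-squares fit for the weights, and all names (`grad`, `averaged_client_gradient`, `approx_gradient`) are illustrative assumptions. It only shows how a server could (i) learn weights from a few rounds of client gradient replies and (ii) then run inexact gradient descent using weighted pivot gradients it can evaluate itself.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_c, p = 10, 200, 3  # clients, samples per client, parameter dimension

# Per-sample gradient of a squared loss 0.5*(x@theta - y)^2, smooth in both the
# parameter theta and the data point (x, y); this loss choice is an assumption.
def grad(theta, x, y):
    return (x @ theta - y) * x

# Synthetic client datasets, held "remotely" by the clients.
theta_star = rng.normal(size=p)
clients = []
for _ in range(m):
    X = rng.uniform(-1, 1, size=(n_c, p))
    Y = X @ theta_star + 0.05 * rng.normal(size=n_c)
    clients.append((X, Y))

def averaged_client_gradient(theta):
    # What one communication round would return to the server (aggregated over clients).
    return np.mean([grad(theta, x, y) for X, Y in clients for x, y in zip(X, Y)], axis=0)

# Phase 1: a few communication rounds. The server fixes r pivot data points it can
# evaluate itself and a handful of probe parameters; clients reply with averaged
# gradients at the probes, and the server fits weights w so that a w-combination of
# pivot gradients matches those replies.
r, n_probe = 12, 8
pivot_x = rng.uniform(-1, 1, size=(r, p))
pivot_y = rng.uniform(-1, 1, size=r)
probes = rng.normal(scale=0.5, size=(n_probe, p))

b = np.concatenate([averaged_client_gradient(t) for t in probes])   # client replies
A = np.column_stack([np.concatenate([grad(t, x, y) for t in probes])
                     for x, y in zip(pivot_x, pivot_y)])             # server-side pivots
w, *_ = np.linalg.lstsq(A, b, rcond=None)

# Phase 2: inexact gradient descent entirely at the server, using only pivot gradients.
def approx_gradient(theta):
    return sum(wk * grad(theta, x, y) for wk, x, y in zip(w, pivot_x, pivot_y))

theta = np.zeros(p)
for _ in range(200):
    theta -= 0.3 * approx_gradient(theta)

print("gradient approximation error at output:",
      np.linalg.norm(approx_gradient(theta) - averaged_client_gradient(theta)))
```

Because the squared-loss gradient is affine in the parameter and low-degree in the data, a modest number of pivots reproduces the averaged gradient almost exactly in this toy setting; the paper's analysis quantifies the analogous approximation error for general $\eta$-H\"older smooth losses and converts it into the federated oracle complexity bound quoted above.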
Related papers
- Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets [23.640506243685863]
In practical federated learning (FL), the presence of malicious attacks and data heterogeneity often introduces biases into the learning process.
We propose Fed-NGA, which normalizes uploaded local gradients to unit vectors before aggregation, achieving a time complexity of $\mathcal{O}(pM)$.
arXiv Detail & Related papers (2024-08-18T16:50:39Z)
- On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
We show that Empirical Risk Minimization (ERM) is able to achieve sublinear error whenever a class is learnable with iid data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z)
- Solving a Class of Non-Convex Minimax Optimization in Federated Learning [84.98927714326908]
Minimax problems arise throughout machine learning applications, ranging from model training to large-scale learning.
We propose a class of algorithms for non-convex minimax problems that reduces the complexity to $O(\varepsilon^{-6})$.
We prove that FedSGDA-M achieves the best-known sample complexity and the best-known communication complexity for this setting.
arXiv Detail & Related papers (2023-10-05T15:48:41Z)
- Globally Convergent Accelerated Algorithms for Multilinear Sparse Logistic Regression with $\ell_0$-constraints [2.323238724742687]
Multilinear logistic regression serves as a powerful tool for the analysis of multidimensional data.
We propose an accelerated proximal alternating minimization method, APALM$+$, to solve the $\ell_0$-MLSR model.
We also show that APALM$+$ converges globally to a first-order critical point and establish its convergence rate using the Kurdyka-Łojasiewicz property.
arXiv Detail & Related papers (2023-09-17T11:05:08Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k\log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the PAGE algorithm so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of the optimal bound in the smooth nonconvex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z)
- High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees [20.701475313495884]
We study sparse linear regression over a network of agents, modeled as an undirected graph with no server node.
We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm.
arXiv Detail & Related papers (2022-01-21T01:26:08Z)
- Gradient-Based Empirical Risk Minimization using Local Polynomial Regression [39.29885444997579]
A major goal of this literature has been to compare different algorithms, such as gradient descent (GD) or stochastic gradient descent (SGD).
We demonstrate that when the loss function is smooth in the data, we can learn the oracle at every iteration and beat the oracle complexities of both GD and SGD.
arXiv Detail & Related papers (2020-11-04T20:10:31Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.