FEDNEST: Federated Bilevel, Minimax, and Compositional Optimization
- URL: http://arxiv.org/abs/2205.02215v1
- Date: Wed, 4 May 2022 17:48:55 GMT
- Title: FEDNEST: Federated Bilevel, Minimax, and Compositional Optimization
- Authors: Davoud Ataee Tarzanagh, Mingchen Li, Christos Thrampoulidis, Samet
Oymak
- Abstract summary: Many contemporary ML problems fall under nested bilevel programming that subsumes minimax and compositional optimization.
We propose FedNest: A federated alternating gradient method to address general nested problems.
- Score: 53.78643974257301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard federated optimization methods successfully apply to stochastic
problems with \textit{single-level} structure. However, many contemporary ML
problems -- including adversarial robustness, hyperparameter tuning, and
actor-critic -- fall under nested bilevel programming that subsumes minimax and
compositional optimization. In this work, we propose FedNest: A federated
alternating stochastic gradient method to address general nested problems. We
establish provable convergence rates for FedNest in the presence of
heterogeneous data and introduce variations for bilevel, minimax, and
compositional optimization. FedNest introduces multiple innovations including
federated hypergradient computation and variance reduction to address
inner-level heterogeneity. We complement our theory with experiments on
hyperparameter \& hyper-representation learning and minimax optimization that
demonstrate the benefits of our method in practice. Code is available at
https://github.com/mc-nya/FedNest.
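For intuition, here is a minimal single-machine simulation of the alternating pattern the abstract describes: clients take local gradient steps on the inner (lower-level) variable, the server averages them, and the outer variable is then updated with an averaged implicit hypergradient approximated by a truncated Neumann series. The toy quadratic problem, the names, and the step sizes are our own illustration, not the authors' implementation (see the linked repository for that); the sketch also omits FedNest's variance-reduction component, so each client uses only its local curvature.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_clients = 5, 4

# Toy heterogeneous quadratic bilevel problem (our construction):
#   inner:  g_i(x, y) = 0.5 * y^T A_i y - y^T B_i x   (strongly convex in y)
#   outer:  f_i(x, y) = 0.5 * ||y - c_i||^2
clients = []
for _ in range(num_clients):
    M = rng.normal(size=(d, d))
    clients.append((M @ M.T + d * np.eye(d),   # A_i, positive definite
                    rng.normal(size=(d, d)),   # B_i
                    rng.normal(size=d)))       # c_i

def hypergrad(A, B, c, y, neumann_steps=50):
    """Implicit hypergradient for this toy problem, grad F = B^T A^{-1} (y - c),
    with the inverse-Hessian-vector product approximated by a truncated
    Neumann series (a common trick; FedNest's federated estimator is richer)."""
    alpha = 1.0 / np.linalg.norm(A, 2)   # ensures the series converges
    v, p = np.zeros(d), alpha * (y - c)
    for _ in range(neumann_steps):       # v -> A^{-1} (y - c)
        v += p
        p -= alpha * (A @ p)
    return B.T @ v

x, y = np.zeros(d), np.zeros(d)
for _ in range(300):                     # communication rounds
    # Inner phase: each client runs local steps on y; the server averages.
    local_ys = []
    for A, B, c in clients:
        y_i, step = y.copy(), 1.0 / np.linalg.norm(A, 2)
        for _ in range(5):               # local inner steps
            y_i -= step * (A @ y_i - B @ x)
        local_ys.append(y_i)
    y = np.mean(local_ys, axis=0)

    # Outer phase: the server averages client hypergradients and updates x.
    hg = np.mean([hypergrad(A, B, c, y) for A, B, c in clients], axis=0)
    x -= 0.05 * hg

print("outer objective:",
      np.mean([0.5 * np.linalg.norm(y - c) ** 2 for _, _, c in clients]))
```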
Related papers
- Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets [23.640506243685863]
In practical federated learning (FL), the presence of malicious attacks and data heterogeneity often introduces biases into the learning process.
We propose a Federated Normalized Gradients Algorithm (Fed-NGA) which normalizes the uploaded local gradients to unit vectors before aggregation, achieving a time complexity of $\mathcal{O}(pM)$.
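A minimal sketch of the normalize-then-aggregate rule described above; the function name and the toy attack scenario are our illustration, not from the paper.

```python
import numpy as np

def fed_nga_aggregate(local_grads, eps=1e-12):
    """Normalize each client's uploaded gradient to a unit vector, then
    average. Normalization caps the influence of any single (possibly
    Byzantine) client and removes magnitude skew from non-IID data."""
    unit = [g / max(np.linalg.norm(g), eps) for g in local_grads]
    return np.mean(unit, axis=0)

# One round with nine honest clients and one attacker sending a huge gradient.
rng = np.random.default_rng(0)
honest = [rng.normal(size=10) for _ in range(9)]
attack = [1e6 * rng.normal(size=10)]
update = fed_nga_aggregate(honest + attack)
print(np.linalg.norm(update))  # stays O(1) despite the 1e6-scale attack
```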
arXiv Detail & Related papers (2024-08-18T16:50:39Z) - Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual stochastic bilevel optimization (CSBO) -- a bilevel optimization framework in which the lower-level problem minimizes an expectation conditioned on some contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI).
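In symbols, a generic form of the CSBO problem (notation ours, following the summary: the lower level minimizes a conditional expectation given the context $\xi$ and the upper-level variable $x$) is

```latex
\min_{x}\; \mathbb{E}_{\xi}\bigl[ f\bigl(x,\, y^{*}(x,\xi),\, \xi\bigr) \bigr]
\quad\text{s.t.}\quad
y^{*}(x,\xi) \in \arg\min_{y}\; \mathbb{E}_{\eta\mid\xi}\bigl[ g(x, y, \eta) \bigr]
```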
arXiv Detail & Related papers (2023-10-27T23:24:37Z) - Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax
Problems [39.197569803430646]
Minimax optimization plays an important role in many machine learning tasks such as generative adversarial networks (GANs) and adversarial training.
Although recently a wide variety of optimization methods have been proposed to solve the minimax problems, most of them ignore the distributed setting.
arXiv Detail & Related papers (2023-04-21T11:38:41Z) - Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
Minimax algorithms over Riemannian manifolds have been developed to solve many machine learning problems.
In this paper, we propose two types of decentralized Riemannian minimax algorithms.
For the stochastic setting, we propose DRSGDA and prove that our method achieves a gradient complexity guarantee.
arXiv Detail & Related papers (2023-02-08T01:42:45Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
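The momentum-based variance-reduced ingredient mentioned here is, in its standard STORM-style form, the recursion below; the exact estimator inside AdaFGDA may differ, so treat this as an assumption.

```python
def vr_momentum_estimator(grad_now, grad_prev_reeval, d_prev, beta):
    """STORM-style momentum-based variance-reduced gradient estimator:
        d_t = g(w_t; xi_t) + (1 - beta) * (d_{t-1} - g(w_{t-1}; xi_t))
    Both stochastic gradients are evaluated on the same fresh sample xi_t,
    which cancels most of the sampling noise in a single-loop update.
    With beta = 1 this reduces to plain SGD."""
    return grad_now + (1.0 - beta) * (d_prev - grad_prev_reeval)
```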
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - A Constrained Optimization Approach to Bilevel Optimization with
Multiple Inner Minima [49.320758794766185]
We propose a new approach that converts the bilevel problem into an equivalent constrained optimization problem, which can then be solved with a primal-dual algorithm.
Such an approach enjoys several advantages: (a) it addresses the multiple-inner-minima challenge; (b) it is fully first-order and efficient, requiring no Jacobian computations.
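The constrained reformulation in question is, in its standard value-function form (notation ours),

```latex
\min_{x,\,y}\; f(x, y)
\quad\text{s.t.}\quad
g(x, y) \,\le\, v(x) := \min_{y'} g(x, y')
```

The inequality constraint is tight exactly at lower-level minimizers, so no particular inner minimum has to be singled out, and a primal-dual (Lagrangian) method applies directly.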
arXiv Detail & Related papers (2022-03-01T18:20:01Z) - Convergence Properties of Stochastic Hypergradients [38.64355126221992]
We study approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset.
We provide numerical experiments to support our theoretical analysis and to show the advantage of using hypergradients in practice.
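For reference, when the lower-level problem $\min_y g(x, y)$ has a unique, well-conditioned solution $y^{*}(x)$, the hypergradient being approximated takes the standard implicit-function form

```latex
\nabla F(x) \;=\; \nabla_x f\bigl(x, y^{*}(x)\bigr)
\;-\; \nabla_{xy}^{2} g\bigl(x, y^{*}(x)\bigr)\,
\bigl[\nabla_{yy}^{2} g\bigl(x, y^{*}(x)\bigr)\bigr]^{-1}
\nabla_y f\bigl(x, y^{*}(x)\bigr)
```

Stochastic approximation enters both through sampled gradients and Hessians and through truncating the matrix inverse (e.g., Neumann-series or conjugate-gradient solves).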
arXiv Detail & Related papers (2020-11-13T20:50:36Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which provably converges to a stationary point.
The algorithm applies not only to Euclidean spaces, but also to general $\ell_p$-normed finite-dimensional real vector spaces.
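As a reminder of the base method being generalized, here is a plain extragradient step on a toy bilinear saddle problem in Python; the paper's actual scheme for nonconvex-nonconcave objectives and general $\ell_p$-normed spaces is more elaborate.

```python
import numpy as np

def extragradient_step(z, F, eta):
    """One extragradient step for a saddle-point operator F(z):
    take an exploratory half-step, then update from the original point
    using the operator evaluated at the midpoint."""
    z_mid = z - eta * F(z)       # exploration step
    return z - eta * F(z_mid)    # correction step

# Toy bilinear saddle f(x, y) = x * y, with F(z) = (df/dx, -df/dy) = (y, -x).
# Plain gradient descent-ascent spirals outward here; extragradient contracts.
F = lambda z: np.array([z[1], -z[0]])
z = np.array([1.0, 1.0])
for _ in range(2000):
    z = extragradient_step(z, F, eta=0.1)
print(z)  # norm has shrunk by roughly e^{-10}; z is near the saddle (0, 0)
```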
arXiv Detail & Related papers (2020-10-31T21:35:42Z)