Decentralized Riemannian Algorithm for Nonconvex Minimax Problems
- URL: http://arxiv.org/abs/2302.03825v2
- Date: Thu, 9 Feb 2023 03:08:23 GMT
- Title: Decentralized Riemannian Algorithm for Nonconvex Minimax Problems
- Authors: Xidong Wu, Zhengmian Hu and Heng Huang
- Abstract summary: Minimax optimization over Riemannian manifolds has been applied to problems such as robust dimensionality reduction and training neural networks with orthogonal weights.
In this paper, we propose both deterministic and stochastic decentralized Riemannian minimax algorithms over the Stiefel manifold.
For the deterministic setting we propose DRGDA with gradient complexity $O(\epsilon^{-2})$; for the stochastic setting we propose DRSGDA with gradient complexity $O(\epsilon^{-4})$.
- Score: 82.50374560598493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The minimax optimization over Riemannian manifolds (possibly nonconvex
constraints) has been actively applied to solve many problems, such as robust
dimensionality reduction and deep neural networks with orthogonal weights
(Stiefel manifold). Although many optimization algorithms for minimax problems
have been developed in the Euclidean setting, it is difficult to convert them
into Riemannian cases, and algorithms for nonconvex minimax problems with
nonconvex constraints are even rare. On the other hand, to address the big data
challenges, decentralized (serverless) training techniques have recently been
emerging since they can reduce communications overhead and avoid the bottleneck
problem on the server node. Nonetheless, the algorithm for decentralized
Riemannian minimax problems has not been studied. In this paper, we study the
distributed nonconvex-strongly-concave minimax optimization problem over the
Stiefel manifold and propose both deterministic and stochastic minimax methods.
The local objective is nonconvex-strongly-concave, and the Stiefel manifold is a
non-convex set. The global function is represented as the finite sum of local
functions. For the deterministic setting, we propose DRGDA and prove that our
deterministic method achieves a gradient complexity of $O( \epsilon^{-2})$
under mild conditions. For the stochastic setting, we propose DRSGDA and prove
that our stochastic method achieves a gradient complexity of
$O(\epsilon^{-4})$. DRGDA and DRSGDA are the first algorithms for
distributed minimax optimization with nonconvex constraints that attain exact
convergence. Extensive experimental results on deep neural network (DNN)
training over the Stiefel manifold demonstrate the efficiency of our
algorithms.
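As a reading aid, the problem described in the abstract can be written as the following template; the dimensions $d$, $r$, $m$ and the symbol $f_i$ for the local losses are notational assumptions, since the abstract does not fix them.

```latex
% Problem template implied by the abstract: n agents jointly solve a
% nonconvex-strongly-concave minimax problem whose primal variable is
% constrained to the Stiefel manifold (the notation below is assumed).
\min_{X \in \mathrm{St}(d,r)} \; \max_{y \in \mathbb{R}^m} \;
  f(X, y) \;=\; \frac{1}{n} \sum_{i=1}^{n} f_i(X, y),
\qquad
\mathrm{St}(d,r) \;=\; \{\, X \in \mathbb{R}^{d \times r} : X^{\top} X = I_r \,\}.
```

A minimal sketch of one round of a decentralized Riemannian gradient descent ascent scheme in this spirit is shown below. It is not the authors' DRGDA/DRSGDA implementation; the helper names (`qr_retraction`, `tangent_projection`, `drgda_round`), the mixing matrix `W`, and the default step sizes are illustrative assumptions.

```python
# Hedged sketch of one decentralized Riemannian gradient descent ascent round,
# in the spirit of the DRGDA/DRSGDA methods described above (not the authors'
# code; all names and defaults here are assumptions).
import numpy as np

def qr_retraction(X):
    """Retract X onto the Stiefel manifold St(d, r) via thin QR (signs fixed)."""
    Q, R = np.linalg.qr(X)
    signs = np.sign(np.sign(np.diag(R)) + 0.5)  # map zero diagonal entries to +1
    return Q * signs

def tangent_projection(X, G):
    """Project a Euclidean gradient G onto the tangent space of St(d, r) at X."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2.0

def drgda_round(X_list, y_list, local_grads, W, eta_x=1e-2, eta_y=1e-1):
    """One communication round at every agent: gossip averaging, a Riemannian
    descent step in X, and a Euclidean ascent step in the concave variable y."""
    n = len(X_list)
    new_X, new_y = [], []
    for i in range(n):
        # Consensus (gossip) step with a doubly stochastic mixing matrix W.
        X_mix = qr_retraction(sum(W[i, j] * X_list[j] for j in range(n)))
        y_mix = sum(W[i, j] * y_list[j] for j in range(n))
        # Local (possibly stochastic) Euclidean gradients of f_i at the mixed point.
        Gx, Gy = local_grads[i](X_mix, y_mix)
        # Descent in X: project to the tangent space, step, then retract.
        xi = tangent_projection(X_mix, Gx)
        new_X.append(qr_retraction(X_mix - eta_x * xi))
        # Ascent in the strongly concave variable y.
        new_y.append(y_mix + eta_y * Gy)
    return new_X, new_y
```

In such a scheme `W` would typically encode the communication graph (e.g., Metropolis weights), the QR factorization acts as the retraction back onto $\mathrm{St}(d,r)$, and the separate step sizes for $X$ and $y$ reflect the descent-ascent structure of the nonconvex-strongly-concave objective.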
Related papers
- Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems.
Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Solving a Class of Non-Convex Minimax Optimization in Federated Learning [84.98927714326908]
Minimax problems arise throughout machine learning applications, ranging from model training to large-scale learning.
We propose a class of algorithms for non-convex minimax problems that reduce the complexity to $O(\varepsilon^{-6})$.
We prove that FedSGDA-M has the best-known sample complexity of $O(\kappa^{2}\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^{2}\varepsilon^{-3})$.
arXiv Detail & Related papers (2023-10-05T15:48:41Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning tasks.
This paper develops two novel decentralized minimax optimization algorithms for the finite-sum nonconvex-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax
Problems [39.197569803430646]
Minimax optimization plays an important role in many machine learning tasks such as generative adversarial networks (GANs) and adversarial training.
Although recently a wide variety of optimization methods have been proposed to solve the minimax problems, most of them ignore the distributed setting.
arXiv Detail & Related papers (2023-04-21T11:38:41Z) - Decentralized Weakly Convex Optimization Over the Stiefel Manifold [28.427697270742947]
We focus on weakly convex optimization over the Stiefel manifold in the decentralized setting, where a connected network of agents cooperatively minimizes the sum of their local loss functions.
We propose a method called the decentralized Riemannian subgradient method (DRSM) for forcing a natural stationarity measure below a prescribed tolerance.
arXiv Detail & Related papers (2023-03-31T02:56:23Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - On Riemannian Gradient-Based Methods for Minimax Problems [24.199289678553896]
We propose a class of Riemannian gradient-based methods to solve minimax problems.
We prove that our RGDA has a sample complexity of $\tilde{O}(\kappa^4\epsilon^{-3})$.
We also show that our Acc-RSGDA achieves a lower sample complexity of $\tilde{O}(\kappa^4\epsilon^{-3})$.
arXiv Detail & Related papers (2020-10-13T00:54:00Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the nonconvex-strongly-concave setting of the gradient-free (zeroth-order) minimax optimization problem.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)