Stochastic Compositional Minimax Optimization with Provable Convergence Guarantees
- URL: http://arxiv.org/abs/2408.12505v1
- Date: Thu, 22 Aug 2024 16:00:31 GMT
- Title: Stochastic Compositional Minimax Optimization with Provable Convergence Guarantees
- Authors: Yuyang Deng, Fuli Qiao, Mehrdad Mahdavi
- Abstract summary: Stochastic compositional minimax problems are prevalent in machine learning, yet only limited results have been established on the convergence of this class of problems.
In this paper, we propose a formal definition of the stochastic compositional minimax problem, which involves optimizing a minimax loss with a compositional structure.
- Score: 14.301500851291989
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic compositional minimax problems are prevalent in machine learning, yet only limited results have been established on the convergence of this class of problems. In this paper, we propose a formal definition of the stochastic compositional minimax problem, which involves optimizing a minimax loss with a compositional structure in the primal variable, the dual variable, or both. We introduce a simple yet effective algorithm, Corrected stOchastic gradient Descent Ascent (CODA), a descent-ascent type algorithm with compositional correction steps, and establish its convergence rate in the aforementioned three settings. In the presence of a compositional structure in the primal, the objective function typically becomes nonconvex in the primal variable due to the function composition. We therefore consider the nonconvex-strongly-concave and nonconvex-concave settings and show that CODA efficiently converges to a stationary point. In the case of composition on the dual, the objective function becomes nonconcave in the dual variable, and we demonstrate convergence in the strongly-convex-nonconcave and convex-nonconcave settings. In the case of composition on both variables, the primal and dual variables may lose convexity and concavity, respectively; we therefore analyze convergence in the weakly-convex-weakly-concave setting. We also give a variance-reduced variant, CODA+, which achieves the best known rate on nonconvex-strongly-concave and nonconvex-concave compositional minimax problems. This work initiates the theoretical study of the stochastic compositional minimax problem in various settings and may inform modern machine learning scenarios such as domain adaptation or robust model-agnostic meta-learning.
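The abstract describes the problem template in words only; written out, the three compositional placements it names take the following shape (notation is ours, not necessarily the paper's):

```latex
% Compositional structure in the primal, the dual, or both; the inner
% maps g and h are accessed only through stochastic samples:
\min_{x}\max_{y}\ f\bigl(g(x),\,y\bigr), \qquad
\min_{x}\max_{y}\ f\bigl(x,\,h(y)\bigr), \qquad
\min_{x}\max_{y}\ f\bigl(g(x),\,h(y)\bigr),
\quad\text{where } g(x)=\mathbb{E}_{\xi}\bigl[g_{\xi}(x)\bigr],\quad
h(y)=\mathbb{E}_{\zeta}\bigl[h_{\zeta}(y)\bigr].
```

The abstract characterizes CODA only as a descent-ascent method with compositional correction steps. A minimal sketch of what one such corrected loop could look like for the primal-composition case follows; the moving-average tracker `u`, the step sizes, and all oracle names are illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

def corrected_sgda_sketch(grad_f_u, grad_f_y, g_sample, jac_g,
                          x, y, eta_x=1e-2, eta_y=1e-2,
                          beta=0.9, iters=1000):
    """Hedged sketch of a compositionally corrected SGDA loop for
    min_x max_y f(g(x), y), with stochastic oracles:
      grad_f_u(u, y) -- gradient of f in its first argument
      grad_f_y(u, y) -- gradient of f in y
      g_sample(x)    -- noisy evaluation of the inner map g
      jac_g(x)       -- noisy Jacobian of g
    """
    u = g_sample(x)  # running ("corrected") estimate of g(x)
    for _ in range(iters):
        # Plugging a single noisy sample of g(x) into grad_f_u would
        # bias the composed gradient; a moving average damps that bias.
        u = (1.0 - beta) * u + beta * g_sample(x)
        gx = jac_g(x).T @ grad_f_u(u, y)  # chain rule through the tracker
        gy = grad_f_y(u, y)
        x = x - eta_x * gx  # descent step on the primal variable
        y = y + eta_y * gy  # ascent step on the dual variable
    return x, y

# Toy check: g(x) = A x (noisy), f(u, y) = u.y - ||y||^2 / 2,
# which is nonconvex in x after composition and strongly concave in y.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x_out, y_out = corrected_sgda_sketch(
    grad_f_u=lambda u, y: y,
    grad_f_y=lambda u, y: u - y,
    g_sample=lambda x: A @ x + 0.01 * rng.standard_normal(3),
    jac_g=lambda x: A + 0.01 * rng.standard_normal((3, 3)),
    x=np.ones(3), y=np.zeros(3))
```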
Related papers
- Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization [64.99236464953032]
We propose new stochastic momentum methods and establish a state-of-the-art complexity bound for finding a (nearly) $\epsilon$-level KKT solution.
arXiv Detail & Related papers (2025-06-03T06:31:59Z) - New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition [11.530542389959347]
We present fundamental limits of first-order stochastic optimization in a range of nonconvex settings, including Quasar-Convexity (QC), Quadratic Growth (QG), and Restricted Secant Inequalities (RSI).
arXiv Detail & Related papers (2025-02-19T19:21:00Z) - Shuffling Gradient-Based Methods for Nonconvex-Concave Minimax Optimization [20.093236438944718]
We develop novel shuffling gradient-based methods for tackling nonconvex-concave minimax problems.
We show that the new methods achieve results comparable to those of existing methods.
arXiv Detail & Related papers (2024-10-29T17:47:22Z) - Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems. Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning tasks.
This paper develops two novel decentralized minimax optimization algorithms for the finite-sum nonconvex-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - Decentralized gradient descent maximization method for composite
nonconvex strongly-concave minimax problems [7.5573375809946395]
We make the first attempt at solving composite NCSC minimax problems that can have nonsmooth terms on both the primal and dual variables.
Our algorithm is designed based on a novel reformulation of the minimax problem.
arXiv Detail & Related papers (2023-04-05T13:54:43Z) - Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization [108.35402316802765]
We propose a new first-order optimization algorithm -- Accelerated Gradient-Optimistic Gradient (AG-OG) Descent Ascent.
We show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings.
We extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings.
arXiv Detail & Related papers (2022-10-31T17:59:29Z) - FEDNEST: Federated Bilevel, Minimax, and Compositional Optimization [53.78643974257301]
Many contemporary ML problems fall under nested bilevel programming, which subsumes minimax and compositional optimization.
We propose FedNest: A federated alternating gradient method to address general nested problems.
arXiv Detail & Related papers (2022-05-04T17:48:55Z) - Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Minimax
Problems [80.46370778277186]
Large scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and training of generative adversarial networks.
We develop a communication-efficient distributed extragradient algorithm, LocalAdaSient, with an adaptive learning rate suitable for solving convex-concave minimax problems in the Parameter-Server model.
We demonstrate its efficacy through several experiments in both the homogeneous and heterogeneous settings.
arXiv Detail & Related papers (2021-06-18T09:42:05Z) - Stability and Generalization of Stochastic Gradient Methods for Minimax
Problems [71.60601421935844]
Many machine learning problems can be formulated as minimax problems, such as training Generative Adversarial Networks (GANs).
We provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems.
arXiv Detail & Related papers (2021-05-08T22:38:00Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point.
The algorithm applies not only to Euclidean spaces, but also to general $p$-normed vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - Hybrid Variance-Reduced SGD Algorithms For Nonconvex-Concave Minimax
Problems [26.24895953952318]
We develop algorithms to solve a class of nonconvex-concave minimax problems.
They can work with both single-sample and mini-batch stochastic gradient estimates.
arXiv Detail & Related papers (2020-06-27T03:05:18Z) - Inexact and Stochastic Generalized Conditional Gradient with Augmented
Lagrangian and Proximal Step [2.0196229393131726]
We analyze inexact and stochastic versions of the CGALP algorithm developed in the authors' previous paper.
This allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion.
We show convergence of the Lagrangian to an optimum and feasibility of the affine constraint.
arXiv Detail & Related papers (2020-05-11T14:52:16Z) - Global Convergence and Variance-Reduced Optimization for a Class of
Nonconvex-Nonconcave Minimax Problems [39.13924898972611]
Nonconvex-nonconcave minimax problems satisfying the Polyak-Łojasiewicz condition appear frequently in emerging machine learning applications such as generative adversarial networks and adversarial learning.
GDA algorithms with constant step size can potentially diverge even in the convex-concave setting (see the sketch after this list).
The AGDA algorithm converges globally at a sublinear rate.
arXiv Detail & Related papers (2020-02-22T04:20:37Z)
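The divergence claim in the last entry above is classical: simultaneous GDA with a constant step size spirals outward on the convex-concave bilinear game $f(x, y) = xy$. A self-contained numerical check (our toy example, not drawn from any paper listed here):

```python
# Simultaneous gradient descent-ascent on f(x, y) = x * y with a constant
# step size. Each update multiplies x^2 + y^2 by exactly (1 + eta^2), so
# the iterates spiral away from the saddle point at (0, 0).
eta = 0.2
x, y = 1.0, 0.0
for _ in range(500):
    x, y = x - eta * y, y + eta * x  # simultaneous (not alternating) updates
print(f"distance from the saddle after 500 steps: {(x*x + y*y) ** 0.5:.3e}")
# prints ~1.8e+04, i.e. (1 + 0.04)^250 -- clear divergence
```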
This list is automatically generated from the titles and abstracts of the papers on this site.