Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization
- URL: http://arxiv.org/abs/2308.09604v2
- Date: Tue, 12 Dec 2023 05:28:51 GMT
- Title: Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization
- Authors: Jin Liu, Xiaokang Pan, Junwen Duan, Hongdong Li, Youqi Li, Zhe Qu
- Abstract summary: Compositional minimax optimization is a pivotal challenge across various machine learning domains.
Current methods of compositional minimax optimization are plagued by sub-optimal complexities or heavy reliance on sizable batch sizes.
This paper introduces a novel method, called Nested STOchastic Recursive Momentum (NSTORM), which can achieve the optimal sample complexity of $O(\kappa^3/\epsilon^3)$.
- Score: 50.10952609321302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper delves into the realm of stochastic optimization for compositional
minimax optimization - a pivotal challenge across various machine learning
domains, including deep AUC and reinforcement learning policy evaluation.
Despite its significance, the problem of compositional minimax optimization is
still under-explored. Adding to the complexity, current methods of
compositional minimax optimization are plagued by sub-optimal complexities or
heavy reliance on sizable batch sizes. To respond to these constraints, this
paper introduces a novel method, called Nested STOchastic Recursive Momentum
(NSTORM), which can achieve the optimal sample complexity of $O(\kappa^3/\epsilon^3)$ to obtain an $\epsilon$-accurate solution. We also demonstrate that NSTORM can achieve the same sample complexity under the Polyak-Łojasiewicz (PL) condition - an insightful extension of its capabilities. Yet,
NSTORM encounters an issue with its requirement for low learning rates,
potentially constraining its real-world applicability in machine learning. To
overcome this hurdle, we present ADAptive NSTORM (ADA-NSTORM) with adaptive
learning rates. We demonstrate that ADA-NSTORM can achieve the same sample complexity, while experimental results show that it is more effective in practice. All the proposed complexities indicate that our methods match the lower bounds for existing minimax optimization, without requiring a large batch size in each
iteration. Extensive experiments support the efficiency of our proposed
methods.
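For context, compositional minimax problems of the kind studied here are commonly written in a nested-expectation form; the display below is a generic sketch of that formulation, with the paper's precise assumptions (smoothness, concavity in $y$, and the definition of the condition number $\kappa$) deferred to the paper itself:
$$\min_{x \in \mathbb{R}^{d_1}} \max_{y \in \mathcal{Y}} \; f\big(g(x), y\big), \qquad g(x) = \mathbb{E}_{\xi}\big[g(x; \xi)\big], \qquad f(u, y) = \mathbb{E}_{\zeta}\big[f(u, y; \zeta)\big],$$
where both the inner mapping $g$ and the outer function $f$ are accessed only through stochastic samples. The "recursive momentum" in NSTORM refers to STORM-type variance-reduced estimators; as an illustrative sketch (notation not taken from the paper), a STORM estimator $d_t$ of a gradient $\nabla F$ is updated with a single fresh sample $\xi_t$ per iteration as
$$d_t = \nabla F(x_t; \xi_t) + (1 - a_t)\big(d_{t-1} - \nabla F(x_{t-1}; \xi_t)\big),$$
so the previous estimate is corrected rather than recomputed from a large batch. NSTORM nests estimators of this type for the inner mapping, its Jacobian, and the outer gradients; the exact recursions, and the adaptive learning rates used by ADA-NSTORM, are specified in the paper.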
Related papers
- Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA).
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z)
- SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity [5.046146134689904]
We show that there is no gap in complexity analysis between bilevel and single-level optimization when implementing SPABA.
We propose several other bi-loop or nested bi-level algorithms to improve the state of complexity analysis.
arXiv Detail & Related papers (2024-05-29T05:36:03Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- Debiasing Conditional Stochastic Optimization [15.901623717313493]
We study the conditional stochastic optimization (CSO) problem, which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, etc.
We develop new algorithms for the finite-sum variant of the CSO problem that significantly improve upon existing results.
We believe that our technique has the potential to be a useful tool for addressing similar challenges in other optimization problems.
arXiv Detail & Related papers (2023-04-20T19:19:55Z)
- Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z)
- Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- Minimization of Stochastic First-order Oracle Complexity of Adaptive Methods for Nonconvex Optimization [0.0]
We prove that there exist critical batch sizes in the sense of minimizing the lower and upper bounds of the stochastic first-order oracle (SFO) complexity.
We also discuss the conditions needed for the SFO complexity to fit the lower and upper bounds and provide numerical results that support our theoretical results.
arXiv Detail & Related papers (2021-12-14T04:55:04Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of the stochastic proximal gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity guarantees for constant and variable stepsize iteration policies, showing that, for minibatch size $N$, $\epsilon$-suboptimality in expected quadratic distance to the optimal solution is attained after $\mathcal{O}(\frac{1}{N\epsilon})$ iterations.
arXiv Detail & Related papers (2020-03-30T10:43:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.