Momentum Accelerates the Convergence of Stochastic AUPRC Maximization
- URL: http://arxiv.org/abs/2107.01173v1
- Date: Fri, 2 Jul 2021 16:21:52 GMT
- Title: Momentum Accelerates the Convergence of Stochastic AUPRC Maximization
- Authors: Guanghui Wang, Ming Yang, Lijun Zhang, Tianbao Yang
- Abstract summary: We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
- Score: 80.8226518642952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study stochastic optimization of areas under
precision-recall curves (AUPRC), which is widely used for combating imbalanced
classification tasks. Although a few methods have been proposed for maximizing
AUPRC, stochastic optimization of AUPRC with convergence guarantee remains an
undeveloped territory. A recent work [42] has proposed a promising approach
towards AUPRC based on maximizing a surrogate loss for the average precision,
and proved an $O(1/\epsilon^5)$ complexity for finding an $\epsilon$-stationary
solution of the non-convex objective. In this paper, we further improve the
stochastic optimization of AUPRC by (i) developing novel stochastic momentum
methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an
$\epsilon$-stationary solution; and (ii) designing a novel family of stochastic
adaptive methods with the same iteration complexity of $O(1/\epsilon^4)$, which
enjoy faster convergence in practice. To this end, we propose two innovative
techniques that are critical for improving the convergence: (i) the biased
estimators for tracking individual ranking scores are updated in a randomized
coordinate-wise manner; and (ii) a momentum update is used on top of the
stochastic gradient estimator for tracking the gradient of the objective.
Extensive experiments on various data sets demonstrate the effectiveness of the
proposed algorithms. Of independent interest, the proposed stochastic momentum
and adaptive algorithms are also applicable to a class of two-level stochastic
dependent compositional optimization problems.
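The two techniques named in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm and does not optimize AUPRC. It assumes a simple two-level compositional least-squares objective $F(w) = \frac{1}{n}\sum_i \frac{1}{2}(g_i(w) - b_i)^2$ with $g_i(w) = a_i^\top w$ observed only through noisy samples. All names (`run_sketch`, `gamma`, `beta`, `eta`, `u`, `m`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def run_sketch(seed=0, n=50, d=5, steps=2000, batch=8,
               gamma=0.5, beta=0.1, eta=0.05, noise=0.1):
    """Toy illustration (assumed setup, not the paper's method).

    Minimizes F(w) = mean_i 0.5 * (g_i(w) - b_i)^2 where the inner
    function g_i(w) = A[i] @ w is only available with additive noise.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    b = A @ w_true

    def objective(w):
        return 0.5 * np.mean((A @ w - b) ** 2)

    w = np.zeros(d)
    u = np.zeros(n)  # u[i] is a running (biased) estimate of g_i(w)
    m = np.zeros(d)  # momentum buffer tracking the gradient of F
    initial = objective(w)

    for _ in range(steps):
        # Sample a mini-batch of inner functions.
        idx = rng.choice(n, size=batch, replace=False)
        noisy_inner = A[idx] @ w + noise * rng.normal(size=batch)

        # Technique (i): update only the sampled coordinates of u
        # by a moving average; all other coordinates stay stale.
        u[idx] = (1 - gamma) * u[idx] + gamma * noisy_inner

        # Chain-rule gradient estimate using the tracked values u[idx]
        # in place of the unknown exact inner values.
        grad_est = A[idx].T @ (u[idx] - b[idx]) / batch

        # Technique (ii): momentum (moving average) on the gradient estimate.
        m = (1 - beta) * m + beta * grad_est
        w = w - eta * m

    return initial, objective(w)


if __name__ == "__main__":
    initial, final = run_sketch()
    print(f"objective before: {initial:.4f}, after: {final:.4f}")
```

The design point the sketch tries to capture is that the inner estimates `u` are refreshed only for sampled indices, while the outer gradient is smoothed by the momentum buffer `m` before each step.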
Related papers
- Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions [26.543628010637036]
We introduce a novel adaptive reduction method that achieves an optimal convergence rate of $\mathcal{O}(\log T)$ for non- functions.
We also extend the proposed technique to obtain the same optimal rate of $\mathcal{O}(\log T)$ for compositional optimization.
arXiv Detail & Related papers (2024-06-04T04:39:51Z)
- Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning [5.325297567945828]
We propose a new method for two-time-scale optimization that achieves significantly faster convergence than the prior arts.
We characterize the proposed algorithm under various conditions and show how it specializes on online sample-based methods.
arXiv Detail & Related papers (2024-05-15T19:03:08Z)
- Enhancing Gaussian Process Surrogates for Optimization and Posterior Approximation via Random Exploration [2.984929040246293]
novel noise-free Bayesian optimization strategies that rely on a random exploration step to enhance the accuracy of Gaussian process surrogate models.
New algorithms retain the ease of implementation of the classical GP-UCB, but an additional exploration step facilitates their convergence.
arXiv Detail & Related papers (2024-01-30T14:16:06Z)
- Fast Nonlinear Two-Time-Scale Stochastic Approximation: Achieving $O(1/k)$ Finite-Sample Complexity [2.5382095320488665]
This paper proposes to develop a new variant of the two-time-scale monotone approximation to find the roots of two coupled nonlinear operators.
Our key idea is to leverage the classic Ruppert-Polyak averaging technique to dynamically estimate the operators through their samples.
The estimated values of these averaging steps will then be used in the two-time-scale approximation updates to find the desired solution.
arXiv Detail & Related papers (2024-01-23T13:44:15Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC, and MAML.
This paper proposes algorithms for distributed federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the lower bound of $\mathcal{O}(1/\epsilon)$ for all complexity optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
adversary-resilient distributed optimization, in which.
machines can independently compute gradients, and cooperate.
Our algorithm is based on a new concentration technique, and its sample complexity.
It is very practical: it improves upon the performance of all prior methods when no.
setting machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stoc-efficientgradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)