Accelerated Stochastic Gradient-free and Projection-free Methods
- URL: http://arxiv.org/abs/2007.12625v2
- Date: Mon, 10 Aug 2020 15:45:54 GMT
- Title: Accelerated Stochastic Gradient-free and Projection-free Methods
- Authors: Feihu Huang, Lue Tao, Songcan Chen
- Abstract summary: We propose an accelerated stochastic zeroth-order Frank-Wolfe (Acc-SZOFW) method based on the variance reduced technique of SPIDER/SpiderBoost and a novel momentum accelerated technique.
To relax the large batches required in Acc-SZOFW, we further propose a novel accelerated stochastic zeroth-order Frank-Wolfe (Acc-SZOFW*) method based on a new variance reduced technique of STORM.
- Score: 37.15461647823691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the paper, we propose a class of accelerated stochastic gradient-free and
projection-free (a.k.a., zeroth-order Frank-Wolfe) methods to solve the
constrained stochastic and finite-sum nonconvex optimization. Specifically, we
propose an accelerated stochastic zeroth-order Frank-Wolfe (Acc-SZOFW) method
based on the variance reduced technique of SPIDER/SpiderBoost and a novel
momentum accelerated technique. Moreover, under some mild conditions, we prove
that the Acc-SZOFW has the function query complexity of
$O(d\sqrt{n}\epsilon^{-2})$ for finding an $\epsilon$-stationary point in the
finite-sum problem, which improves the existing best result by a factor of
$O(\sqrt{n}\epsilon^{-2})$, and has the function query complexity of
$O(d\epsilon^{-3})$ in the stochastic problem, which improves the existing best
result by a factor of $O(\epsilon^{-1})$. To relax the large batches required
in the Acc-SZOFW, we further propose a novel accelerated stochastic
zeroth-order Frank-Wolfe (Acc-SZOFW*) based on a new variance reduced technique
of STORM, which still reaches the function query complexity of
$O(d\epsilon^{-3})$ in the stochastic problem without relying on any large
batches. In particular, we present an accelerated framework of the Frank-Wolfe
methods based on the proposed momentum accelerated technique. The extensive
experimental results on black-box adversarial attack and robust black-box
classification demonstrate the efficiency of our algorithms.
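For readers who want a concrete picture of the ingredients described above, the following is a minimal sketch (not the authors' reference implementation) combining a two-point zeroth-order gradient estimate, a STORM-style recursive momentum estimator, and a projection-free Frank-Wolfe update through a linear minimization oracle. The l1-ball constraint, the hyperparameters, and all function names are illustrative assumptions, and the iterate-level momentum acceleration from the paper is omitted for brevity.

```python
# Minimal sketch of an Acc-SZOFW*-style loop under assumed choices:
# an l1-ball constraint with a closed-form linear minimization oracle (LMO),
# a two-point random-direction zeroth-order gradient estimator, and a
# STORM-style recursive (momentum) variance-reduced gradient estimate.
import numpy as np

def zo_gradient(f, x, batch, mu=1e-4, rng=None):
    """Two-point zeroth-order gradient estimate averaged over a mini-batch."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for xi in batch:
        u = rng.standard_normal(x.size)
        g += (f(x + mu * u, xi) - f(x - mu * u, xi)) / (2.0 * mu) * u
    return g / len(batch)

def lmo_l1(grad, radius=1.0):
    """LMO over the l1 ball: vertex minimizing <grad, v>."""
    i = np.argmax(np.abs(grad))
    v = np.zeros_like(grad)
    v[i] = -radius * np.sign(grad[i])
    return v

def acc_szofw_star(f, x0, data, T=100, radius=1.0, gamma=0.05, alpha=0.9):
    """Sketch: STORM-like momentum gradient estimate + Frank-Wolfe update."""
    x = x0.copy()
    rng = np.random.default_rng(1)
    batch = data[rng.choice(len(data), size=8, replace=False)]
    m = zo_gradient(f, x, batch)                       # initial gradient estimate
    for t in range(T):
        v = lmo_l1(m, radius)                          # Frank-Wolfe direction
        x_new = x + gamma * (v - x)                    # projection-free update
        batch = data[rng.choice(len(data), size=8, replace=False)]
        # STORM-style recursion: correct the old estimate with the gradient
        # difference at the new and old iterates, using the same batch and
        # the same random directions (seeded identically) for both points.
        seed = int(rng.integers(1 << 31))
        g_new = zo_gradient(f, x_new, batch, rng=np.random.default_rng(seed))
        g_old = zo_gradient(f, x, batch, rng=np.random.default_rng(seed))
        m = g_new + (1 - alpha) * (m - g_old)
        x = x_new
    return x

# Toy usage: black-box least squares over an l1 ball (only function values used).
if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A, b = rng.standard_normal((64, 10)), rng.standard_normal(64)
    data = np.arange(64)
    loss = lambda w, i: 0.5 * (A[i] @ w - b[i]) ** 2
    w = acc_szofw_star(loss, np.zeros(10), data, T=200)
    print("final loss:", np.mean([loss(w, i) for i in data]))
```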
Related papers
- Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the FW algorithm for stochastic finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z) - Reducing Discretization Error in the Frank-Wolfe Method [2.806911268410107]
The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications.
One major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions.
We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error.
arXiv Detail & Related papers (2023-04-04T00:43:05Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax
Optimization [104.96004056928474]
We propose a class of faster adaptive gradient descent ascent methods for nonconvex-strongly-concave minimax problems.
We show that our method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with the mini-batch size $O(\kappa)$.
arXiv Detail & Related papers (2021-06-30T14:47:09Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterations caused by inexact solutions of the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to
Minimax Optimization [133.53164856723782]
We propose a new accelerated zeroth-order momentum (Acc-ZOM) method for black-box mini-optimization where only function values can be obtained.
Meanwhile, we propose an accelerated zeroth-order momentum descent ascent (Acc-MDA) method for black-box minimax optimization, where only function values can be obtained.
In particular, our Acc-MDA can obtain a lower gradient complexity of $\tilde{O}(\kappa_y^{2.5}\epsilon^{-3})$ with a batch size $O(\kappa_y^4)$.
arXiv Detail & Related papers (2020-08-18T22:19:29Z) - Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query
Complexity [109.54166127479093]
Zeroth-order (a.k.a., derivative-free) methods are a class of effective optimization methods for solving machine learning problems.
In this paper, we propose a class of faster zeroth-order stochastic alternating direction method of multipliers (ADMM) methods to solve the nonconvex finite-sum problems.
We show that the proposed zeroth-order ADMM methods can achieve a lower function query complexity for finding an $\epsilon$-stationary point.
At the same time, we propose a class of faster zeroth-order online ADMM methods.
arXiv Detail & Related papers (2019-07-30T02:21:43Z)