Heavy Ball Momentum for Conditional Gradient
- URL: http://arxiv.org/abs/2110.04243v1
- Date: Fri, 8 Oct 2021 16:50:00 GMT
- Title: Heavy Ball Momentum for Conditional Gradient
- Authors: Bingcong Li, Alireza Sadeghi, Georgios B. Giannakis
- Abstract summary: Conditional gradient, aka Frank Wolfe (FW), algorithms have well-documented merits in machine learning and signal processing applications.
This work deals with heavy ball momentum and its impact on FW.
Numerical results demonstrate the usefulness of heavy ball momentum in FW.
- Score: 67.58002120548566
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Conditional gradient, aka Frank Wolfe (FW), algorithms have well-documented
merits in machine learning and signal processing applications. Unlike
projection-based methods, momentum cannot improve the convergence rate of FW,
in general. This limitation motivates the present work, which deals with heavy
ball momentum and its impact on FW. Specifically, it is established that heavy
ball offers a unifying perspective on the primal-dual (PD) convergence, and
enjoys a tighter per iteration PD error rate, for multiple choices of step
sizes, where PD error can serve as the stopping criterion in practice. In
addition, it is asserted that restart, a scheme typically employed jointly with
Nesterov's momentum, can further tighten this PD error bound. Numerical results
demonstrate the usefulness of heavy ball momentum in FW iterations.
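To make the update concrete, below is a minimal sketch of FW driven by heavy-ball (exponentially averaged) gradients on a toy l1-constrained least-squares problem. The helper names (`lmo_l1`, `heavy_ball_fw`) and the shared step size delta_k = 2/(k+2) are illustrative choices, not the paper's exact schedules, of which it analyzes several.

```python
import numpy as np

def lmo_l1(g, radius=1.0):
    """Linear minimization oracle over the l1 ball:
    argmin_{||v||_1 <= radius} <g, v> is a signed coordinate vertex."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v

def heavy_ball_fw(grad, x0, lmo, iters=200):
    """FW driven by a heavy-ball (exponentially averaged) gradient.
    Uses delta_k = 2/(k+2) for both the gradient average and the
    convex-combination update; the paper studies multiple step sizes."""
    x, g = x0.copy(), grad(x0)
    for k in range(iters):
        delta = 2.0 / (k + 2)
        g = (1 - delta) * g + delta * grad(x)  # heavy-ball gradient average
        v = lmo(g)                             # FW vertex from averaged gradient
        x = (1 - delta) * x + delta * v
    return x

# toy l1-constrained least squares
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
grad = lambda x: A.T @ (A @ x - b)
x_hat = heavy_ball_fw(grad, np.zeros(10), lmo_l1)
print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2)
```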
Related papers
- Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise [16.12834917344859]
It is widely conjectured that the heavy-ball momentum method can provide accelerated convergence and should work well in large-batch settings.
We show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrt{\kappa})$ accelerated convergence of the bias term of SGD while still achieving a near-optimal convergence rate.
This means SGD with heavy-ball momentum is useful in large-batch settings such as distributed machine learning or federated learning; a minimal heavy-ball SGD sketch follows.
arXiv Detail & Related papers (2023-12-22T09:58:39Z)
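Below is a minimal heavy-ball SGD sketch on an ill-conditioned quadratic with anisotropic Gaussian gradient noise; the problem, noise scales, and hyperparameters are made up for illustration and do not reproduce the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(1)
Q = np.diag([10.0, 1.0])  # ill-conditioned quadratic, condition number kappa = 10

def noisy_grad(x):
    # gradient of 0.5 * x^T Q x plus anisotropic Gaussian noise
    return Q @ x + rng.normal(scale=[1.0, 0.1])

def sgd_heavy_ball(grad, x0, lr=0.05, beta=0.9, iters=300):
    """Heavy-ball SGD: x_{k+1} = x_k - lr * g_k + beta * (x_k - x_{k-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x, x_prev = x - lr * grad(x) + beta * (x - x_prev), x
    return x

print(sgd_heavy_ball(noisy_grad, np.array([5.0, 5.0])))
```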
- Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the FW algorithm for finite-sum minimization; a generic SARAH-style FW sketch follows.
arXiv Detail & Related papers (2023-04-23T20:05:09Z)
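Below is a generic sketch of the SARAH recursive gradient estimator driving FW steps on a toy finite-sum least-squares problem over the l1 ball; the paper's two variants add their own step rules and practical features that are not reproduced here.

```python
import numpy as np

def lmo_l1(g, radius=1.0):
    """argmin of <g, v> over the l1 ball of the given radius."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v

def sarah_fw(grad_i, full_grad, n, x0, epochs=10, inner=50, seed=0):
    """FW steps driven by the SARAH recursive estimator:
    g_t = grad_i(x_t) - grad_i(x_{t-1}) + g_{t-1}, refreshed by a full
    gradient at the start of each epoch."""
    rng = np.random.default_rng(seed)
    x, k = x0.copy(), 0
    for _ in range(epochs):
        g = full_grad(x)  # periodic full-gradient refresh
        for _ in range(inner):
            gamma = 2.0 / (k + 2)
            k += 1
            x_prev, x = x, (1 - gamma) * x + gamma * lmo_l1(g)
            i = rng.integers(n)
            g = grad_i(i, x) - grad_i(i, x_prev) + g  # SARAH recursion
    return x

# toy finite-sum least squares: f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
full_grad = lambda x: A.T @ (A @ x - b) / len(b)
print(sarah_fw(grad_i, full_grad, len(b), np.zeros(5)))
```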
- Weighted Gaussian Process Bandits for Non-stationary Environments [30.49357960656324]
We develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression.
A key challenge is how to cope with infinite-dimensional feature maps.
We provide universal upper bounds and weight-dependent upper bounds for the weighted maximum information gains; a generic GP-UCB sketch follows.
arXiv Detail & Related papers (2021-07-06T03:37:33Z)
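For context, below is a sketch of the standard (unweighted) GP-UCB acquisition; WGP-UCB would swap in a weighted GP regression that discounts stale observations. The RBF kernel, beta, and noise level are arbitrary assumptions.

```python
import numpy as np

def rbf(X, Y, ls=0.3):
    """Squared-exponential kernel matrix between row-stacked points."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_ucb_pick(X_obs, y_obs, X_cand, beta=2.0, noise=1e-2):
    """Pick the candidate maximizing mu(x) + sqrt(beta) * sigma(x) under
    a standard GP posterior. WGP-UCB would replace this regression with
    a weighted one that down-weights stale observations."""
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(X_cand, X_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)
    V = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.einsum('ij,ji->i', Ks, V)  # prior variance k(x,x) = 1
    return X_cand[np.argmax(mu + np.sqrt(beta * np.maximum(var, 0.0)))]

# toy 1-d example
rng = np.random.default_rng(0)
X_obs = rng.uniform(0.0, 1.0, (5, 1))
y_obs = np.sin(6.0 * X_obs[:, 0])
X_cand = np.linspace(0.0, 1.0, 50)[:, None]
print(gp_ucb_pick(X_obs, y_obs, X_cand))
```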
- Correcting Momentum in Temporal Difference Learning [95.62766731469671]
We argue that momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale.
We show that this phenomenon exists, and then propose a first-order correction term to momentum.
An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting; the sketch below shows the uncorrected baseline.
arXiv Detail & Related papers (2021-06-07T20:41:15Z)
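Below is a sketch of that baseline: semi-gradient TD(0) with heavy-ball momentum on a toy tabular chain. The paper's first-order correction term is deliberately omitted, and the chain, features, and hyperparameters are illustrative.

```python
import numpy as np

def td0_momentum(transitions, dim, lr=0.05, beta=0.5, gamma=0.9):
    """Semi-gradient TD(0) with heavy-ball momentum for a linear value
    function V(s) = w @ phi(s). The buffer m keeps mixing in old TD
    updates computed with stale weights AND stale bootstrap targets --
    the double staleness the paper targets (its correction is omitted)."""
    w, m = np.zeros(dim), np.zeros(dim)
    for phi_s, r, phi_next in transitions:
        td_err = r + gamma * (w @ phi_next) - w @ phi_s
        m = beta * m + td_err * phi_s  # momentum over semi-gradients
        w = w + lr * m
    return w

# toy 3-state chain with tabular (one-hot) features and reward 1 per step
phis = np.eye(3)
transitions = [(phis[i % 3], 1.0, phis[(i + 1) % 3]) for i in range(500)]
print(td0_momentum(transitions, 3))
```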
- Escaping Saddle Points Faster with Stochastic Momentum [9.485782209646445]
In deep networks, momentum appears to significantly improve convergence time.
We show that momentum improves deep network training because it modifies SGD to escape saddle points faster.
We also show how to choose the ideal momentum parameter.
arXiv Detail & Related papers (2021-06-05T23:34:02Z)
- Enhancing Parameter-Free Frank Wolfe with an Extra Subproblem [75.83472151564185]
This work introduces and analyzes a variant of the Frank Wolfe (FW) algorithm termed ExtraFW.
The distinctive feature of ExtraFW is the pair of gradients leveraged per iteration, thanks to which the decision variable is updated in a prediction-correction (PC) format.
Compared with other parameter-free FW variants that also attain faster rates on the same problems, ExtraFW enjoys improved rates and a fine-grained analysis thanks to its PC update; a sketch of the PC idea follows.
arXiv Detail & Related papers (2020-12-09T19:47:24Z)
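Below is one way a prediction-correction FW step with a pair of gradients per iteration could look. It is a sketch under assumed step sizes (`pc_fw`, `lmo_l1`, and delta_k = 2/(k+2) are illustrative), not ExtraFW's exact parameter-free update.

```python
import numpy as np

def lmo_l1(g, radius=1.0):
    """argmin of <g, v> over the l1 ball."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v

def pc_fw(grad, x0, iters=200):
    """Prediction-correction FW sketch: predict a vertex with the
    previous gradient, re-evaluate the gradient at the look-ahead
    point, then take the corrected step. Two gradients are in play
    each iteration (the stored g and the fresh look-ahead one)."""
    x, g = x0.copy(), grad(x0)
    for k in range(iters):
        delta = 2.0 / (k + 2)
        x_hat = (1 - delta) * x + delta * lmo_l1(g)  # prediction
        g = grad(x_hat)                              # look-ahead gradient
        x = (1 - delta) * x + delta * lmo_l1(g)      # correction
    return x

# toy l1-constrained least squares
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
grad = lambda x: A.T @ (A @ x - b)
print(0.5 * np.linalg.norm(A @ pc_fw(grad, np.zeros(10)) - b) ** 2)
```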
- Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art; a white-box FGSM sketch follows for reference.
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
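For reference, below is the white-box FGSM step the attack mimics; the black-box gradient-sign estimation via MRF-modeled zeroth-order queries is the paper's contribution and is not shown.

```python
import numpy as np

def fgsm(x, grad_loss, eps=0.03):
    """White-box FGSM: a single signed-gradient step, clipped to [0, 1].
    The paper's black-box attack must instead *estimate* sign(grad)
    from zeroth-order queries, using a Gaussian MRF model of gradient
    correlations to keep the query count low (not shown here)."""
    return np.clip(x + eps * np.sign(grad_loss(x)), 0.0, 1.0)

# toy example with a linear surrogate loss L(x) = w @ x
rng = np.random.default_rng(0)
w = rng.standard_normal(8)
x = rng.uniform(0.0, 1.0, 8)
x_adv = fgsm(x, lambda z: w)
print(np.abs(x_adv - x).max())  # perturbation stays within eps
```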
- How Does Momentum Help Frank Wolfe? [81.95857252228537]
We unveil the connections between Frank Wolfe (FW) type algorithms and the momentum in Accelerated Gradient Methods (AGM).
On the negative side, these connections illustrate why momentum is unlikely to be effective for FW type algorithms.
The encouraging message behind this link, on the other hand, is that momentum is useful for FW on a class of problems.
arXiv Detail & Related papers (2020-06-19T13:06:00Z)