Enhancing Parameter-Free Frank Wolfe with an Extra Subproblem
- URL: http://arxiv.org/abs/2012.05284v1
- Date: Wed, 9 Dec 2020 19:47:24 GMT
- Title: Enhancing Parameter-Free Frank Wolfe with an Extra Subproblem
- Authors: Bingcong Li, Lingda Wang, Georgios B. Giannakis, Zhizhen Zhao
- Abstract summary: This work introduces and analyzes a variant of the Frank Wolfe (FW) algorithm termed ExtraFW.
The distinct feature of ExtraFW is the pair of gradients leveraged per iteration, thanks to which the decision variable is updated in a prediction-correction (PC) format.
Compared with other parameter-free FW variants that have faster rates on the same problems, ExtraFW has improved rates and fine-grained analysis thanks to its PC update.
- Score: 75.83472151564185
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Aiming at convex optimization under structural constraints, this work
introduces and analyzes a variant of the Frank Wolfe (FW) algorithm termed
ExtraFW. The distinct feature of ExtraFW is the pair of gradients leveraged per
iteration, thanks to which the decision variable is updated in a
prediction-correction (PC) format. Relying on no problem dependent parameters
in the step sizes, the convergence rate of ExtraFW for general convex problems
is shown to be ${\cal O}(\frac{1}{k})$, which is optimal in the sense of
matching the lower bound on the number of solved FW subproblems. However, the
merit of ExtraFW is its faster rate ${\cal O}\big(\frac{1}{k^2} \big)$ on a
class of machine learning problems. Compared with other parameter-free FW
variants that have faster rates on the same problems, ExtraFW has improved
rates and fine-grained analysis thanks to its PC update. Numerical tests on
binary classification with different sparsity-promoting constraints demonstrate
that the empirical performance of ExtraFW is significantly better than FW, and
even faster than Nesterov's accelerated gradient on certain datasets. For
matrix completion, ExtraFW enjoys smaller optimality gap, and lower rank than
FW.
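To make the prediction-correction idea concrete, here is a minimal Python sketch of an FW-style iteration that, as the abstract describes, uses two gradient evaluations and two FW subproblems per step with purely parameter-free step sizes. The l1-ball constraint, the gamma = 2/(k+2) schedule, and the particular prediction/correction points are illustrative assumptions, not the exact ExtraFW recursion given in the paper.

```python
import numpy as np

def lmo_l1(grad, radius=1.0):
    """Linear minimization oracle (FW subproblem) over the l1 ball:
    argmin_{||s||_1 <= radius} <grad, s> is attained at a signed vertex."""
    s = np.zeros_like(grad)
    i = int(np.argmax(np.abs(grad)))
    s[i] = -radius * np.sign(grad[i])
    return s

def extra_fw_sketch(grad_fn, x0, num_iters=200, radius=1.0):
    """Illustrative prediction-correction FW loop (not the paper's exact updates).
    Each iteration queries the gradient and the LMO twice: a prediction step
    proposes a vertex, then a correction step refines the update."""
    x = x0.copy()                                  # x0 assumed to lie in the l1 ball
    for k in range(num_iters):
        gamma = 2.0 / (k + 2)                      # no problem-dependent constants
        s_pred = lmo_l1(grad_fn(x), radius)        # first FW subproblem (prediction)
        x_pred = (1.0 - gamma) * x + gamma * s_pred
        s_corr = lmo_l1(grad_fn(x_pred), radius)   # second FW subproblem (correction)
        x = (1.0 - gamma) * x + gamma * s_corr
    return x
```

For instance, grad_fn could be the gradient of a logistic loss and radius the l1 budget, mirroring the sparsity-constrained binary classification experiments mentioned above.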
Related papers
- Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the algorithm for finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z)
- Improved Dynamic Regret for Online Frank-Wolfe [54.690867216880356]
We investigate the dynamic regret of online Frank-Wolfe (OFW), which is an efficient projection-free algorithm for online convex optimization.
In this paper, we derive improved dynamic regret bounds for OFW by extending the fast convergence rates of FW from offline optimization to online optimization.
arXiv Detail & Related papers (2023-02-11T07:19:51Z)
- Frank-Wolfe-based Algorithms for Approximating Tyler's M-estimator [19.24470467199451]
One variant uses standard Frank-Wolfe steps; the second also considers away-steps (AFW).
The third is a geodesic version of AFW (GAFW).
arXiv Detail & Related papers (2022-06-19T10:24:12Z)
- Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\; F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov's acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
- Heavy Ball Momentum for Conditional Gradient [67.58002120548566]
Conditional gradient, aka Frank Wolfe (FW) algorithms, have well-documented merits in machine learning and signal processing applications.
This work deals with heavy ball momentum and its impact on FW; an illustrative sketch of the momentum-in-FW pattern appears after this related-papers list.
Numerical results demonstrate the usefulness of heavy ball momentum in FW.
arXiv Detail & Related papers (2021-10-08T16:50:00Z)
- Scalable Projection-Free Optimization [7.170441928038049]
As a projection-free algorithm, Frank-Wolfe (FW) has recently received considerable attention in the machine learning community.
We first propose a scalable projection-free optimization method that requires only one sample per iteration.
We then develop a distributed Frank-Wolfe (FW) framework for both convex and nonconvex objective functions.
arXiv Detail & Related papers (2021-05-07T22:27:18Z)
- $k$FW: A Frank-Wolfe style algorithm with stronger subproblem oracles [34.53447188447357]
This paper proposes a new variant of Frank-Wolfe (FW) called $k$FW.
Standard FW suffers from slow convergence: iterates often zig-zag as update directions oscillate around extreme points of a constraint set.
The new variant, $k$FW, overcomes this problem by using two stronger subproblem oracles in each iteration.
arXiv Detail & Related papers (2020-06-29T16:02:48Z)
- How Does Momentum Help Frank Wolfe? [81.95857252228537]
We unveil the connections between Frank Wolfe (FW) type algorithms and the momentum in Accelerated Gradient Methods (AGM).
On the negative side, these connections illustrate why momentum is unlikely to be effective for FW type algorithms.
The encouraging message behind this link, on the other hand, is that momentum is useful for FW on a class of problems.
arXiv Detail & Related papers (2020-06-19T13:06:00Z)
- Self-Concordant Analysis of Frank-Wolfe Algorithms [3.3598755777055374]
In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature.
We use the theory of SC functions to provide a new adaptive step size for FW methods and prove global convergence rate O(1/k) after k iterations.
arXiv Detail & Related papers (2020-02-11T11:30:33Z)
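As referenced in the Heavy Ball Momentum entry above, the following is a minimal sketch of one way momentum can enter an FW iteration: the linear-minimization subproblem is driven by a running average of gradients instead of the current gradient. The averaging weight beta and the function names are assumptions for illustration, not the exact algorithm or schedule analyzed in that paper.

```python
import numpy as np

def fw_with_momentum_sketch(grad_fn, lmo, x0, num_iters=200):
    """FW where the subproblem uses a momentum-averaged gradient
    (illustrative sketch; see the cited paper for the actual update and rates)."""
    x = x0.copy()
    g_avg = np.zeros_like(x0)              # heavy-ball style gradient buffer
    for k in range(num_iters):
        gamma = 2.0 / (k + 2)              # parameter-free step size
        beta = 2.0 / (k + 2)               # assumed averaging weight
        g_avg = (1.0 - beta) * g_avg + beta * grad_fn(x)
        s = lmo(g_avg)                     # FW subproblem on the averaged gradient
        x = (1.0 - gamma) * x + gamma * s
    return x
```

Here lmo can be any linear minimization oracle for the constraint set, e.g. the lmo_l1 helper from the earlier prediction-correction sketch.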