Boosting Frank-Wolfe by Chasing Gradients
- URL: http://arxiv.org/abs/2003.06369v2
- Date: Tue, 23 Jun 2020 19:24:19 GMT
- Title: Boosting Frank-Wolfe by Chasing Gradients
- Authors: Cyrille W. Combettes and Sebastian Pokutta
- Abstract summary: We propose to speed up the Frank-Wolfe algorithm by better aligning the descent direction with that of the negative gradient via a subroutine.
We demonstrate its competitive advantage both per iteration and in CPU time over the state-of-the-art in a series of computational experiments.
- Score: 26.042029798821375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Frank-Wolfe algorithm has become a popular first-order optimization
algorithm for it is simple and projection-free, and it has been successfully
applied to a variety of real-world problems. Its main drawback however lies in
its convergence rate, which can be excessively slow due to naive descent
directions. We propose to speed up the Frank-Wolfe algorithm by better aligning
the descent direction with that of the negative gradient via a subroutine. This
subroutine chases the negative gradient direction in a matching pursuit-style
while still preserving the projection-free property. Although the approach is
reasonably natural, it produces very significant results. We derive convergence
rates $\mathcal{O}(1/t)$ to $\mathcal{O}(e^{-\omega t})$ of our method and we
demonstrate its competitive advantage both per iteration and in CPU time over
the state-of-the-art in a series of computational experiments.
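To make the idea concrete, here is a minimal Python sketch of the gradient-chasing approach over the probability simplex with a toy quadratic objective. The function names (lmo_simplex, chase_gradient, boosted_fw), the alignment-based stopping rule, and the open-loop step size are illustrative simplifications under these assumptions, not the paper's exact pseudocode.

```python
import numpy as np

def lmo_simplex(c):
    """Linear minimization oracle over the probability simplex:
    argmin_{v in simplex} <c, v> is the vertex e_i with i = argmin_i c_i."""
    v = np.zeros_like(c)
    v[np.argmin(c)] = 1.0
    return v

def align(a, b):
    """Cosine alignment of b with a (-1.0 if b is the zero vector)."""
    nb = np.linalg.norm(b)
    return float(a.dot(b) / (np.linalg.norm(a) * nb)) if nb > 0 else -1.0

def chase_gradient(grad, x, K=10, delta=1e-3):
    """Matching-pursuit-style subroutine (sketch): greedily combine LMO
    directions v - x to build a direction d better aligned with -grad,
    stopping when the alignment no longer improves by at least delta."""
    d, Lam, r = np.zeros_like(x), 0.0, -grad
    for _ in range(K):
        v = lmo_simplex(-r)                    # vertex maximizing <r, v>
        u = v - x
        lam = r.dot(u) / max(u.dot(u), 1e-12)  # best scale along u
        if lam <= 0:
            break
        d_new = d + lam * u
        if align(-grad, d_new) - align(-grad, d) < delta:
            break
        d, Lam, r = d_new, Lam + lam, -grad - d_new
    return d, Lam

def boosted_fw(grad_f, x0, T=200):
    """Frank-Wolfe with boosted (chased) descent directions."""
    x = x0
    for t in range(T):
        g = grad_f(x)
        d, Lam = chase_gradient(g, x)
        if Lam == 0.0:                         # fall back to the vanilla FW direction
            d, Lam = lmo_simplex(g) - x, 1.0
        gamma = 2.0 / (t + 2)                  # standard open-loop step size
        x = x + gamma * d / Lam                # stays a convex combination, hence feasible
    return x

# Toy usage: minimize ||x - b||^2 over the simplex (the optimum is b itself).
b = np.array([0.1, 0.6, 0.3])
x_opt = boosted_fw(lambda x: 2.0 * (x - b), np.ones(3) / 3.0)
```

Dividing the accumulated direction by Lam keeps each update a convex combination of the current iterate and vertices, which is what preserves the projection-free, feasibility-by-construction property.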
Related papers
- Gradient Methods with Online Scaling [19.218484733179356]
We introduce a framework to accelerate the convergence of gradient-based methods with online learning.
We show for the first time that the widely-used hypergradient descent improves on the convergence of gradient descent.
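To illustrate the hypergradient descent heuristic referenced above (a minimal sketch, not the paper's online-scaling framework; the function name, constants, and toy objective are assumptions):

```python
import numpy as np

def hypergradient_descent(grad_f, x0, alpha0=0.01, beta=1e-4, T=500):
    """Gradient descent whose step size alpha is itself adapted online:
    alpha_{t+1} = alpha_t + beta * <g_t, g_{t-1}> (hypergradient update)."""
    x, alpha = np.asarray(x0, dtype=float), alpha0
    g_prev = np.zeros_like(x)
    for _ in range(T):
        g = grad_f(x)
        alpha = alpha + beta * g.dot(g_prev)  # grow alpha while consecutive gradients agree
        x = x - alpha * g
        g_prev = g
    return x

# Toy usage: minimize the quadratic f(x) = ||x - 1||^2.
x_opt = hypergradient_descent(lambda x: 2.0 * (x - 1.0), np.zeros(5))
```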
arXiv Detail & Related papers (2024-11-04T05:04:18Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-free) gradient based adaption.
The main idea of the method is to adapt the learning rate $\alpha$ based on situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Reducing Discretization Error in the Frank-Wolfe Method [2.806911268410107]
The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications.
One major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions.
We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error.
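As a rough illustration of the multistep idea, the sketch below applies a midpoint-style (second-order Runge-Kutta) update to Frank-Wolfe over the probability simplex; the oracle, objective, and step-size choices are illustrative assumptions, not the paper's exact schemes.

```python
import numpy as np

def lmo_simplex(c):
    """argmin over the simplex of <c, v>: the vertex at the smallest entry of c."""
    v = np.zeros_like(c)
    v[np.argmin(c)] = 1.0
    return v

def midpoint_fw(grad_f, x0, T=200):
    """Midpoint (RK2-style) Frank-Wolfe: query the LMO a second time at a
    half-step point and use that direction for the full step."""
    x = x0
    for t in range(T):
        gamma = 2.0 / (t + 2)
        v1 = lmo_simplex(grad_f(x))            # stage 1: usual FW vertex
        x_mid = x + 0.5 * gamma * (v1 - x)     # half step (still feasible)
        v2 = lmo_simplex(grad_f(x_mid))        # stage 2: vertex at the midpoint
        x = x + gamma * (v2 - x)               # full step with the corrected direction
    return x

# Toy usage: minimize ||x - b||^2 over the simplex.
b = np.array([0.2, 0.5, 0.3])
x_opt = midpoint_fw(lambda x: 2.0 * (x - b), np.ones(3) / 3.0)
```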
arXiv Detail & Related papers (2023-04-04T00:43:05Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit the connection between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Frank Wolfe Meets Metric Entropy [0.0]
We provide a technique for establishing domain-specific and easy-to-estimate lower bounds for Frank-Wolfe.
We show that a dimension-free linear upper bound must fail not only in the worst case, but in the average case.
We also establish this phenomenon for the nuclear norm ball.
arXiv Detail & Related papers (2022-05-17T21:23:36Z) - Nesterov Accelerated Shuffling Gradient Method for Convex Optimization [15.908060383231371]
We show that our algorithm has an improved rate of $\mathcal{O}(1/T)$ using unified shuffling schemes.
Our convergence analysis does not require a bounded domain assumption or a bounded gradient condition.
Numerical simulations demonstrate the efficiency of our algorithm.
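A loose sketch of combining epoch-wise shuffling with a Nesterov-style extrapolation is given below; the step size, momentum value, and toy objective are assumptions, and the paper's actual update and schedules differ.

```python
import numpy as np

def nesterov_shuffling_gd(grad_i, n, x0, lr=0.05, momentum=0.7, epochs=50, seed=0):
    """One shuffled pass over the n component gradients per epoch, followed by
    a Nesterov-style extrapolation between the last two epoch iterates."""
    rng = np.random.default_rng(seed)
    x_prev = y = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        x = y.copy()
        for i in rng.permutation(n):           # shuffled incremental pass
            x = x - (lr / n) * grad_i(x, i)
        y = x + momentum * (x - x_prev)        # extrapolation across epochs
        x_prev = x
    return x_prev

# Toy usage: f(x) = (1/n) * sum_i ||x - a_i||^2 with random anchors a_i.
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 4))
x_opt = nesterov_shuffling_gd(lambda x, i: 2.0 * (x - A[i]), n=20, x0=np.zeros(4))
```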
arXiv Detail & Related papers (2022-02-07T21:23:17Z) - An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm
for First-order and Zeroth-order Optimization [111.24899593052851]
The conditional gradient algorithm (also known as the Frank-Wolfe algorithm) has recently regained popularity in the machine learning community.
ARCS is the first zeroth-order conditional gradient sliding-type algorithm for solving convex problems in zeroth-order optimization.
In first-order optimization, the convergence results of ARCS substantially outperform previous algorithms in terms of the number of gradient oracle queries.
arXiv Detail & Related papers (2021-09-18T07:08:11Z) - Reusing Combinatorial Structure: Faster Iterative Projections over
Submodular Base Polytopes [7.734726150561089]
We develop a toolkit to speed up the computation of projections using both discrete and continuous perspectives.
For the special case of cardinality based submodular polytopes, we improve the runtime of computing certain Bregman projections by a factor of $\Omega(n/\log(n))$.
arXiv Detail & Related papers (2021-06-22T17:29:24Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
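For concreteness, the sketch below uses the basic two-point randomized gradient estimator with a projected gradient step; the paper's estimator additionally exploits higher-order smoothness via kernel weighting, which is omitted here, and the step size, projection, and toy problem are assumptions.

```python
import numpy as np

def zo_projected_gd(f, project, x0, h=1e-3, lr=0.1, T=1000, seed=0):
    """Zeroth-order projected gradient descent with a two-point estimator:
    g_hat = (d / (2h)) * (f(x + h*z) - f(x - h*z)) * z, z uniform on the sphere."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    for _ in range(T):
        z = rng.normal(size=d)
        z /= np.linalg.norm(z)                 # uniform random direction on the sphere
        g_hat = (d / (2.0 * h)) * (f(x + h * z) - f(x - h * z)) * z
        x = project(x - lr * g_hat)            # projected (first-order-style) update
    return x

# Toy usage: minimize ||x - b||^2 over the Euclidean unit ball.
b = np.array([0.3, -0.2, 0.4])
project_ball = lambda x: x / max(1.0, np.linalg.norm(x))
x_opt = zo_projected_gd(lambda x: np.sum((x - b) ** 2), project_ball, np.zeros(3))
```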
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
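For context, here is a minimal sketch of the plain optimistic gradient descent-ascent update that underlies such analyses (not the paper's adaptive variant; the step size and the bilinear toy game are assumptions):

```python
import numpy as np

def optimistic_gda(grad_x, grad_y, x0, y0, lr=0.1, T=2000):
    """Optimistic gradient descent-ascent for min_x max_y f(x, y):
    each player uses the 'extrapolated' gradient 2*g_t - g_{t-1}."""
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    gx_prev, gy_prev = np.zeros_like(x), np.zeros_like(y)
    for _ in range(T):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - lr * (2.0 * gx - gx_prev)   # descent on x with optimism
        y = y + lr * (2.0 * gy - gy_prev)   # ascent on y with optimism
        gx_prev, gy_prev = gx, gy
    return x, y

# Toy usage: bilinear game f(x, y) = x^T y, whose saddle point is (0, 0).
x_opt, y_opt = optimistic_gda(lambda x, y: y, lambda x, y: x,
                              np.array([1.0]), np.array([1.0]))
```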
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.