Reducing Discretization Error in the Frank-Wolfe Method
- URL: http://arxiv.org/abs/2304.01432v2
- Date: Thu, 13 Apr 2023 14:15:05 GMT
- Title: Reducing Discretization Error in the Frank-Wolfe Method
- Authors: Zhaoyue Chen, Yifan Sun
- Abstract summary: The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications.
One major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions.
We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error.
- Score: 2.806911268410107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Frank-Wolfe algorithm is a popular method in structurally constrained
machine learning applications, due to its fast per-iteration complexity.
However, one major limitation of the method is a slow rate of convergence that
is difficult to accelerate due to erratic, zig-zagging step directions, even
asymptotically close to the solution. We view this as an artifact of
discretization; that is to say, the Frank-Wolfe \emph{flow}, which is its
trajectory at asymptotically small step sizes, does not zig-zag, and reducing
discretization error will go hand in hand with producing a more stabilized
method with better convergence properties. We propose two improvements: a
multistep Frank-Wolfe method that directly applies optimized higher-order
discretization schemes; and an LMO-averaging scheme with reduced discretization
error, whose local convergence rate over general convex sets accelerates from
$O(1/k)$ to up to $O(1/k^{3/2})$.
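As an illustration of the two ideas in the abstract, below is a minimal sketch, not the authors' exact algorithms: a vanilla Frank-Wolfe loop over the probability simplex, plus a variant that replaces the raw LMO output with a running weighted average of past LMO outputs (in the spirit of the LMO-averaging scheme, and of the "Averaging Step Directions" entry further down). The quadratic objective, the classical 2/(k+2) step size, and the averaging weights are placeholder choices for illustration only.

```python
# A minimal sketch, not the paper's exact algorithms: vanilla Frank-Wolfe over
# the probability simplex versus a variant that averages past LMO outputs to
# damp zig-zagging. Objective, step sizes, and averaging weights are
# illustrative placeholders.
import numpy as np

def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex:
    argmin_{s in simplex} <grad, s> is the vertex with the smallest gradient entry."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def frank_wolfe(grad_f, x0, n_iters=500, averaged=False):
    x = x0.copy()
    d = np.zeros_like(x0)
    for k in range(n_iters):
        gamma = 2.0 / (k + 2)            # classical open-loop step size
        s = lmo_simplex(grad_f(x))       # one LMO call per iteration
        if averaged:
            # Running convex combination of past LMO outputs (placeholder weights);
            # d stays inside the simplex, so the update below remains feasible.
            d = (1.0 - gamma) * d + gamma * s
        else:
            d = s
        x = x + gamma * (d - x)          # convex combination keeps x feasible
    return x

if __name__ == "__main__":
    # Toy problem: minimize ||Ax - b||^2 over the simplex.
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
    grad_f = lambda x: 2.0 * A.T @ (A @ x - b)
    x0 = np.ones(10) / 10
    for avg in (False, True):
        x = frank_wolfe(grad_f, x0, averaged=avg)
        print(f"averaged={avg}: objective = {np.sum((A @ x - b) ** 2):.4f}")
```

On this toy problem both variants converge; the averaged direction only illustrates the flavor of the LMO-averaging idea, whose analyzed weights and acceleration to $O(1/k^{3/2})$ are given in the paper.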
Related papers
- Incremental Quasi-Newton Methods with Faster Superlinear Convergence
Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
arXiv Detail & Related papers (2024-02-04T05:54:51Z) - Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the algorithm for stochastic finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and that the iterates converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - A Multistep Frank-Wolfe Method [2.806911268410107]
We study the zig-zagging phenomenon in the Frank-Wolfe method as an artifact of discretization.
We propose multistep Frank-Wolfe variants where the truncation errors decay as $O(\Delta^p)$, where $p$ is the method's order (a two-stage sketch in this spirit appears after this list).
arXiv Detail & Related papers (2022-10-14T21:12:01Z) - Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method
for Empirical Risk Minimization [1.4504054468850665]
In the Empirical Risk Minimization setting, we present an approach based on Taylor-approximated gradients, together with a novel adaptive step-size rule for which we have computational guarantees.
We show that our methods yield very significant computational gains on real-world binary classification datasets.
arXiv Detail & Related papers (2022-08-30T00:08:37Z) - Accelerating Frank-Wolfe via Averaging Step Directions [2.806911268410107]
We consider a modified Frank-Wolfe where the step direction is a simple weighted average of past oracle calls.
We show that this method improves the convergence rate over several problems, especially after the sparse manifold has been detected.
arXiv Detail & Related papers (2022-05-24T05:26:25Z) - Nesterov Accelerated Shuffling Gradient Method for Convex Optimization [15.908060383231371]
We show that our algorithm has an improved rate of $\mathcal{O}(1/T)$ using unified shuffling schemes.
Our convergence analysis does not require an assumption on bounded domain or a bounded gradient condition.
Numerical simulations demonstrate the efficiency of our algorithm.
arXiv Detail & Related papers (2022-02-07T21:23:17Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Boosting Frank-Wolfe by Chasing Gradients [26.042029798821375]
We propose to speed up the Frank-Wolfe algorithm by better aligning the descent direction with that of the negative gradient via a subroutine.
We demonstrate its competitive advantage both per iteration and in CPU time over the state-of-the-art in a series of computational experiments.
arXiv Detail & Related papers (2020-03-13T16:29:02Z)
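Below is the two-stage sketch referenced in the "A Multistep Frank-Wolfe Method" entry above: a minimal Heun-style (explicit trapezoidal) discretization of the Frank-Wolfe flow $dx/dt = \mathrm{lmo}(\nabla f(x)) - x$. This is a sketch of the general idea, not the paper's optimized schemes; the stage weights are the generic Heun coefficients, and it reuses the hypothetical `lmo_simplex` and `grad_f` from the earlier sketch.

```python
# A minimal two-stage (Heun-style) sketch of a higher-order discretization of
# the Frank-Wolfe flow dx/dt = lmo(grad_f(x)) - x; not the paper's optimized schemes.
def frank_wolfe_two_stage(grad_f, lmo, x0, n_iters=500):
    x = x0.copy()
    for k in range(n_iters):
        gamma = 2.0 / (k + 2)            # gamma <= 1 keeps the update feasible (see note below)
        k1 = lmo(grad_f(x)) - x          # stage 1: ordinary FW direction at x
        y = x + gamma * k1               # trial point (a convex combination, hence feasible)
        k2 = lmo(grad_f(y)) - y          # stage 2: FW direction at the trial point
        x = x + 0.5 * gamma * (k1 + k2)  # trapezoidal average of the two stages
    return x

# Example (reusing lmo_simplex, grad_f, x0 from the earlier sketch):
#   x = frank_wolfe_two_stage(grad_f, lmo_simplex, x0)
```

For $\gamma \le 1$ the update is a convex combination of $x$ and the two LMO vertices, so the iterates remain in the feasible set; the price is two LMO calls per iteration.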
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.