A Multistep Frank-Wolfe Method
- URL: http://arxiv.org/abs/2210.08110v1
- Date: Fri, 14 Oct 2022 21:12:01 GMT
- Title: A Multistep Frank-Wolfe Method
- Authors: Zhaoyue Chen, Yifan Sun
- Abstract summary: We study the zig-zagging phenomenon in the Frank-Wolfe method as an artifact of discretization.
We propose multistep Frank-Wolfe variants where the truncation errors decay as $O(\Delta^p)$, where $p$ is the method's order.
- Score: 2.806911268410107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Frank-Wolfe algorithm has regained much interest in its use in
structurally constrained machine learning applications. However, one major
limitation of the Frank-Wolfe algorithm is the slow local convergence property
due to the zig-zagging behavior. We observe the zig-zagging phenomenon in the
Frank-Wolfe method as an artifact of discretization, and propose multistep
Frank-Wolfe variants where the truncation errors decay as $O(\Delta^p)$, where
$p$ is the method's order. This strategy "stabilizes" the method, and allows
tools like line search and momentum to have more benefits. However, our results
suggest that the worst case convergence rate of Runge-Kutta-type discretization
schemes cannot improve upon that of the vanilla Frank-Wolfe method for a rate
depending on $k$. Still, we believe that this analysis adds to the growing
knowledge of flow analysis for optimization methods, and is a cautionary tale
on the ultimate usefulness of multistep methods.
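To make the discretization view concrete, below is a minimal sketch (not taken from the paper) contrasting vanilla Frank-Wolfe with an illustrative midpoint, i.e. second-order Runge-Kutta, discretization of the Frank-Wolfe flow $\dot{x} = \mathrm{LMO}(\nabla f(x)) - x$. The simplex LMO, the $2/(k+2)$ step size, and the toy least-squares problem are assumptions made for illustration only, not the paper's exact scheme.

```python
import numpy as np

def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex:
    argmin_{s in simplex} <grad, s> is the vertex of the most negative coordinate."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def frank_wolfe(grad_f, x0, lmo, n_iters=200):
    """Vanilla Frank-Wolfe with the standard gamma_k = 2/(k+2) step size."""
    x = x0.copy()
    for k in range(n_iters):
        s = lmo(grad_f(x))                    # one oracle call per iteration
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * s     # convex combination keeps x feasible
    return x

def multistep_frank_wolfe(grad_f, x0, lmo, n_iters=200):
    """Illustrative midpoint (RK2-style) discretization of the Frank-Wolfe flow
    dx/dt = lmo(grad f(x)) - x; two oracle calls per iteration."""
    x = x0.copy()
    for k in range(n_iters):
        gamma = 2.0 / (k + 2.0)
        s1 = lmo(grad_f(x))                   # first stage (Euler direction)
        x_mid = x + 0.5 * gamma * (s1 - x)    # half step; still a convex combination
        s2 = lmo(grad_f(x_mid))               # second stage, evaluated at the midpoint
        x = x + gamma * (s2 - x)              # full step along the midpoint direction
    return x

# Hypothetical toy problem: least squares over the probability simplex.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(30, 10)), rng.normal(size=30)
grad_f = lambda x: A.T @ (A @ x - b)
x0 = np.full(10, 0.1)                         # uniform point in the simplex
x_fw = frank_wolfe(grad_f, x0, lmo_simplex)
x_ms = multistep_frank_wolfe(grad_f, x0, lmo_simplex)
```

The midpoint variant trades an extra LMO call per iteration for a smaller truncation error, which is the trade-off the abstract's worst-case result cautions about.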
Related papers
- Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the FW algorithm for stochastic finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z) - Reducing Discretization Error in the Frank-Wolfe Method [2.806911268410107]
The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications.
One major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions.
We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error.
arXiv Detail & Related papers (2023-04-04T00:43:05Z) - Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method
for Empirical Risk Minimization [1.4504054468850665]
In Empirical Risk Minimization, we improve the Frank-Wolfe method by using Taylor-approximated gradients to reduce the per-iteration cost.
We show that our methods achieve very significant speed-ups over existing methods on real-world binary classification datasets.
We also present a novel adaptive step-size approach for which we have computational guarantees.
arXiv Detail & Related papers (2022-08-30T00:08:37Z) - Accelerating Frank-Wolfe via Averaging Step Directions [2.806911268410107]
We consider a modified Frank-Wolfe where the step direction is a simple weighted average of past oracle calls.
We show that this method improves the convergence rate on several problems, especially after the sparse manifold has been detected; a minimal sketch of the averaging idea appears after this list.
arXiv Detail & Related papers (2022-05-24T05:26:25Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Scalable Frank-Wolfe on Generalized Self-concordant Functions via Simple Steps [66.88729048402082]
Generalized self-concordance is a key property present in the objective function of many learning problems.
We show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
arXiv Detail & Related papers (2021-05-28T15:26:36Z) - RNN Training along Locally Optimal Trajectories via Frank-Wolfe
Algorithm [50.76576946099215]
We propose a novel and efficient training method for RNNs by iteratively seeking a local minimum on the loss surface within a small region.
We develop a novel RNN training method for which, surprisingly, even with the additional per-iteration cost, the overall training cost is empirically observed to be lower than that of back-propagation.
arXiv Detail & Related papers (2020-10-12T01:59:18Z) - Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization [30.980431848476925]
We propose an algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure.
The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost independent of the dataset size.
We provide an implementation of all considered methods in an open-source package.
arXiv Detail & Related papers (2020-02-27T00:47:21Z) - A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization [60.90222082871258]
We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set.
We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case.
arXiv Detail & Related papers (2020-02-17T15:28:31Z) - Global Convergence of Frank Wolfe on One Hidden Layer Networks [121.96696298666014]
We derive global convergence bounds for the Frank Wolfe algorithm when training one hidden layer neural networks.
When using the ReLU activation function, and under tractable preconditioning assumptions on the sample data set, the linear minimization oracle used to incrementally form the solution can be solved explicitly as a second order cone program.
arXiv Detail & Related papers (2020-02-06T11:58:43Z) - Faster Projection-free Online Learning [34.96927532439896]
We give an efficient projection-free algorithm that guarantees $T^{2/3}$ regret for general online convex optimization.
Our algorithm is derived using the Follow-the-Perturbed-Leader method and is analyzed using an online primal-dual framework.
arXiv Detail & Related papers (2020-01-30T21:18:39Z)
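As referenced above, here is a rough sketch of the averaged-step-direction idea from "Accelerating Frank-Wolfe via Averaging Step Directions": the raw LMO output is replaced by a running average of past oracle calls. The exponential weighting `beta` is a hypothetical choice for illustration, not that paper's exact scheme; the function can be run with the `lmo_simplex`, `grad_f`, and `x0` defined in the earlier sketch.

```python
def averaged_frank_wolfe(grad_f, x0, lmo, n_iters=200, beta=0.9):
    """Frank-Wolfe whose step direction is a weighted average of past LMO
    outputs; the exponential weighting is an assumed, illustrative scheme."""
    x = x0.copy()
    d = lmo(grad_f(x))                        # seed the average with the first vertex
    for k in range(n_iters):
        s = lmo(grad_f(x))                    # current oracle call
        d = beta * d + (1.0 - beta) * s       # running average of vertices (stays feasible)
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * d     # convex combination keeps x in the feasible set
    return x
```

Because each averaged direction is itself a convex combination of vertices, the iterates remain feasible while the averaging damps the zig-zagging of consecutive oracle outputs.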