A Multistep Frank-Wolfe Method
- URL: http://arxiv.org/abs/2210.08110v1
- Date: Fri, 14 Oct 2022 21:12:01 GMT
- Title: A Multistep Frank-Wolfe Method
- Authors: Zhaoyue Chen, Yifan Sun
- Abstract summary: We study the zig-zagging phenomenon in the Frank-Wolfe method as an artifact of discretization.
We propose multistep Frank-Wolfe variants where the truncation errors decay as $O(\Delta^p)$, where $p$ is the method's order.
- Score: 2.806911268410107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Frank-Wolfe algorithm has regained much interest in its use in
structurally constrained machine learning applications. However, one major
limitation of the Frank-Wolfe algorithm is the slow local convergence property
due to the zig-zagging behavior. We observe the zig-zagging phenomenon in the
Frank-Wolfe method as an artifact of discretization, and propose multistep
Frank-Wolfe variants where the truncation errors decay as $O(\Delta^p)$, where
$p$ is the method's order. This strategy "stabilizes" the method, and allows
tools like line search and momentum to have more benefits. However, our results
suggest that the worst case convergence rate of Runge-Kutta-type discretization
schemes cannot improve upon that of the vanilla Frank-Wolfe method for a rate
depending on $k$. Still, we believe that this analysis adds to the growing
knowledge of flow analysis for optimization methods, and is a cautionary tale
on the ultimate usefulness of multistep methods.
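To make the discretization view concrete, below is a minimal sketch (not taken from the paper) contrasting vanilla Frank-Wolfe with an illustrative midpoint, i.e. second-order Runge-Kutta, discretization of the Frank-Wolfe flow $\dot{x} = \mathrm{LMO}(\nabla f(x)) - x$. The simplex LMO, the $2/(k+2)$ step size, and the toy least-squares problem are assumptions made for illustration only, not the paper's exact scheme.

```python
import numpy as np

def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex:
    argmin_{s in simplex} <grad, s> is the vertex of the most negative coordinate."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def frank_wolfe(grad_f, x0, lmo, n_iters=200):
    """Vanilla Frank-Wolfe with the standard gamma_k = 2/(k+2) step size."""
    x = x0.copy()
    for k in range(n_iters):
        s = lmo(grad_f(x))                    # one oracle call per iteration
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * s     # convex combination keeps x feasible
    return x

def multistep_frank_wolfe(grad_f, x0, lmo, n_iters=200):
    """Illustrative midpoint (RK2-style) discretization of the Frank-Wolfe flow
    dx/dt = lmo(grad f(x)) - x; two oracle calls per iteration."""
    x = x0.copy()
    for k in range(n_iters):
        gamma = 2.0 / (k + 2.0)
        s1 = lmo(grad_f(x))                   # first stage (Euler direction)
        x_mid = x + 0.5 * gamma * (s1 - x)    # half step; still a convex combination
        s2 = lmo(grad_f(x_mid))               # second stage, evaluated at the midpoint
        x = x + gamma * (s2 - x)              # full step along the midpoint direction
    return x

# Hypothetical toy problem: least squares over the probability simplex.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(30, 10)), rng.normal(size=30)
grad_f = lambda x: A.T @ (A @ x - b)
x0 = np.full(10, 0.1)                         # uniform point in the simplex
x_fw = frank_wolfe(grad_f, x0, lmo_simplex)
x_ms = multistep_frank_wolfe(grad_f, x0, lmo_simplex)
```

The midpoint variant trades an extra LMO call per iteration for a smaller truncation error, which is the trade-off the abstract's worst-case result cautions about.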
Related papers
- Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the FW algorithm for stochastic finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z) - Reducing Discretization Error in the Frank-Wolfe Method [2.806911268410107]
The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications.
One major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions.
We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error.
arXiv Detail & Related papers (2023-04-04T00:43:05Z) - Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method
for Empirical Risk Minimization [1.4504054468850665]
In Empirical Risk Minimization, we improve the Frank-Wolfe method by using Taylor-approximated gradients to reduce the per-iteration cost.
We show that our methods achieve very significant speed-ups over existing methods on real-world binary classification datasets.
We also present a novel adaptive step-size approach for which we have computational guarantees.
arXiv Detail & Related papers (2022-08-30T00:08:37Z) - Accelerating Frank-Wolfe via Averaging Step Directions [2.806911268410107]
We consider a modified Frank-Wolfe where the step direction is a simple weighted average of past oracle calls.
We show that this method improves the convergence rate on several problems, especially after the sparse manifold has been detected; a minimal sketch of the averaging idea appears after this list.
arXiv Detail & Related papers (2022-05-24T05:26:25Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Scalable Frank-Wolfe on Generalized Self-concordant Functions via Simple Steps [66.88729048402082]
Generalized self-concordance is a key property present in the objective function of many learning problems.
We show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
arXiv Detail & Related papers (2021-05-28T15:26:36Z) - RNN Training along Locally Optimal Trajectories via Frank-Wolfe
Algorithm [50.76576946099215]
We propose a novel and efficient training method for RNNs by iteratively seeking a local minimum on the loss surface within a small region.
We develop a novel RNN training method for which, surprisingly, even with the additional per-iteration cost, the overall training cost is empirically observed to be lower than that of back-propagation.
arXiv Detail & Related papers (2020-10-12T01:59:18Z) - Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization [30.980431848476925]
We propose an algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure.
The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost independent of the dataset size.
We provide an implementation of all considered methods in an open-source package.
arXiv Detail & Related papers (2020-02-27T00:47:21Z) - A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization [60.90222082871258]
We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set.
We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case.
arXiv Detail & Related papers (2020-02-17T15:28:31Z) - Global Convergence of Frank Wolfe on One Hidden Layer Networks [121.96696298666014]
We derive global convergence bounds for the Frank Wolfe algorithm when training one hidden layer neural networks.
When using the ReLU activation function, and under tractable preconditioning assumptions on the sample data set, the linear minimization oracle used to incrementally form the solution can be solved explicitly as a second order cone program.
arXiv Detail & Related papers (2020-02-06T11:58:43Z) - Faster Projection-free Online Learning [34.96927532439896]
We give an efficient projection-free algorithm that guarantees $T^{2/3}$ regret for general online convex optimization.
Our algorithm is derived using the Follow-the-Perturbed-Leader method and is analyzed using an online primal-dual framework.
arXiv Detail & Related papers (2020-01-30T21:18:39Z)
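As referenced above, here is a rough sketch of the averaged-step-direction idea from "Accelerating Frank-Wolfe via Averaging Step Directions": the raw LMO output is replaced by a running average of past oracle calls. The exponential weighting `beta` is a hypothetical choice for illustration, not that paper's exact scheme; the function can be run with the `lmo_simplex`, `grad_f`, and `x0` defined in the earlier sketch.

```python
def averaged_frank_wolfe(grad_f, x0, lmo, n_iters=200, beta=0.9):
    """Frank-Wolfe whose step direction is a weighted average of past LMO
    outputs; the exponential weighting is an assumed, illustrative scheme."""
    x = x0.copy()
    d = lmo(grad_f(x))                        # seed the average with the first vertex
    for k in range(n_iters):
        s = lmo(grad_f(x))                    # current oracle call
        d = beta * d + (1.0 - beta) * s       # running average of vertices (stays feasible)
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * d     # convex combination keeps x in the feasible set
    return x
```

Because each averaged direction is itself a convex combination of vertices, the iterates remain feasible while the averaging damps the zig-zagging of consecutive oracle outputs.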