Accelerating Frank-Wolfe via Averaging Step Directions
- URL: http://arxiv.org/abs/2205.11794v1
- Date: Tue, 24 May 2022 05:26:25 GMT
- Title: Accelerating Frank-Wolfe via Averaging Step Directions
- Authors: Zhaoyue Chen, Yifan Sun
- Abstract summary: We consider a modified Frank-Wolfe where the step direction is a simple weighted average of past oracle calls.
We show that this method improves the convergence rate over several problems, especially after the sparse manifold has been detected.
- Score: 2.806911268410107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Frank-Wolfe method is a popular method in sparse constrained
optimization, owing to its low per-iteration cost. However, the tradeoff
is that its worst-case global convergence is comparatively slow and,
importantly, fundamentally slower than the rate of its continuous-time flow;
that is to say, the convergence rate is throttled by discretization error.
In this work, we
consider a modified Frank-Wolfe where the step direction is a simple weighted
average of past oracle calls. This method requires very little memory and
computational overhead, and provably decays this discretization error term.
Numerically, we show that this method improves the convergence rate over
several problems, especially after the sparse manifold has been detected.
Theoretically, we show the method has an overall global convergence rate of
$O(1/k^p)$, where $0 < p < 1$; after manifold identification, this rate speeds
up to $O(1/k^{3p/2})$. We also observe that the method achieves this accelerated
rate from a very early stage, suggesting a promising mode of acceleration for
this family of methods.
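The mechanism described above, replacing the raw Frank-Wolfe direction with a weighted average of past linear minimization oracle (LMO) outputs, is easy to prototype. The Python sketch below is a minimal illustration over the $\ell_1$ ball (a common sparse constraint set); the helper names, the polynomial averaging weights, and the open-loop step size $\gamma_k = 2/(k+2)$ are assumptions made for illustration, not the paper's exact choices.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle for the l1 ball: returns the vertex
    argmin_{||s||_1 <= radius} <grad, s> (a signed, scaled coordinate vector)."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def averaged_frank_wolfe(grad_f, x0, radius=1.0, iters=500, avg_power=1.0):
    """Frank-Wolfe where the step direction is a running weighted average of past
    LMO outputs (illustrative sketch; the paper's exact weighting may differ)."""
    x = x0.copy()
    s_avg = np.zeros_like(x0)                # running weighted average of LMO outputs
    weight_sum = 0.0
    for k in range(1, iters + 1):
        s_k = lmo_l1_ball(grad_f(x), radius) # oracle call at the current iterate
        w_k = k ** avg_power                 # hypothetical polynomial weights
        weight_sum += w_k
        s_avg += (w_k / weight_sum) * (s_k - s_avg)
        gamma = 2.0 / (k + 2)                # standard open-loop FW step size
        x = x + gamma * (s_avg - x)          # move toward the averaged vertex
    return x

# Usage: minimize a least-squares objective over an l1 ball of radius 5.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((40, 100)), rng.standard_normal(40)
grad_f = lambda x: A.T @ (A @ x - b)
x_hat = averaged_frank_wolfe(grad_f, np.zeros(100), radius=5.0, iters=1000)
```

Since the averaged direction is a convex combination of vertices, every iterate stays feasible; the only overhead relative to standard Frank-Wolfe is one extra stored vector and a running weight sum, consistent with the abstract's claim of very little memory and computational overhead.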
Related papers
- Stochastic Newton Proximal Extragradient Method [18.47705532817026]
We propose a novel stochastic Newton proximal extragradient method that improves on existing convergence bounds.
We accomplish this by extending the Hybrid Proximal Extragradient (HPE) framework.
arXiv Detail & Related papers (2024-06-03T16:06:23Z) - Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
arXiv Detail & Related papers (2024-02-04T05:54:51Z) - Reducing Discretization Error in the Frank-Wolfe Method [2.806911268410107]
The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications.
One major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions.
We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error.
arXiv Detail & Related papers (2023-04-04T00:43:05Z) - Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods [57.050204432302195]
This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions.
Our algorithm achieves $O(\sigma / \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$, and improves its convergence to $O(1/T^3)$ with deterministic oracles.
arXiv Detail & Related papers (2022-11-03T14:12:51Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and that the iterations converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - A Multistep Frank-Wolfe Method [2.806911268410107]
We study the zig-zagging phenomenon in the Frank-Wolfe method as an artifact of discretization.
We propose multistep Frank-Wolfe variants where the truncation errors decay as $O(\Delta^p)$, where $p$ is the method's order (a sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-10-14T21:12:01Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Nesterov Accelerated Shuffling Gradient Method for Convex Optimization [15.908060383231371]
We show that our algorithm has an improved rate of $\mathcal{O}(1/T)$ using unified shuffling schemes.
Our convergence analysis does not require an assumption on bounded domain or a bounded gradient condition.
Numerical simulations demonstrate the efficiency of our algorithm.
arXiv Detail & Related papers (2022-02-07T21:23:17Z) - Scalable Frank-Wolfe on Generalized Self-concordant Functions via Simple Steps [66.88729048402082]
Generalized self-concordance is a key property present in the objective function of many learning problems.
We show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
arXiv Detail & Related papers (2021-05-28T15:26:36Z) - Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP).
We show that given any estimate for the optimal quality function $Q^*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q^*$-estimate's pointwise convergence rate.
arXiv Detail & Related papers (2021-01-31T16:17:56Z) - Accelerate the Warm-up Stage in the Lasso Computation via a Homotopic Approach [2.538209532048867]
A homotopic method is used to approximate the $\ell_1$ penalty that appears in Lasso-type estimators.
We prove a faster numerical convergence rate than any existing method for computing Lasso-type estimators (a sketch of the homotopic idea appears after this list).
arXiv Detail & Related papers (2020-10-26T22:37:49Z)
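The multistep Frank-Wolfe entries above describe applying higher-order discretization schemes to the Frank-Wolfe flow so that truncation errors shrink with the method's order. The sketch below is a hedged illustration of a two-stage, midpoint-style variant in which the update direction comes from an LMO call at a half-step point; the $\ell_1$-ball oracle, the step-size rule, and the function names are assumptions for illustration, not the papers' exact schemes.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle for the l1 ball: a signed, scaled coordinate vertex."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def midpoint_frank_wolfe(grad_f, x0, radius=1.0, iters=500):
    """Two-stage (midpoint-style) Frank-Wolfe sketch: the update direction is taken
    from an LMO call at a half-step point, mimicking a second-order discretization
    of the Frank-Wolfe flow (illustrative only)."""
    x = x0.copy()
    for k in range(1, iters + 1):
        gamma = 2.0 / (k + 2)                        # standard open-loop step size
        s1 = lmo_l1_ball(grad_f(x), radius)          # Euler (standard FW) direction
        x_half = x + 0.5 * gamma * (s1 - x)          # half step stays in the feasible set
        s2 = lmo_l1_ball(grad_f(x_half), radius)     # direction evaluated at the midpoint
        x = x + gamma * (s2 - x)                     # full step using the midpoint direction
    return x

# Usage: minimize a least-squares objective over an l1 ball of radius 5.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((40, 100)), rng.standard_normal(40)
x_hat = midpoint_frank_wolfe(lambda x: A.T @ (A @ x - b), np.zeros(100), radius=5.0)
```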
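The homotopic Lasso entry above approximates the $\ell_1$ penalty and tightens the approximation along a path to warm-start the computation. The sketch below is one plausible rendering of that idea under stated assumptions: a Huber-type smoothing parameter is annealed toward zero, each stage is warm-started from the previous solution, and the final stage switches to exact proximal gradient (ISTA) steps. The smoothing schedule and the inner solver are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def homotopy_warm_start_lasso(A, b, lam, mus=(1.0, 0.1, 0.01, 0.0), inner_iters=200):
    """Illustrative homotopic warm-up for the Lasso: anneal a Huber-type smoothing
    parameter mu toward zero, warm-starting each stage from the previous solution
    (not the paper's exact method)."""
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of the quadratic part
    for mu in mus:
        if mu > 0:
            step = 1.0 / (L + lam / mu)              # safe step for the smoothed objective
            for _ in range(inner_iters):
                grad = A.T @ (A @ x - b) + lam * x / np.maximum(np.abs(x), mu)
                x = x - step * grad                  # gradient step on the smoothed problem
        else:
            for _ in range(inner_iters):             # final stage: exact l1 penalty via ISTA
                x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x

# Usage: a small synthetic Lasso instance.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 200)), rng.standard_normal(50)
x_hat = homotopy_warm_start_lasso(A, b, lam=0.5)
```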