Scalable Frank-Wolfe on Generalized Self-concordant Functions via Simple Steps
- URL: http://arxiv.org/abs/2105.13913v8
- Date: Mon, 8 Apr 2024 07:30:41 GMT
- Title: Scalable Frank-Wolfe on Generalized Self-concordant Functions via Simple Steps
- Authors: Alejandro Carderera, Mathieu Besançon, Sebastian Pokutta
- Abstract summary: Generalized self-concordance is a key property present in the objective function of many learning problems.
We show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
- Score: 66.88729048402082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalized self-concordance is a key property present in the objective function of many important learning problems. We establish the convergence rate of a simple Frank-Wolfe variant that uses the open-loop step size strategy $\gamma_t = 2/(t+2)$, obtaining a $\mathcal{O}(1/t)$ convergence rate for this class of functions in terms of primal gap and Frank-Wolfe gap, where $t$ is the iteration count. This avoids the use of second-order information or the need to estimate local smoothness parameters of previous work. We also show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
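As a quick illustration of the step-size rule above (not the authors' implementation), the following sketch runs a plain Frank-Wolfe loop with the open-loop step size $\gamma_t = 2/(t+2)$ on a logistic-loss objective, a standard example of a generalized self-concordant function, over an $\ell_1$-ball. The data, radius, and helper names are made up for the example.

```python
import numpy as np

def frank_wolfe_open_loop(grad, lmo, x0, num_iters=1000):
    """Frank-Wolfe with the open-loop step size gamma_t = 2 / (t + 2).

    grad : callable returning the gradient of the objective at x.
    lmo  : linear minimization oracle; given a gradient g, returns
           argmin_{v in feasible set} <g, v>.
    x0   : feasible starting point.
    """
    x = x0.copy()
    for t in range(num_iters):
        g = grad(x)
        v = lmo(g)                  # vertex minimizing the linearized objective
        gamma = 2.0 / (t + 2.0)     # open-loop step size, no line search
        x = (1.0 - gamma) * x + gamma * v
    return x

# Illustrative data: logistic loss over an l1-ball of radius tau.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
b = rng.choice([-1.0, 1.0], size=200)
tau = 5.0

def grad_logistic(x):
    # Gradient of (1/n) * sum_i log(1 + exp(-b_i <a_i, x>)).
    z = -b * (A @ x)
    return A.T @ (-b * (1.0 / (1.0 + np.exp(-z)))) / len(b)

def lmo_l1(g):
    # LMO over the l1-ball: pick the coordinate with largest |g_i|.
    i = np.argmax(np.abs(g))
    v = np.zeros_like(g)
    v[i] = -tau * np.sign(g[i])
    return v

x_approx = frank_wolfe_open_loop(grad_logistic, lmo_l1, np.zeros(50))
```

The only per-iteration work beyond the gradient is a single LMO call, which is what makes this variant scalable; no Hessian, line search, or smoothness-constant estimate is used.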
Related papers
- Nonsmooth Nonparametric Regression via Fractional Laplacian Eigenmaps [15.738019181349992]
We develop nonparametric regression methods for the case when the true regression function is not necessarily smooth.
More specifically, our approach uses the fractional Laplacian and is designed to handle the case when the true regression function lies in an $L$-fractional Sobolev space with order $s \in (0,1)$.
arXiv Detail & Related papers (2024-02-22T21:47:29Z) - Extra-Newton: A First Approach to Noise-Adaptive Accelerated
Second-Order Methods [57.050204432302195]
This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions.
Our algorithm achieves $O(\sigma / \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$, and improves its convergence to $O(1/T^3)$ with deterministic oracles.
arXiv Detail & Related papers (2022-11-03T14:12:51Z) - A Multistep Frank-Wolfe Method [2.806911268410107]
We study the zig-zagging phenomenon in the Frank-Wolfe method as an artifact of discretization.
We propose multistep Frank-Wolfe variants where the truncation errors decay as $O(\Delta^p)$, where $p$ is the method's order.
arXiv Detail & Related papers (2022-10-14T21:12:01Z) - Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method
for Empirical Risk Minimization [1.4504054468850665]
In Empirical Risk Minimization, we improve the Frank-Wolfe method by using Taylor-approximated gradients.
We show that our methods yield very significant improvements on real-world binary classification datasets.
We also present a novel adaptive step-size approach for which we have computational guarantees.
arXiv Detail & Related papers (2022-08-30T00:08:37Z) - Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe
Algorithm under Parallelization [9.166498349629157]
We consider the problem of minimizing the sum of two convex functions.
One has Lipschitz-continuous gradients, and can be accessed via stochastic oracles, whereas the other is "simple".
We show that one can achieve an $\epsilon$ primal-dual gap (in expectation) in $\tilde{O}(1/\sqrt{\epsilon})$ iterations.
arXiv Detail & Related papers (2022-05-25T13:01:09Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization [60.90222082871258]
We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set.
We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case.
arXiv Detail & Related papers (2020-02-17T15:28:31Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z) - Global Convergence of Frank Wolfe on One Hidden Layer Networks [121.96696298666014]
We derive global convergence bounds for the Frank Wolfe algorithm when training one hidden layer neural networks.
When using the ReLU activation function, and under tractable preconditioning assumptions on the sample data set, the linear minimization oracle used to incrementally form the solution can be solved explicitly as a second order cone program.
arXiv Detail & Related papers (2020-02-06T11:58:43Z)