Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization
- URL: http://arxiv.org/abs/2002.11860v6
- Date: Thu, 8 Sep 2022 08:48:25 GMT
- Title: Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization
- Authors: Geoffrey Négiar, Gideon Dresdner, Alicia Tsai, Laurent El Ghaoui,
Francesco Locatello, Robert M. Freund, Fabian Pedregosa
- Abstract summary: We propose an algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure.
The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost that is independent of the dataset size.
We provide an implementation of all considered methods in an open-source package.
- Score: 30.980431848476925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient)
algorithm for constrained smooth finite-sum minimization with a generalized
linear prediction/structure. This class of problems includes empirical risk
minimization with sparse, low-rank, or other structured constraints. The
proposed method is simple to implement, does not require step-size tuning, and
has a constant per-iteration cost that is independent of the dataset size.
Furthermore, as a byproduct of the method we obtain a stochastic estimator of
the Frank-Wolfe gap that can be used as a stopping criterion. Depending on the
setting, the proposed method matches or improves on the best computational
guarantees for Stochastic Frank-Wolfe algorithms. Benchmarks on several
datasets highlight different regimes in which the proposed method exhibits a
faster empirical convergence than related methods. Finally, we provide an
implementation of all considered methods in an open-source package.
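The abstract above describes a one-sample-per-iteration Frank-Wolfe method with a stochastic Frank-Wolfe gap usable as a stopping criterion. The following is a minimal illustrative sketch of that flavor of algorithm, not the paper's exact method: the least-squares objective, the $\ell_1$-ball constraint, the averaging weight `rho`, and all names are assumptions made for the example.

```python
# Illustrative stochastic Frank-Wolfe sketch (not the paper's exact
# algorithm): least squares over an l1-ball, one sampled gradient per
# iteration, an averaged gradient estimate, and a stochastic estimate
# of the Frank-Wolfe gap used as a stopping criterion.
import numpy as np

def lmo_l1(grad, radius):
    """Linear minimization oracle over the l1-ball:
    argmin_{||s||_1 <= radius} <grad, s> is a signed vertex."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def stochastic_frank_wolfe(A, b, radius=1.0, n_iters=500, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)                       # feasible start (0 is in the ball)
    grad_est = np.zeros(d)                # averaged gradient estimate
    gap_est = np.inf
    for k in range(n_iters):
        i = rng.integers(n)               # one sample per iteration
        g_i = (A[i] @ x - b[i]) * A[i]    # grad of 0.5*(a_i^T x - b_i)^2
        rho = 1.0 / (k + 1) ** (2 / 3)    # assumed averaging schedule
        grad_est = (1 - rho) * grad_est + rho * g_i
        s = lmo_l1(grad_est, radius)
        gap_est = grad_est @ (x - s)      # stochastic Frank-Wolfe gap
        if gap_est < tol:
            break                         # gap estimate as stopping criterion
        gamma = 2.0 / (k + 2)             # standard open-loop step size
        x = (1 - gamma) * x + gamma * s   # convex combination stays feasible
    return x, gap_est
```

The iterate stays feasible by construction (each update is a convex combination of points in the ball), and the per-iteration cost is independent of the dataset size, matching the property the abstract emphasizes.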
Related papers
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z) - Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the Frank-Wolfe algorithm for constrained finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - A Multistep Frank-Wolfe Method [2.806911268410107]
We study the zig-zagging phenomenon in the Frank-Wolfe method as an artifact of discretization.
We propose multistep Frank-Wolfe variants where the truncation errors decay as $O(\Delta^p)$, where $p$ is the method's order.
arXiv Detail & Related papers (2022-10-14T21:12:01Z) - Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method
for Empirical Risk Minimization [1.4504054468850665]
In the setting of Empirical Risk Minimization, we present a Frank-Wolfe method that uses Taylor-approximated gradients.
We show that our methods yield very significant improvements on real-world binary classification datasets.
We also present a novel adaptive step-size approach for which we have computational guarantees.
arXiv Detail & Related papers (2022-08-30T00:08:37Z) - Fuzzy Clustering by Hyperbolic Smoothing [0.0]
We propose a novel method for building fuzzy clusters of large data sets, using a smoothing numerical approach.
The smoothing allows a conversion from a strongly non-differentiable problem into differentiable subproblems of optimization without constraints of low dimension.
arXiv Detail & Related papers (2022-07-09T12:40:46Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
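The one-sample property above rests on a SAG-style estimator: keep a table of the last gradient seen for each sample, refresh one entry per iteration, and use the table average as the gradient estimate. A minimal sketch (the class name, interface, and use of plain NumPy arrays are assumptions for illustration):

```python
# Sketch of a stochastic average gradient (SAG) estimator: one fresh
# per-sample gradient per iteration, averaged against stale entries.
import numpy as np

class SAGEstimator:
    def __init__(self, n_samples, dim):
        self.table = np.zeros((n_samples, dim))  # last gradient seen per sample
        self.mean = np.zeros(dim)                # running average of the table
        self.n = n_samples

    def update(self, i, g_i):
        """Replace sample i's stored gradient with g_i; return the new average.

        The incremental update costs O(dim) per iteration rather than
        re-averaging the whole table."""
        self.mean += (g_i - self.table[i]) / self.n
        self.table[i] = g_i
        return self.mean
```

The trade-off is `O(n_samples * dim)` memory for the gradient table in exchange for touching only one sample per iteration.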
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Conditional gradient methods for stochastically constrained convex
minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z) - Self-Concordant Analysis of Frank-Wolfe Algorithms [3.3598755777055374]
In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature.
We use the theory of SC functions to provide a new adaptive step size for FW methods and prove a global convergence rate of $O(1/k)$ after $k$ iterations.
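For reference, the standard Frank-Wolfe quantities that recur throughout this list (linear minimization oracle, Frank-Wolfe gap, and update) can be written as follows; this is the generic template over a compact convex set $\mathcal{C}$, not the self-concordant analysis itself:

```latex
\begin{align*}
  s_k &\in \operatorname*{arg\,min}_{s \in \mathcal{C}} \langle \nabla f(x_k), s \rangle
      & \text{(linear minimization oracle)} \\
  g(x_k) &= \langle \nabla f(x_k), x_k - s_k \rangle
      & \text{(FW gap; } g(x_k) \ge f(x_k) - f^\star \text{ for convex } f\text{)} \\
  x_{k+1} &= x_k + \gamma_k (s_k - x_k),
      & \gamma_k \in [0, 1]
\end{align*}
```

Because the gap upper-bounds the suboptimality for convex $f$, it doubles as a certificate of progress, which is why several entries above use it (or a stochastic estimate of it) as a stopping criterion.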
arXiv Detail & Related papers (2020-02-11T11:30:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.