Related papers: Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization


URL: http://arxiv.org/abs/2202.13212v1
Date: Sat, 26 Feb 2022 19:10:48 GMT
Title: Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization
Authors: Gideon Dresdner, Maria-Luiza Vladarean, Gunnar Rätsch, Francesco Locatello, Volkan Cevher, Alp Yurtsever
Abstract summary: We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
Score: 61.26619639722804
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. Existing CGM variants for this template either suffer from slow convergence rates, or require carefully increasing the batch size over the course of the algorithm's execution, which leads to computing full gradients. In contrast, the proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques. In applications we put special emphasis on problems with a large number of separable constraints. Such problems are prevalent among semidefinite programming (SDP) formulations arising in machine learning and theoretical computer science. We provide numerical experiments on matrix completion, unsupervised clustering, and sparsest-cut SDPs.
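The one-sample idea in the abstract can be illustrated on a deliberately simplified problem. The sketch below is not the authors' algorithm: it assumes a purely smooth least-squares finite sum constrained to an ℓ1-ball, so the non-smooth term and the separable constraints the paper targets are omitted. It keeps a table holding the last gradient seen for each component, refreshes one entry per iteration (the SAG estimator), and passes the running average to a linear minimization oracle. The names `lmo_l1_ball` and `sag_cgm_least_squares`, the zero-initialized table, and the classical 2/(k+2) step size are illustrative assumptions.

```python
import numpy as np


def lmo_l1_ball(grad, radius):
    """Linear minimization oracle over {s : ||s||_1 <= radius}.

    Returns argmin_s <grad, s>, which is a signed vertex of the l1-ball.
    """
    i = int(np.argmax(np.abs(grad)))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s


def sag_cgm_least_squares(A, b, radius, iters, seed=0):
    """Illustrative one-sample CGM with a SAG estimator (hypothetical helper).

    Assumed problem: min (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2  s.t. ||x||_1 <= radius.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    table = np.zeros((n, d))  # last stored gradient of each component f_i (assumed zero init)
    g = np.zeros(d)           # running average (1/n) * sum_i table[i]
    for k in range(iters):
        i = rng.integers(n)                 # the single sample drawn this iteration
        grad_i = (A[i] @ x - b[i]) * A[i]   # gradient of f_i at the current iterate
        g += (grad_i - table[i]) / n        # O(d) update of the SAG average
        table[i] = grad_i
        s = lmo_l1_ball(g, radius)          # conditional gradient (Frank-Wolfe) direction
        eta = 2.0 / (k + 2)                 # classical open-loop step size (assumption)
        x = (1.0 - eta) * x + eta * s       # convex combination keeps x feasible
    return x
```

Each iteration touches a single row of `A`, yet the direction passed to the oracle aggregates information from all n components; the price is O(nd) memory for the gradient table.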

Related papers

Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework. Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have some very appealing theoretical properties. We show that TR and ARC methods can simultaneously allow for inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
Smoothing ADMM for Sparse-Penalized Quantile Regression with Non-Convex Penalties [8.294148737585543]
This paper investigates quantile regression in the presence of non-convex and non-smooth sparse penalties, such as the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD). We introduce a novel single-loop smoothing ADMM algorithm with an increasing penalty parameter, named SIAD, specifically tailored for sparse-penalized quantile regression.
arXiv Detail & Related papers (2023-09-04T21:48:51Z)
Aiming towards the minimizers: fast convergence of SGD for overparametrized problems [25.077446336619378]
We propose a regularity regime which endows the stochastic gradient method with the same worst-case complexity as the deterministic gradient method. In contrast, all existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence. We demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
arXiv Detail & Related papers (2023-06-05T05:21:01Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order information. We show that with a careful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
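To make "coordinate importance sampling" concrete, the sketch below estimates a gradient from function values only: it draws one coordinate i with probability p_i, takes a central finite difference along it (two function queries), and rescales by 1/p_i so the estimate is unbiased for the finite-difference gradient. This is a generic construction assumed for illustration, not the hybrid method proposed in the paper; the sampling probabilities, smoothing radius `mu`, and the function names are hypothetical.

```python
import numpy as np


def zo_coordinate_estimate(f, x, probs, mu, rng):
    """One-coordinate zeroth-order gradient estimate (hypothetical helper).

    Coordinate i is drawn with probability probs[i]; dividing by probs[i]
    makes the estimate unbiased for the central-difference gradient.
    """
    d = x.size
    i = rng.choice(d, p=probs)
    e = np.zeros(d)
    e[i] = 1.0
    partial = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)  # two function queries
    g = np.zeros(d)
    g[i] = partial / probs[i]  # importance weight
    return g


def zo_sgd(f, x0, probs, step, mu, iters, seed=0):
    """Plain descent using only the zeroth-order estimates above."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x -= step * zo_coordinate_estimate(f, x, probs, mu, rng)
    return x
```

Sampling high-curvature coordinates more often (larger p_i) spends more of the query budget where the objective changes fastest, which is the intuition behind importance sampling here.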
Conditional gradient methods for stochastically constrained convex minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems. The most important feature of our framework is that only a subset of the constraints is processed at each iteration. Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in time, up to an error. Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization [15.82816385434718]
We present a unified theorem for the convergence analysis of stochastic gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer. We do this by extending the unified analysis of Gorbunov, Hanzely & Richtárik (2020) and dropping the requirement that the loss function be strongly convex. Our unified analysis applies to a host of existing algorithms such as proximal SGD, variance reduced methods, quantization and some coordinate descent type methods.
arXiv Detail & Related papers (2020-06-20T13:40:27Z)
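As a concrete reference point for one of the algorithms this unified analysis covers, here is a minimal proximal SGD sketch for an ℓ1-regularized least-squares problem, i.e. a smooth convex loss plus a convex regularizer. The problem instance, the constant step size, and the function names are assumptions for illustration; the paper's analysis itself is not reproduced here.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def prox_sgd_lasso(A, b, lam, step, iters, seed=0):
    """Proximal SGD (hypothetical helper) for the assumed objective
    (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2 + lam * ||x||_1."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        grad_i = (A[i] @ x - b[i]) * A[i]                  # unbiased estimate of the smooth gradient
        x = soft_threshold(x - step * grad_i, step * lam)  # prox step on the regularizer
    return x
```

With a constant step, proximal SGD converges only to a neighborhood of the solution whose size depends on the gradient noise; variance-reduced methods, also covered by the analysis, remove that floor.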
Variance reduction for Random Coordinate Descent-Langevin Monte Carlo [7.464874233755718]
Langevin Monte Carlo (LMC) provides fast convergence but requires computing gradients. In practice one uses finite-differencing approximations as surrogates, and the method is expensive in high dimensions. We introduce a new variance reduction approach, termed Randomized Coordinates Averaging Descent (RCAD), and incorporate it with both overdamped and underdamped LMC.
arXiv Detail & Related papers (2020-06-10T21:08:38Z)
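The abstract does not spell out the RCAD update, so the following is only a sketch of the general idea under stated assumptions: an overdamped Langevin sampler that stores one finite-difference partial derivative per coordinate, refreshes a single randomly chosen entry per iteration, and reuses the stale entries for the rest (a SAG-style average over coordinates). The function names, the step size, and the initialization with a full finite-difference pass are hypothetical choices, not the paper's exact scheme.

```python
import numpy as np


def fd_partial(U, x, r, h):
    """Central finite-difference approximation of dU/dx_r (two evaluations of U)."""
    e = np.zeros_like(x)
    e[r] = h
    return (U(x + e) - U(x - e)) / (2.0 * h)


def coordinate_averaged_langevin(U, x0, step, iters, h=1e-4, seed=0):
    """Overdamped Langevin sampler for exp(-U) (hypothetical helper).

    The drift uses a table of per-coordinate finite differences in which only
    ONE randomly chosen coordinate is refreshed per iteration (assumption).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    g = np.array([fd_partial(U, x, r, h) for r in range(d)])  # assumed full initial pass
    samples = np.empty((iters, d))
    for m in range(iters):
        r = rng.integers(d)
        g[r] = fd_partial(U, x, r, h)  # refresh one entry; the others stay stale
        x = x - step * g + np.sqrt(2.0 * step) * rng.standard_normal(d)
        samples[m] = x
    return samples
```

Each iteration costs two evaluations of U instead of 2d, which is the saving that random-coordinate schemes aim for; the stored entries keep the drift from being as noisy as a single rescaled coordinate would be.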

This list is automatically generated from the titles and abstracts of the papers on this site.