Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization
- URL: http://arxiv.org/abs/2202.13212v1
- Date: Sat, 26 Feb 2022 19:10:48 GMT
- Title: Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization
- Authors: Gideon Dresdner, Maria-Luiza Vladarean, Gunnar R\"atsch, Francesco
Locatello, Volkan Cevher, Alp Yurtsever
- Abstract summary: We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with an average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
- Score: 61.26619639722804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a stochastic conditional gradient method (CGM) for minimizing
convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
Existing CGM variants for this template either suffer from slow convergence
rates, or require carefully increasing the batch size over the course of the
algorithm's execution, which leads to computing full gradients. In contrast,
the proposed method, equipped with a stochastic average gradient (SAG)
estimator, requires only one sample per iteration. Nevertheless, it guarantees
fast convergence rates on par with more sophisticated variance reduction
techniques. In applications we put special emphasis on problems with a large
number of separable constraints. Such problems are prevalent among semidefinite
programming (SDP) formulations arising in machine learning and theoretical
computer science. We provide numerical experiments on matrix completion,
unsupervised clustering, and sparsest-cut SDPs.
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Smoothing ADMM for Sparse-Penalized Quantile Regression with Non-Convex
Penalties [8.294148737585543]
This paper investigates concave and clipped quantile regression in the presence of nonsecondary absolute and non-smooth convergence penalties.
We introduce a novel-loop ADM algorithm with an increasing penalty multiplier, named SIAD, specifically for sparse regression.
arXiv Detail & Related papers (2023-09-04T21:48:51Z) - Aiming towards the minimizers: fast convergence of SGD for
overparametrized problems [25.077446336619378]
We propose a regularity regime which endows the gradient method with the same worst-case complexity as the gradient method.
All existing guarantees require the gradient method to take small steps, thereby resulting in a much slower linear rate of convergence.
We demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
arXiv Detail & Related papers (2023-06-05T05:21:01Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Conditional gradient methods for stochastically constrained convex
minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Unified Analysis of Stochastic Gradient Methods for Composite Convex and
Smooth Optimization [15.82816385434718]
We present a unified theorem for the convergence analysis of gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer.
We do this by extending the unified analysis of Gorbunov, Hanzely & Richt'arik ( 2020) and dropping the requirement that the loss function be strongly convex.
Our unified analysis applies to a host of existing algorithms such as proximal SGD, variance reduced methods, quantization and some coordinate descent type methods.
arXiv Detail & Related papers (2020-06-20T13:40:27Z) - Variance reduction for Random Coordinate Descent-Langevin Monte Carlo [7.464874233755718]
Langevin Monte Carlo (LMC) that provides fast convergence requires computation of gradient approximations.
In practice one uses finite-differencing approximations as surrogates, and the method is expensive in high-dimensions.
We introduce a new variance reduction approach, termed Coordinates Averaging Descent (RCAD), and incorporate it with both overdamped and underdamped LMC.
arXiv Detail & Related papers (2020-06-10T21:08:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.