Related papers: Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees

URL: http://arxiv.org/abs/2409.09906v3
Date: Thu, 10 Oct 2024 12:30:14 GMT
Title: Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees
Authors: Zhaosong Lu, Sanyou Mei, Yifeng Xiao
Abstract summary: Existing methods typically aim to find an $\epsilon$-stochastic stationary point, where the expected violations of both constraints and first-order stationarity are within a prescribed accuracy. In many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an $\epsilon$-stochastic stationary point potentially undesirable.
Score: 1.2562458634975162
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study a class of deterministically constrained stochastic optimization problems. Existing methods typically aim to find an $\epsilon$-stochastic stationary point, where the expected violations of both constraints and first-order stationarity are within a prescribed accuracy $\epsilon$. However, in many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an $\epsilon$-stochastic stationary point potentially undesirable due to the risk of significant constraint violations. To address this issue, we propose single-loop variance-reduced stochastic first-order methods, where the stochastic gradient of the stochastic component is computed using either a truncated recursive momentum scheme or a truncated Polyak momentum scheme for variance reduction, while the gradient of the deterministic component is computed exactly. Under the error bound condition with a parameter $\theta \geq 1$ and other suitable assumptions, we establish that these methods respectively achieve a sample and first-order operation complexity of $\widetilde O(\epsilon^{-\max\{\theta+2, 2\theta\}})$ and $\widetilde O(\epsilon^{-\max\{4, 2\theta\}})$ for finding a stronger $\epsilon$-stochastic stationary point, where the constraint violation is within $\epsilon$ with certainty, and the expected violation of first-order stationarity is within $\epsilon$. For $\theta=1$, these complexities reduce to $\widetilde O(\epsilon^{-3})$ and $\widetilde O(\epsilon^{-4})$ respectively, which match, up to a logarithmic factor, the best-known complexities achieved by existing methods for finding an $\epsilon$-stochastic stationary point of unconstrained smooth stochastic optimization problems.
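The two variance-reduction schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. We assume that "truncation" means projecting the estimator onto a Euclidean ball of radius `radius`, and the function names, `gamma`, and `radius` are hypothetical. The recursive scheme assumes the same sample $\xi_k$ is used to evaluate the stochastic gradient at both $x_k$ and $x_{k-1}$. The exactly computed gradient of the deterministic (constraint) component is not shown. The helper `complexity_exponents` only evaluates the exponents in $\widetilde O(\epsilon^{-\max\{\theta+2, 2\theta\}})$ and $\widetilde O(\epsilon^{-\max\{4, 2\theta\}})$.

```python
import math
from typing import List, Tuple

Vector = List[float]


def truncate(v: Vector, radius: float) -> Vector:
    """Project v onto the ball {u : ||u|| <= radius}.

    Assumption: this is the 'truncation'; the paper's exact rule may differ.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= radius:
        return list(v)
    return [radius / norm * x for x in v]


def truncated_recursive_momentum(d_prev: Vector, grad_curr: Vector, grad_prev: Vector,
                                 gamma: float, radius: float) -> Vector:
    """STORM-style update d_k = g(x_k; xi_k) + (1 - gamma) * (d_{k-1} - g(x_{k-1}; xi_k)).

    grad_curr and grad_prev must be evaluated with the SAME sample xi_k.
    """
    d = [gc + (1.0 - gamma) * (dp - gp)
         for dp, gc, gp in zip(d_prev, grad_curr, grad_prev)]
    return truncate(d, radius)


def truncated_polyak_momentum(m_prev: Vector, grad_curr: Vector,
                              gamma: float, radius: float) -> Vector:
    """Moving-average (Polyak) momentum m_k = (1 - gamma) * m_{k-1} + gamma * g(x_k; xi_k)."""
    m = [(1.0 - gamma) * mp + gamma * gc for mp, gc in zip(m_prev, grad_curr)]
    return truncate(m, radius)


def complexity_exponents(theta: float) -> Tuple[float, float]:
    """Exponents p in O~(eps^{-p}) from the abstract: (recursive, Polyak) momentum."""
    if theta < 1:
        raise ValueError("the error bound parameter must satisfy theta >= 1")
    return max(theta + 2, 2 * theta), max(4, 2 * theta)


print(complexity_exponents(1))  # (3, 4)
print(complexity_exponents(3))  # (6, 6)
```

For $\theta = 1$ the exponents are 3 and 4, matching the $\widetilde O(\epsilon^{-3})$ and $\widetilde O(\epsilon^{-4})$ rates stated above; once $\theta \geq 2$ both bounds are governed by $2\theta$.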

Related papers

A single-loop SPIDER-type stochastic subgradient method for expectation-constrained nonconvex nonsmooth optimization [17.25924791071807]
We present a novel type of subgradient algorithm for complex constraints. We show that our method is significantly faster than two state-of-the-art algorithms.
arXiv Detail & Related papers (2025-01-31T15:18:52Z)
Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization [2.0657831823662574]
When the nonconvex problem is complicated by stochasticity, the sample complexity of first-order methods may depend linearly on the problem dimension, which is undesirable for large-scale problems. Our algorithms allow for non-Euclidean and non-smooth distance functions as the proximal terms and achieve a sample complexity of $\mathcal{O}((\log d)/\epsilon^4)$. We prove that DISFOM employing variance reduction can sharpen this bound.
arXiv Detail & Related papers (2024-06-27T18:38:42Z)
On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation [13.813242559935732]
We study first-order algorithms for solving bilevel optimization problems. In particular, we show a strong connection between the penalty function and the hyper-objective. We show improved oracle complexities of $O(\epsilon^{-3})$ and $O(\epsilon^{-5})$, respectively.
arXiv Detail & Related papers (2023-09-04T18:25:43Z)
A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization, which minimizes a twice differentiable function subject to general conic constraints. In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z)
Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points. Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z)
Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization [52.25843977506935]
We propose an adaptive variance-reduction method, called AdaSpider, for $L$-smooth, nonconvex functions with a finite-sum structure. In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}(n + \sqrt{n}/\epsilon^2)$ oracle calls, where $n$ is the number of component functions.
arXiv Detail & Related papers (2022-11-03T14:41:46Z)
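AdaSpider builds on the SPIDER gradient estimator for a finite sum $f = \frac{1}{n}\sum_{i=1}^n f_i$. The sketch below shows only a plain, non-adaptive SPIDER-style estimator; AdaSpider's adaptive step-size rule is not reproduced, and the names `spider_estimate`, `q` (refresh period), and `batch` are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def full_gradient(grads: Sequence[GradFn], x: Vector) -> Vector:
    """Average of all n component gradients at x (one full pass over the data)."""
    total = [0.0] * len(x)
    for grad in grads:
        total = [t + gi for t, gi in zip(total, grad(x))]
    return [t / len(grads) for t in total]


def spider_estimate(grads: Sequence[GradFn], x: Vector, x_prev: Vector, v_prev: Vector,
                    k: int, q: int, batch: int, rng: random.Random) -> Vector:
    """SPIDER-style estimate v_k of the full gradient at x.

    Every q iterations the estimate is refreshed with a full gradient; otherwise
    v_{k-1} is corrected by minibatch gradient differences that use the SAME
    sampled indices at x and x_prev (sampling with replacement).
    """
    if k % q == 0:
        return full_gradient(grads, x)
    diff = [0.0] * len(x)
    for _ in range(batch):
        i = rng.randrange(len(grads))
        g_new, g_old = grads[i](x), grads[i](x_prev)
        diff = [d + a - b for d, a, b in zip(diff, g_new, g_old)]
    return [v + d / batch for v, d in zip(v_prev, diff)]
```

Between refreshes the estimator only pays for `batch` gradient differences per step, which is where the savings over recomputing the full gradient come from.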
Stochastic Zeroth order Descent with Structured Directions [10.604744518360464]
We introduce and analyze Structured Stochastic Zeroth order Descent (SSZD), a finite-difference approach that approximates a stochastic gradient on a set of $l \leq d$ directions, where $d$ is the dimension of the ambient space. For convex functions we prove almost sure convergence and a convergence rate on the function values of $O((d/l)\, k^{-c})$ for every $c < 1/2$, which is arbitrarily close to that of Stochastic Gradient Descent (SGD) in terms of the number of iterations.
arXiv Detail & Related papers (2022-06-10T14:00:06Z)
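The idea of approximating a gradient from finite differences along $l \leq d$ directions can be sketched as follows. This is a simplified illustration, not the SSZD method itself: we use a deterministic `f`, plain forward differences with step `h`, and random orthonormal directions from Gram-Schmidt; the paper's scaling of the directions, step sizes, and stochastic function values are not reproduced. All names, including `n_dirs` (playing the role of $l$), are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def orthonormal_directions(d: int, n_dirs: int, rng: random.Random) -> List[Vector]:
    """Draw n_dirs random orthonormal directions in R^d (Gram-Schmidt on Gaussian draws)."""
    dirs: List[Vector] = []
    while len(dirs) < n_dirs:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in dirs:
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip numerically dependent draws
            dirs.append([a / norm for a in v])
    return dirs


def structured_fd_gradient(f: Callable[[Vector], float], x: Vector, n_dirs: int,
                           h: float, rng: random.Random) -> Vector:
    """Forward-difference gradient estimate restricted to n_dirs <= d directions."""
    d = len(x)
    if not 1 <= n_dirs <= d:
        raise ValueError("need 1 <= n_dirs <= d")
    fx = f(x)
    g = [0.0] * d
    for p in orthonormal_directions(d, n_dirs, rng):
        # (f(x + h p) - f(x)) / h approximates the directional derivative along p
        slope = (f([xi + h * pi for xi, pi in zip(x, p)]) - fx) / h
        g = [gi + slope * pi for gi, pi in zip(g, p)]
    return g
```

With `n_dirs = d` the estimate approximates the full gradient; with fewer directions it approximates the projection of the gradient onto the span of the sampled directions, using only `n_dirs + 1` function evaluations.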
A Projection-free Algorithm for Constrained Stochastic Multi-level Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for constrained stochastic multi-level composition optimization. We show that the numbers of calls to the stochastic oracles and to the linear-minimization oracle required by the proposed algorithm are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$, respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z)
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with best-known rates for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
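One step that combines gradient clipping, momentum, and normalized gradient descent can be sketched as below. This is a minimal illustration under our own assumptions: clipping rescales the stochastic gradient to norm at most `tau`, momentum is an exponential moving average with weight `beta`, and the iterate moves a fixed distance `lr` along the normalized momentum. The paper's exact update order and parameter schedules are not given in this summary, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def clip(g: Vector, tau: float) -> Vector:
    """Rescale g so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(x * x for x in g))
    if norm <= tau:
        return list(g)
    return [tau / norm * x for x in g]


def clipped_normalized_momentum_step(x: Vector, m: Vector, g: Vector, beta: float,
                                     tau: float, lr: float) -> Tuple[Vector, Vector]:
    """Clip the stochastic gradient g, average it into momentum m,
    then step a distance lr along the normalized momentum direction."""
    g_clipped = clip(g, tau)
    m_new = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g_clipped)]
    norm = math.sqrt(sum(v * v for v in m_new))
    if norm == 0.0:
        return list(x), m_new  # no direction information yet; stay put
    x_new = [xi - lr * mi / norm for xi, mi in zip(x, m_new)]
    return x_new, m_new
```

Clipping bounds the influence of any single heavy-tailed sample, and normalization makes the step length independent of the momentum's magnitude.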
Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations [54.42518331209581]
We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\| \le \epsilon$), and we also study the complexity of finding $(\epsilon,\gamma)$-approximate second-order stationary points. Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z)
Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.

This site does not guarantee the quality of the information it provides and is not responsible for any consequences.