Inexact subgradient methods for semialgebraic functions
- URL: http://arxiv.org/abs/2404.19517v2
- Date: Tue, 13 May 2025 13:35:00 GMT
- Title: Inexact subgradient methods for semialgebraic functions
- Authors: Jérôme Bolte, Tam Le, Éric Moulines, Edouard Pauwels
- Abstract summary: Motivated by the extensive application of approximate gradients in machine learning, we investigate inexact subgradient methods subject to persistent additive errors. Our analysis addresses both vanishing and constant step-size regimes.
- Score: 18.293072574300798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the extensive application of approximate gradients in machine learning and optimization, we investigate inexact subgradient methods subject to persistent additive errors. Within a nonconvex semialgebraic framework, assuming boundedness or coercivity, we establish that the method yields iterates that eventually fluctuate near the critical set at a proximity characterized by an $O(\epsilon^\rho)$ distance, where $\epsilon$ denotes the magnitude of subgradient evaluation errors, and $\rho$ encapsulates geometric characteristics of the underlying problem. Our analysis comprehensively addresses both vanishing and constant step-size regimes. Notably, the latter regime inherently enlarges the fluctuation region, yet this enlargement remains on the order of $\epsilon^\rho$. In the convex scenario, employing a universal error bound applicable to coercive semialgebraic functions, we derive novel complexity results concerning averaged iterates. Additionally, our study produces auxiliary results of independent interest, including descent-type lemmas for nonsmooth nonconvex functions and an invariance principle governing the behavior of algorithmic sequences under small-step limits.
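To make the setting concrete, the snippet below is a minimal numerical sketch of a subgradient method with a persistent additive oracle error, run with either constant or vanishing step sizes; the toy objective, the error model, and all constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def subgradient(x):
    # A subgradient of the toy nonsmooth coercive objective f(x) = |x_0| + x_1^2.
    return np.array([np.sign(x[0]) if x[0] != 0.0 else 0.0, 2.0 * x[1]])

def inexact_subgradient_method(x0, eps, n_iter, step="constant", gamma=1e-2, seed=0):
    """Subgradient iterations with an additive oracle error of norm at most eps.

    step="constant" uses gamma_k = gamma; step="vanishing" uses gamma_k = gamma / sqrt(k+1).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for k in range(n_iter):
        # Inexact oracle: true subgradient plus a bounded perturbation with ||error|| <= eps.
        error = rng.uniform(-1.0, 1.0, size=x.shape)
        error *= eps / max(np.linalg.norm(error), 1e-12)
        g = subgradient(x) + error
        gamma_k = gamma if step == "constant" else gamma / np.sqrt(k + 1)
        x = x - gamma_k * g
        iterates.append(x.copy())
    return np.array(iterates)

# With constant steps the tail iterates fluctuate near the critical point x* = 0;
# the paper quantifies the radius of such fluctuation regions as O(eps^rho).
traj = inexact_subgradient_method(x0=[2.0, -1.5], eps=0.05, n_iter=5000)
print("distance of last iterate to 0:", np.linalg.norm(traj[-1]))
```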
Related papers
- Quantitative Error Bounds for Scaling Limits of Stochastic Iterative Algorithms [10.022615790746466]
We derive non-asymptotic functional approximation error bounds between the algorithm sample paths and the Ornstein-Uhlenbeck approximation.
We use our main result to construct error bounds in terms of two common metrics: the Lévy-Prokhorov and bounded Wasserstein distances.
arXiv Detail & Related papers (2025-01-21T15:29:11Z) - Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization [2.0657831823662574]
When the nonconvex problem is complicated by nonsmoothness, the sample complexity of stochastic first-order methods may depend linearly on the problem dimension, which is undesirable for high-dimensional problems.
Our algorithms allow for dimension-insensitive complexity estimates of order $\mathcal{O}(\log d)/\epsilon^4$.
We prove that DISFOM can sharpen the variance bound, again employing $\mathcal{O}(\log d)/\epsilon^4$ samples.
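As a rough illustration of a stochastic first-order step with a nonsmooth proximal term, the sketch below applies an $\ell_1$ prox (soft-thresholding) after a stochastic gradient move; it does not reproduce DISFOM or its non-Euclidean proximal terms, and all names and constants are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1, a standard nonsmooth proximal term.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_sgd_step(x, stochastic_grad, gamma, lam):
    # One stochastic proximal-gradient step: gradient move, then nonsmooth prox.
    return soft_threshold(x - gamma * stochastic_grad, gamma * lam)

# Toy usage with a random stand-in for a minibatch gradient estimate.
rng = np.random.default_rng(0)
x = rng.normal(size=10)
g = rng.normal(size=10)
print(prox_sgd_step(x, g, gamma=0.1, lam=0.5))
```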
arXiv Detail & Related papers (2024-06-27T18:38:42Z) - Convergence and Complexity Guarantee for Inexact First-order Riemannian Optimization Algorithms [18.425648833592312]
We show that tBMM converges to an $\epsilon$-stationary point within $O(\epsilon^{-2})$ iterations.
Under mild assumptions, the results still hold when the subproblem is solved inexactly in each iteration, provided the total optimality gap is bounded.
arXiv Detail & Related papers (2024-05-05T22:53:14Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z) - A Unified Analysis on the Subgradient Upper Bounds for the Subgradient Methods Minimizing Composite Nonconvex, Nonsmooth and Non-Lipschitz Functions [7.972544890243396]
This paper presents a unified analysis for the proximal subgradient method (Prox-SubGrad) type approach.
We are able to relate error-bound conditions, the growth conditions of the subgradients of the objective, and the behavior of the proximal subgradient iterates on some remarkably broad classes of objective functions.
arXiv Detail & Related papers (2023-08-30T23:34:11Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Decentralized Weakly Convex Optimization Over the Stiefel Manifold [28.427697270742947]
We focus on the Stiefel manifold in the decentralized setting, where a connected network of agents cooperatively minimizes a weakly convex objective.
We propose a method called the decentralized Riemannian subgradient method (DRSM) and establish an iteration complexity of $\mathcal{O}(\varepsilon^{-2}\log^{2}(\varepsilon^{-1}))$ for forcing a natural stationarity measure below $\varepsilon$.
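For intuition only, here is a schematic single step of a decentralized Riemannian subgradient scheme on the Stiefel manifold, combining a gossip average over neighbours, a subgradient move, and a polar retraction; the mixing matrix, the ordering of the steps, and the retraction are assumptions and do not reproduce the paper's DRSM.

```python
import numpy as np

def polar_retraction(Y):
    # Map an arbitrary full-rank matrix to the Stiefel manifold via its polar factor.
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ Vt

def decentralized_riemannian_subgradient_step(X_agents, W, subgrads, gamma):
    """One schematic step: each agent averages with its neighbours using the
    mixing matrix W, moves along its local subgradient, and retracts."""
    n = len(X_agents)
    X_new = []
    for i in range(n):
        mixed = sum(W[i, j] * X_agents[j] for j in range(n))  # consensus / gossip step
        X_new.append(polar_retraction(mixed - gamma * subgrads[i]))
    return X_new

# Tiny usage: 3 agents, 4x2 Stiefel variables, uniform mixing weights.
rng = np.random.default_rng(0)
X = [polar_retraction(rng.normal(size=(4, 2))) for _ in range(3)]
W = np.full((3, 3), 1.0 / 3.0)
G = [rng.normal(size=(4, 2)) for _ in range(3)]
X_next = decentralized_riemannian_subgradient_step(X, W, G, gamma=0.1)
print(np.round(X_next[0].T @ X_next[0], 6))  # close to the identity: back on the manifold
```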
arXiv Detail & Related papers (2023-03-31T02:56:23Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random
Perturbations for Stochastic Optimization [10.820943271350442]
We present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples.
We also show that our algorithm avoids unstable equilibria, implying convergence to local minima.
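To illustrate the flavour of such a scheme, the sketch below forms a two-sided smoothed-functional gradient estimate from noisy function values using truncated Cauchy perturbations; the scaling, the truncation level, and the sample sizes are assumptions, not the paper's exact estimator.

```python
import numpy as np

def truncated_cauchy(dim, trunc, rng):
    # Draw a Cauchy perturbation and truncate each component to [-trunc, trunc].
    return np.clip(rng.standard_cauchy(size=dim), -trunc, trunc)

def smoothed_functional_gradient(f, x, delta=0.1, trunc=5.0, n_samples=200, rng=None):
    """Two-sided smoothed-functional estimate from noisy evaluations of f.

    Returns a direction roughly proportional to the smoothed gradient; the
    normalising constant depends on the perturbation law (illustrative only).
    """
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = truncated_cauchy(x.size, trunc, rng)
        g += (f(x + delta * u) - f(x - delta * u)) * u / (2.0 * delta)
    return g / n_samples

# Usage on a noisy quadratic; the output is roughly a positive multiple of the true gradient 2x.
rng = np.random.default_rng(1)
f = lambda z: np.sum(z ** 2) + 0.01 * rng.normal()
print(smoothed_functional_gradient(f, np.array([1.0, -2.0]), rng=rng))
```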
arXiv Detail & Related papers (2022-07-30T18:50:36Z) - Randomized Coordinate Subgradient Method for Nonsmooth Composite
Optimization [11.017632675093628]
Coordinate-type subgradient methods for addressing nonsmooth problems are relatively underexplored, partly because of the restrictive properties required by the usual Lipschitz-type assumptions.
arXiv Detail & Related papers (2022-06-30T02:17:11Z) - Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
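As a concrete instance of the projected fixed-point setting mentioned above, the sketch below runs TD(0) for policy evaluation with linear function approximation on a toy two-state chain; the chain, the features, and the step size are assumptions used only for illustration.

```python
import numpy as np

def td0_linear(transitions, phi, gamma, alpha, theta0):
    """TD(0) with a linear value approximation V(s) ~= phi(s) @ theta.

    `transitions` is an iterable of (s, r, s_next) samples from a fixed policy.
    """
    theta = np.array(theta0, dtype=float)
    for s, r, s_next in transitions:
        td_error = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        theta += alpha * td_error * phi(s)
    return theta

# Toy two-state chain 0 -> 1 -> 0 -> ... with reward 1 when leaving state 0.
phi = lambda s: np.eye(2)[s]  # one-hot features
samples = [(k % 2, 1.0 if k % 2 == 0 else 0.0, (k + 1) % 2) for k in range(5000)]
print(td0_linear(samples, phi, gamma=0.9, alpha=0.05, theta0=[0.0, 0.0]))
# Approaches the exact values V(0) = 1 / (1 - 0.81) ~ 5.26 and V(1) = 0.9 * V(0) ~ 4.74.
```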
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the non-asymptotic concentration properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
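A minimal sketch of the object being analysed: linear stochastic approximation with a fixed step size and a running Polyak-Ruppert average; the matrix, the noise model, and the step size are illustrative assumptions.

```python
import numpy as np

def lsa_polyak_ruppert(A, b, n_iter, step, noise_std=0.1, seed=0):
    """Iterate theta_{k+1} = theta_k + step * (b - A @ theta_k + noise) with a fixed
    step size, and maintain the Polyak-Ruppert average of the iterates."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(b)
    avg = np.zeros_like(b)
    for k in range(n_iter):
        noise = noise_std * rng.normal(size=b.shape)
        theta = theta + step * (b - A @ theta + noise)
        avg += (theta - avg) / (k + 1)  # running average of the iterates
    return theta, avg

A = np.array([[2.0, 0.3], [0.3, 1.0]])
b = np.array([1.0, -1.0])
last, averaged = lsa_polyak_ruppert(A, b, n_iter=20000, step=0.05)
print("last:", last, "averaged:", averaged, "exact:", np.linalg.solve(A, b))
```

The averaged iterate concentrates around the solution of $A\theta = b$ much more tightly than the last iterate, which is the fixed-step regime the CLT above describes.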
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - Stochastic Zeroth-order Riemannian Derivative Estimation and
Optimization [15.78743548731191]
We propose an oracle version of the Gaussian smoothing function to overcome the difficulty posed by the non-linearity of the manifold.
We demonstrate the applicability of our algorithms through numerical results and real-world applications on black-box stiffness control for robotics and black-box attacks on neural networks.
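For a flavour of zeroth-order estimation on a manifold, the sketch below builds a Gaussian-smoothing finite-difference gradient estimate on the unit sphere, projecting perturbations to the tangent space and renormalising as a simple retraction; this is a generic illustration, not the paper's oracle, and all constants are assumptions.

```python
import numpy as np

def zo_gradient_sphere(f, x, mu=1e-2, n_samples=200, seed=0):
    """Zeroth-order gradient estimate of f on the unit sphere at a unit vector x."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)
        u -= (u @ x) * x                                     # project onto the tangent space at x
        x_pert = (x + mu * u) / np.linalg.norm(x + mu * u)   # simple retraction back to the sphere
        g += (f(x_pert) - f(x)) / mu * u
    return g / n_samples

# Usage: estimate the Riemannian gradient of f(z) = z @ C @ z on the sphere.
C = np.diag([3.0, 1.0, 0.5])
x = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
print(zo_gradient_sphere(lambda z: z @ C @ z, x))
```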
arXiv Detail & Related papers (2020-03-25T06:58:19Z) - Non-asymptotic bounds for stochastic optimization with biased noisy
gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles arise in practical contexts, for instance, in risk measure estimation from a batch of independent and identically distributed simulations.
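As an illustration of such an oracle, the sketch below differentiates an empirical CVaR risk measure computed from a batch of i.i.d. simulations using central differences; the finite batch makes the resulting gradient oracle biased, with the bias shrinking as the batch grows. The simulator, the risk level, and the constants are hypothetical.

```python
import numpy as np

def cvar_estimate(samples, alpha=0.95):
    # Empirical Conditional Value-at-Risk of a batch (biased for finite batch sizes).
    var = np.quantile(samples, alpha)
    return samples[samples >= var].mean()

def biased_gradient_oracle(simulate, x, batch_size, h=0.05, alpha=0.95, seed=0):
    """Central-difference gradient of the batch CVaR of simulate(x, n, rng)."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        plus = cvar_estimate(simulate(x + e, batch_size, rng), alpha)
        minus = cvar_estimate(simulate(x - e, batch_size, rng), alpha)
        g[i] = (plus - minus) / (2.0 * h)
    return g

# Toy simulator: cost samples ~ Normal(mean = ||x||^2, sd = 1), so the CVaR gradient is about 2x.
simulate = lambda x, n, rng: x @ x + rng.normal(size=n)
print(biased_gradient_oracle(simulate, np.array([1.0, -0.5]), batch_size=50000))
```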
arXiv Detail & Related papers (2020-02-26T12:53:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.