Related papers: Subgradient Convergence Implies Subdifferential Convergence on Weakly Convex Functions: With Uniform Rates Guarantees

URL: http://arxiv.org/abs/2405.10289v3
Date: Wed, 29 May 2024 17:20:41 GMT
Title: Subgradient Convergence Implies Subdifferential Convergence on Weakly Convex Functions: With Uniform Rates Guarantees
Authors: Feng Ruan
Abstract summary: In nonsmooth, nonconvex stochastic optimization, understanding the uniform convergence of subdifferential mappings is crucial for analyzing stationary points of sample average approximations of risk. This work connects the uniform convergence of subdifferential mappings to that of subgradient mappings as the empirical risk converges to the population risk. We derive uniform convergence rates, measured by the Hausdorff metric, for the subdifferential sets of stochastic convex-composite objectives.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In nonsmooth, nonconvex stochastic optimization, understanding the uniform convergence of subdifferential mappings is crucial for analyzing stationary points of sample average approximations of risk as they approach the population risk. Yet, characterizing this convergence remains a fundamental challenge. This work introduces a novel perspective by connecting the uniform convergence of subdifferential mappings to that of subgradient mappings as the empirical risk converges to the population risk. We prove that, for stochastic weakly-convex objectives, and within any open set, a uniform bound on the convergence of subgradients -- chosen arbitrarily from the corresponding subdifferential sets -- translates to a uniform bound on the convergence of the subdifferential sets themselves, measured by the Hausdorff metric. Using this technique, we derive uniform convergence rates for subdifferential sets of stochastic convex-composite objectives. Our results do not rely on key distributional assumptions in the literature, which require the population and finite sample subdifferentials to be continuous in the Hausdorff metric, yet still provide tight convergence rates. These guarantees lead to new insights into the nonsmooth landscapes of such objectives within finite samples.
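The abstract's central quantity is a uniform Hausdorff gap: the supremum over an open set U of the Hausdorff distance between the empirical subdifferential ∂F_n(x) and the population subdifferential ∂F(x). The minimal Python sketch below computes that quantity for a hypothetical toy problem. It is not the paper's construction; the loss f(x, ξ) = |x − ξ| (convex, hence weakly convex), the distribution ξ ~ Uniform[−1, 1], the set U = (−0.5, 0.5), and the evaluation grid are all assumptions made here for illustration. For this loss, F(x) = (x² + 1)/2 on [−1, 1], so ∂F(x) = {x}, while ∂F_n(x) becomes a non-degenerate interval at each sample point. In one dimension, the Hausdorff distance between closed intervals [a, b] and [c, d] reduces to max(|a − c|, |b − d|).

```python
# Toy illustration (assumption): f(x, xi) = |x - xi| with xi ~ Uniform[-1, 1].
# This is NOT the paper's method; it only evaluates the uniform Hausdorff gap.
import numpy as np


def hausdorff_intervals(a, b, c, d):
    """Hausdorff distance between closed intervals [a, b] and [c, d] in R."""
    return max(abs(a - c), abs(b - d))


def empirical_subdifferential(x, xi):
    """Subdifferential of F_n(x) = mean(|x - xi_i|) at x, returned as (lo, hi).

    Each term |x - xi_i| has derivative +1 if xi_i < x, -1 if xi_i > x,
    and subdifferential [-1, 1] if xi_i == x.
    """
    n = len(xi)
    below = np.count_nonzero(xi < x)  # terms with slope +1
    above = np.count_nonzero(xi > x)  # terms with slope -1
    ties = n - below - above          # terms contributing [-1, 1]
    return (below - above - ties) / n, (below - above + ties) / n


def population_subdifferential(x):
    """For xi ~ Uniform[-1, 1] and |x| < 1, F(x) = E|x - xi| = (x**2 + 1) / 2,
    which is differentiable, so the subdifferential is the singleton {x}."""
    return x, x


def uniform_hausdorff_gap(xi, points):
    """Max over the given points of the Hausdorff distance between the empirical
    and population subdifferentials (a lower bound on the sup over U)."""
    gap = 0.0
    for x in points:
        lo, hi = empirical_subdifferential(x, xi)
        plo, phi = population_subdifferential(x)
        gap = max(gap, hausdorff_intervals(lo, hi, plo, phi))
    return gap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(-0.49, 0.49, 99)  # hypothetical grid inside U = (-0.5, 0.5)
    for n in (100, 1_000, 10_000):
        xi = rng.uniform(-1.0, 1.0, size=n)
        # Include samples lying in U: only there is the empirical set non-singleton.
        points = np.concatenate([grid, xi[np.abs(xi) < 0.5]])
        print(n, uniform_hausdorff_gap(xi, points))
```

Because the maximum is taken over finitely many points, the computed value is only a lower bound on the true supremum over U. In this toy example the gap is at most twice the Kolmogorov–Smirnov distance between the empirical and population CDFs of ξ, so it typically shrinks like 1/sqrt(n). The sample points are added to the evaluation set because they are the only places where ∂F_n(x) is not a singleton.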

Related papers

Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Most existing analyses, however, require the Lipschitz smoothness condition, which is often not met in common machine learning schemes.
arXiv (2025-07-11T15:36:48Z)
Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias
This paper focuses on applying entropic mirror descent to solve linear systems. The main challenge for the convergence analysis stems from the unboundedness of the domain. To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes.
arXiv (2025-05-05T12:33:18Z)
Stochastic Optimization with Optimal Importance Sampling
We propose an iterative gradient-based algorithm that jointly updates the decision variable and the importance sampling (IS) distribution without requiring time-scale separation between the two. Our method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv (2025-04-04T16:10:18Z)
A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set
We propose a principled approach to construct covariance estimators without imposing restrictive assumptions. We show that our robust estimators are efficiently computable and consistent. Numerical experiments based on synthetic and real data show that our robust estimators are competitive with state-of-the-art estimators.
arXiv (2024-05-30T15:01:18Z)
A Unified Theory of Stochastic Proximal Point Methods without Smoothness
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning. This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv (2024-05-24T21:09:19Z)
Adaptive Annealed Importance Sampling with Constant Rate Progress
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution. We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv (2023-06-27T08:15:28Z)
High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance
We propose algorithms with high-probability convergence results under less restrictive assumptions. These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv (2023-02-02T10:37:23Z)
Stochastic Saddle Point Problems with Decision-Dependent Distributions
This paper focuses on saddle point problems with decision-dependent distributions in both the static and time-varying settings. We introduce the notion of equilibrium points, which are saddle points for the stationary minimax problem. We show that primal-dual algorithms converge to saddle points in a similar fashion.
arXiv (2022-01-07T03:36:41Z)
Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement. For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts. We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv (2021-10-20T02:25:25Z)
The Last-Iterate Convergence Rate of Optimistic Mirror Descent in Stochastic Variational Inequalities
We show an intricate relation between the algorithm's rate of convergence and the local geometry induced by the method's underlying Bregman function. We quantify this geometry in terms of a "Legendre exponent" and show that this exponent determines both the optimal step-size policy of the algorithm and the optimal rates attained.
arXiv (2021-07-05T09:54:47Z)
Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of stochastic gradient descent-ascent (SGDA) and stochastic consensus optimization (SCO). We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step-size. Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv (2021-06-30T18:32:46Z)
On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging
We present an analysis of the stochastic extragradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence. We prove that, when augmented with iterate averaging, SEG converges to the Nash equilibrium, and that its convergence rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv (2021-06-30T17:51:36Z)
Sequential Estimation of Convex Divergences using Reverse Submartingales and Exchangeable Filtrations
We present a unified technique for sequential estimation of convex divergences between distributions. The technical underpinnings of our approach lie in the observation that empirical convex divergences are (partially ordered) reverse submartingales. These techniques appear to be powerful additions to the existing literature on both confidence sequences and convex divergences.
arXiv (2021-03-16T18:22:14Z)
Stochastic Variance Reduction for Variational Inequality Methods
We propose variance-reduced algorithms for solving convex-concave saddle point problems, monotone variational inequalities, and monotone inclusions. Our framework applies to extragradient, forward-backward-forward, and forward-reflected-backward methods in both Euclidean and Bregman setups.
arXiv (2021-02-16T18:39:16Z)
Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
In this paper, a general optimization procedure is studied, unifying several variants of stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method. The procedure is studied as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv (2020-12-07T19:14:49Z)
Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization
We propose a generalization of the extragradient algorithm which converges to a stationary point. The algorithm applies not only to Euclidean spaces, but also to general $\ell_p$-normed finite-dimensional real vector spaces.
arXiv (2020-10-31T21:35:42Z)
Variance-Reduced Splitting Schemes for Monotone Stochastic Generalized Equations
We consider monotone inclusion problems where the operators may be expectation-valued. A direct application of splitting schemes is complicated by the need to resolve problems with expectation-valued maps at each step. We propose an avenue for addressing uncertainty in the mapping: a variance-reduced modified forward-backward splitting scheme.
arXiv (2020-08-26T02:33:27Z)
The Convergence Indicator: Improved and completely characterized parameter bounds for actual convergence of Particle Swarm Optimization
We introduce a new convergence indicator that can be used to calculate whether the particles will finally converge to a single point or diverge. Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv (2020-06-06T19:08:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.