On the Uniform Convergence of Subdifferentials in Stochastic Optimization and Learning
- URL: http://arxiv.org/abs/2405.10289v5
- Date: Fri, 11 Jul 2025 20:36:30 GMT
- Title: On the Uniform Convergence of Subdifferentials in Stochastic Optimization and Learning
- Authors: Feng Ruan
- Abstract summary: We investigate the uniform convergence of subdifferential mappings from empirical risk to population risk in nonsmooth, nonconvex stochastic optimization. These guarantees offer new insight into the geometry of problems arising in robust statistics and related applications.
- Score: 2.719510212909501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the uniform convergence of subdifferential mappings from empirical risk to population risk in nonsmooth, nonconvex stochastic optimization. This question is key to understanding how empirical stationary points approximate population ones, yet characterizing this convergence remains a fundamental challenge due to the set-valued and nonsmooth nature of subdifferentials. This work establishes a general reduction principle: for weakly convex stochastic objectives, over any open subset of the domain, we show that a uniform bound on the convergence of selected subgradients, chosen arbitrarily from the subdifferential sets, yields a corresponding uniform bound on the Hausdorff distance between the subdifferentials. This deterministic result reduces the study of set-valued subdifferential convergence to simpler vector-valued subgradient convergence. We apply this reduction to derive sharp uniform convergence rates for subdifferential mappings in stochastic convex-composite optimization, without relying on differentiability assumptions on the population risk. These guarantees clarify the landscape of nonsmooth empirical objectives and offer new insight into the geometry of optimization problems arising in robust statistics and related applications.
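As a schematic reading of the reduction principle (the notation and the constant $C$ below are ours, not the paper's exact statement): write $F_n$ and $F$ for the empirical and population risks, both assumed weakly convex, and let $d_H$ denote the Hausdorff distance between sets,

$$ d_H(A, B) = \max\Big\{ \sup_{a \in A} \mathrm{dist}(a, B), \; \sup_{b \in B} \mathrm{dist}(b, A) \Big\}. $$

The claim then reads, roughly: if every subgradient selection $g_n(x) \in \partial F_n(x)$ satisfies $\sup_{x \in U} \mathrm{dist}(g_n(x), \partial F(x)) \le \varepsilon$ over an open set $U$, then a bound of the form $\sup_{x \in U'} d_H(\partial F_n(x), \partial F(x)) \le C \varepsilon$ holds over open subsets $U' \subseteq U$; vector-valued subgradient convergence thus controls set-valued subdifferential convergence.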
Related papers
- Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness [50.78508362183774]
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Most analyses, however, require the Lipschitz smoothness condition, which is often not met in common machine learning settings.
arXiv Detail & Related papers (2025-07-11T15:36:48Z) - Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias [55.72269695392027]
This paper focuses on applying entropic mirror descent to solve linear systems. The main challenge for the convergence analysis stems from the unboundedness of the domain. To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes.
arXiv Detail & Related papers (2025-05-05T12:33:18Z) - Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two.
Our method achieves the lowest possible variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z) - A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set [20.166217494056916]
We propose a principled approach to construct covariance estimators without imposing restrictive assumptions.
We show that our robust estimators are efficiently computable and consistent.
Numerical experiments based on synthetic and real data show that our robust estimators are competitive with state-of-the-art estimators.
arXiv Detail & Related papers (2024-05-30T15:01:18Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Stochastic Saddle Point Problems with Decision-Dependent Distributions [0.6091702876917279]
This paper focuses on saddle point problems with decision-dependent distributions in both the static and time-varying settings.
We introduce the notion of equilibrium points, which are saddle points for the stationary minimax problem.
We show that primal-dual algorithms converge to saddle points in a similar fashion.
arXiv Detail & Related papers (2022-01-07T03:36:41Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - The Last-Iterate Convergence Rate of Optimistic Mirror Descent in
Stochastic Variational Inequalities [29.0058976973771]
We show an intricate relation between the algorithm's rate of convergence and the local geometry induced by the method's underlying Bregman function.
We show that this exponent determines both the optimal step-size policy of the algorithm and the optimal rates attained.
arXiv Detail & Related papers (2021-07-05T09:54:47Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Sequential Estimation of Convex Divergences using Reverse Submartingales
and Exchangeable Filtrations [31.088836418378534]
We present a unified technique for sequential estimation of convex divergences between distributions.
The technical underpinnings of our approach lie in the observation that empirical convex divergences are (partially ordered) reverse submartingales.
These techniques appear to be powerful additions to the existing literature on both confidence sequences and convex divergences.
arXiv Detail & Related papers (2021-03-16T18:22:14Z) - Stochastic Variance Reduction for Variational Inequality Methods [19.061953585686986]
We propose variance reduced algorithms for solving convex-concave saddle point problems, monotone variational inequalities, and monotone inclusions.
Our framework applies to extragradient, forward-backward-forward, and forward-reflected-backward methods in both Euclidean and Bregman settings.
arXiv Detail & Related papers (2021-02-16T18:39:16Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general stochastic optimization procedure is studied, unifying several variants of gradient descent, including the heavy ball method, the Stochastic Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
Trap avoidance is studied through a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which provably converges to a stationary point.
The algorithm applies not only to Euclidean spaces, but also to general $p$-normed vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - Variance-Reduced Splitting Schemes for Monotone Stochastic Generalized
Equations [0.0]
We consider monotone inclusion problems where the operators may be expectation-valued.
A direct application of splitting schemes is complicated by the need to resolve problems with expectation-valued maps at each step.
We propose an avenue for addressing uncertainty in the mapping: a variance-reduced modified forward-backward splitting scheme.
arXiv Detail & Related papers (2020-08-26T02:33:27Z) - The Convergence Indicator: Improved and completely characterized
parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to determine whether the particles will finally converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.