Sharper Analysis for Minibatch Stochastic Proximal Point Methods:
Stability, Smoothness, and Deviation
- URL: http://arxiv.org/abs/2301.03125v1
- Date: Mon, 9 Jan 2023 00:13:34 GMT
- Title: Sharper Analysis for Minibatch Stochastic Proximal Point Methods:
Stability, Smoothness, and Deviation
- Authors: Xiao-Tong Yuan and Ping Li
- Abstract summary: We study a minibatch variant of proximal point (SPP) methods, namely M-SPP, for solving convex composite risk minimization problems.
We show that M-SPP with minibatch-size $n$ and quadratic count $T$ enjoys an in-expectation fast rate of convergence.
In the small-$n$-large-$T$ setting, this result substantially improves the best known results of SPP-type approaches.
- Score: 41.082982732100696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The stochastic proximal point (SPP) methods have gained recent attention for
stochastic optimization, with strong convergence guarantees and superior
robustness to the classic stochastic gradient descent (SGD) methods showcased
at little to no cost of computational overhead added. In this article, we study
a minibatch variant of SPP, namely M-SPP, for solving convex composite risk
minimization problems. The core contribution is a set of novel excess risk
bounds of M-SPP derived through the lens of algorithmic stability theory.
Particularly under smoothness and quadratic growth conditions, we show that
M-SPP with minibatch-size $n$ and iteration count $T$ enjoys an in-expectation
fast rate of convergence consisting of an
$\mathcal{O}\left(\frac{1}{T^2}\right)$ bias decaying term and an
$\mathcal{O}\left(\frac{1}{nT}\right)$ variance decaying term. In the
small-$n$-large-$T$ setting, this result substantially improves the best known
results of SPP-type approaches by revealing the impact of noise level of model
on convergence rate. In the complementary small-$T$-large-$n$ regime, we
provide a two-phase extension of M-SPP to achieve comparable convergence rates.
Moreover, we derive a near-tight high probability (over the randomness of data)
bound on the parameter estimation error of a sampling-without-replacement
variant of M-SPP. Numerical evidences are provided to support our theoretical
predictions when substantialized to Lasso and logistic regression models.
Related papers
- Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance [10.11126899274029]
We propose and explore new Polyak-type variants suitable for the update rule of the Heavy Ball method (SHB)
For MomSPS$_max$, we provide guarantees for SHB to a neighborhood of the solution for convex and smooth problems (without assuming)
The other two variants, MomDecSPS and MomAdaSPS, are the first adaptive step-sizes for SHB that guarantee convergence to the exact minimizer without prior knowledge.
arXiv Detail & Related papers (2024-06-06T15:08:06Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the proximal point method (SPPM)
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP)
We show that LOOP achieves a sublinear $tildemathcalO(mathrmpoly(d, mathrmsp(V*)) sqrtTbeta )$ regret, where $d$ and $beta$ correspond to AGEC and log-covering number of the hypothesis class respectively
arXiv Detail & Related papers (2024-04-19T06:24:22Z) - Diffusion Stochastic Optimization for Min-Max Problems [33.73046548872663]
The optimistic gradient method is useful in addressing minimax optimization problems.
Motivated by the observation that the conventional version suffers from the need for a large batch size, we introduce and analyze a new formulation termed Samevareps-generativeOGOG.
arXiv Detail & Related papers (2024-01-26T01:16:59Z) - Efficiently Escaping Saddle Points for Non-Convex Policy Optimization [40.0986936439803]
Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance.
We propose a variance-reduced second-order method that uses second-order information in the form of Hessian vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with sample complexity of $tildeO(epsilon-3)$.
arXiv Detail & Related papers (2023-11-15T12:36:45Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measures the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributional robust optimization AUC.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - A Semismooth Newton Stochastic Proximal Point Algorithm with Variance Reduction [2.048226951354646]
We develop an implementable proximal point (SPP) method for a class of weakly convex, composite optimization problems.
The proposed algorithm incorporates a variance reduction mechanism and the resulting updates are solved using an inexact semismooth Newton framework.
arXiv Detail & Related papers (2022-04-01T13:08:49Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with an average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $sqrt N$-consistency of the MPL estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.