Sharper Analysis for Minibatch Stochastic Proximal Point Methods:
Stability, Smoothness, and Deviation
- URL: http://arxiv.org/abs/2301.03125v1
- Date: Mon, 9 Jan 2023 00:13:34 GMT
- Title: Sharper Analysis for Minibatch Stochastic Proximal Point Methods:
Stability, Smoothness, and Deviation
- Authors: Xiao-Tong Yuan and Ping Li
- Abstract summary: We study a minibatch variant of stochastic proximal point (SPP) methods, namely M-SPP, for solving convex composite risk minimization problems.
We show that M-SPP with minibatch-size $n$ and iteration count $T$ enjoys an in-expectation fast rate of convergence.
In the small-$n$-large-$T$ setting, this result substantially improves the best known results of SPP-type approaches.
- Score: 41.082982732100696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic proximal point (SPP) methods have gained recent attention in
stochastic optimization, offering strong convergence guarantees and superior
robustness over the classic stochastic gradient descent (SGD) methods at little
to no additional computational cost. In this article, we study
a minibatch variant of SPP, namely M-SPP, for solving convex composite risk
minimization problems. The core contribution is a set of novel excess risk
bounds of M-SPP derived through the lens of algorithmic stability theory.
Particularly under smoothness and quadratic growth conditions, we show that
M-SPP with minibatch-size $n$ and iteration count $T$ enjoys an in-expectation
fast rate of convergence consisting of an
$\mathcal{O}\left(\frac{1}{T^2}\right)$ bias decaying term and an
$\mathcal{O}\left(\frac{1}{nT}\right)$ variance decaying term. In the
small-$n$-large-$T$ setting, this result substantially improves the best known
results of SPP-type approaches by revealing the impact of the model's noise
level on the convergence rate. In the complementary small-$T$-large-$n$ regime, we
provide a two-phase extension of M-SPP to achieve comparable convergence rates.
Moreover, we derive a near-tight high probability (over the randomness of data)
bound on the parameter estimation error of a sampling-without-replacement
variant of M-SPP. Numerical evidence is provided to support our theoretical
predictions when instantiated on Lasso and logistic regression models.
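At each iteration, M-SPP replaces the SGD step with a minibatch proximal-point subproblem: the next iterate minimizes the minibatch composite loss plus a quadratic proximity term anchored at the current iterate. The snippet below is a minimal sketch of this update instantiated on Lasso, with the subproblem solved inexactly by a few proximal-gradient inner iterations; the function names, the inner solver, and the $1/t$ prox step-size schedule are illustrative assumptions rather than the exact scheme analyzed in the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def mspp_lasso_step(w_t, X_batch, y_batch, lam, eta, inner_iters=20):
    """One illustrative M-SPP update for Lasso.

    Approximately solves
        w_{t+1} = argmin_w  (1/(2n)) ||X_b w - y_b||^2 + lam ||w||_1
                            + (1/(2*eta)) ||w - w_t||^2
    by proximal gradient on the smooth part (minibatch loss + proximity term).
    """
    n = X_batch.shape[0]
    w = w_t.copy()
    # Lipschitz constant of the smooth part: ||X_b||_2^2 / n + 1/eta.
    L = np.linalg.norm(X_batch, 2) ** 2 / n + 1.0 / eta
    for _ in range(inner_iters):
        grad = X_batch.T @ (X_batch @ w - y_batch) / n + (w - w_t) / eta
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
d, N, n = 20, 1000, 25                           # dimension, sample size, minibatch size
X, w_true = rng.normal(size=(N, d)), np.zeros(d)
w_true[:3] = 1.0
y = X @ w_true + 0.1 * rng.normal(size=N)

w = np.zeros(d)
for t in range(1, N // n + 1):
    idx = rng.choice(N, size=n, replace=False)   # minibatch sampled without replacement
    w = mspp_lasso_step(w, X[idx], y[idx], lam=0.05, eta=1.0 / t)  # decaying prox step
```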
Related papers
- Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance [10.11126899274029]
We propose and explore new Polyak-type variants suitable for the update rule of the stochastic Heavy Ball method (SHB).
For MomSPS$_{\max}$, we provide convergence guarantees for SHB to a neighborhood of the solution for convex and smooth problems (without assuming).
The other two variants, MomDecSPS and MomAdaSPS, are the first adaptive step-sizes for SHB that guarantee convergence to the exact minimizer without prior knowledge.
arXiv Detail & Related papers (2024-06-06T15:08:06Z) - Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP)
We show that LOOP achieves a sublinear $\tilde{\mathcal{O}}(\mathrm{poly}(d, \mathrm{sp}(V^*)) \sqrt{T\beta})$ regret, where $d$ and $\beta$ correspond to AGEC and the log-covering number of the hypothesis class, respectively.
arXiv Detail & Related papers (2024-04-19T06:24:22Z) - Diffusion Stochastic Optimization for Min-Max Problems [33.73046548872663]
The optimistic gradient method is useful in addressing minimax optimization problems.
Motivated by the observation that the conventional version suffers from the need for a large batch size, we introduce and analyze a new same-sample formulation of the optimistic gradient method.
arXiv Detail & Related papers (2024-01-26T01:16:59Z) - Efficiently Escaping Saddle Points for Non-Convex Policy Optimization [40.0986936439803]
Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance.
We propose a variance-reduced second-order method that uses second-order information in the form of Hessian vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with sample complexity of $\tilde{O}(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-11-15T12:36:45Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - A Semismooth Newton Stochastic Proximal Point Algorithm with Variance Reduction [2.048226951354646]
We develop an implementable stochastic proximal point (SPP) method for a class of weakly convex, composite optimization problems.
The proposed algorithm incorporates a variance reduction mechanism and the resulting updates are solved using an inexact semismooth Newton framework.
arXiv Detail & Related papers (2022-04-01T13:08:49Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used stochastic estimator based on moving averages (SEMA) for non-convex optimization problems.
We also present state-of-the-art results for all these problems.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $\sqrt{N}$-consistency of the MPL estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the requirement that an attenuating step-size is needed for exact convergence against the fact that a constant step-size learns faster but only up to an error floor.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively (an illustrative schedule of this kind is sketched after this list).
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
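As a toy illustration of the adaptive-schedule idea summarized in the last entry above, the sketch below keeps the step-size constant while growing the minibatch geometrically, so the variance of the stochastic gradient shrinks over iterations; the geometric growth factor and all function names are assumptions made for illustration, not the schedule analyzed in that paper.

```python
import numpy as np

def sgd_with_growing_batches(grad_fn, w0, data, step_size=0.1,
                             batch0=8, growth=1.5, iters=30, seed=0):
    """Toy adaptive-batch SGD: constant step-size, geometrically growing minibatch.

    Larger batches reduce the variance of the stochastic gradient, playing the
    role usually played by an attenuating step-size.
    """
    rng = np.random.default_rng(seed)
    w, batch = w0.copy(), batch0
    N = len(data[0])
    for _ in range(iters):
        idx = rng.choice(N, size=min(int(batch), N), replace=False)
        w -= step_size * grad_fn(w, idx)
        batch *= growth                  # grow the minibatch instead of shrinking the step
    return w

# Usage on least squares: gradient of (1/(2n)) ||X[idx] w - y[idx]||^2.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(500, 5)), rng.normal(size=500)
grad = lambda w, idx: X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
w_hat = sgd_with_growing_batches(grad, np.zeros(5), (X, y))
```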