Finite-Time Decoupled Convergence in Nonlinear Two-Time-Scale Stochastic Approximation
- URL: http://arxiv.org/abs/2401.03893v3
- Date: Tue, 26 Nov 2024 04:07:27 GMT
- Title: Finite-Time Decoupled Convergence in Nonlinear Two-Time-Scale Stochastic Approximation
- Authors: Yuze Han, Xiang Li, Zhihua Zhang,
- Abstract summary: We investigate the potential for finite-time decoupled convergence in nonlinear two-time-scale approximations.
Under a nested local linearity assumption, finite-time decoupled convergence rates can be achieved with suitable step size selection.
- Score: 26.97172212786727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In two-time-scale stochastic approximation (SA), two iterates are updated at varying speeds using different step sizes, with each update influencing the other. Previous studies on linear two-time-scale SA have shown that the convergence rates of the mean-square errors for these updates depend solely on their respective step sizes, a phenomenon termed decoupled convergence. However, achieving decoupled convergence in nonlinear SA remains less understood. Our research investigates the potential for finite-time decoupled convergence in nonlinear two-time-scale SA. We demonstrate that, under a nested local linearity assumption, finite-time decoupled convergence rates can be achieved with suitable step size selection. To derive this result, we conduct a convergence analysis of the matrix cross term between the iterates and leverage fourth-order moment convergence rates to control the higher-order error terms induced by local linearity. Additionally, a numerical example is provided to explore the possible necessity of local linearity for decoupled convergence.
Related papers
- Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations [57.179679246370114]
We identify the distribution of random perturbations that minimizes the estimator's variance as the perturbation stepsize tends to zero.<n>Our findings reveal that such desired perturbations can align directionally with the true gradient, instead of maintaining a fixed length.
arXiv Detail & Related papers (2025-10-22T19:06:39Z) - Gaussian Approximation for Two-Timescale Linear Stochastic Approximation [4.4491311274892436]
We establish algorithms driven by martingale difference or Markov noise.<n>We derive bounds for normal approximation in terms of the convex distance between probability.<n>We also provide the high-order moment bounds for the error of linear TTSA algorithm.
arXiv Detail & Related papers (2025-08-11T12:41:14Z) - A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models [6.734175048463699]
We derive a linear convergence rate for gradient descent for two-layer linear neural networks trained with squared loss.<n>Our convergence analysis not only improves upon prior results but also suggests a better choice for the step size.
arXiv Detail & Related papers (2025-05-16T19:57:22Z) - Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks [0.0]
We study the gradient descent dynamics of a depth-2 linear neural network with a single input and output.
We show that GD converges at an explicit linear rate to a global minimum of the training loss, even with a large stepsize.
arXiv Detail & Related papers (2025-01-15T20:43:36Z) - Decoupled Functional Central Limit Theorems for Two-Time-Scale Stochastic Approximation [28.07082348529648]
In two-time-scale approximations, two iterates are updated at different rates governed by distinct step sizes, with each update influencing the other.
Previous studies have demonstrated that the convergence rates of the error terms for these updates depend solely on their respective step sizes, a property known as decoupled convergence.
Our work fills this gap by establishing decoupled functional central limit theorems for two-time-scale SA, offering a more precise characterization of its behavior.
arXiv Detail & Related papers (2024-12-22T15:43:01Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Efficient Interpretable Nonlinear Modeling for Multiple Time Series [5.448070998907116]
This paper proposes an efficient nonlinear modeling approach for multiple time series.
It incorporates nonlinear interactions among different time-series variables.
Experimental results show that the proposed algorithm improves the identification of the support of the VAR coefficients in a parsimonious manner.
arXiv Detail & Related papers (2023-09-29T11:42:59Z) - Smoothing ADMM for Sparse-Penalized Quantile Regression with Non-Convex
Penalties [8.294148737585543]
This paper investigates concave and clipped quantile regression in the presence of nonsecondary absolute and non-smooth convergence penalties.
We introduce a novel-loop ADM algorithm with an increasing penalty multiplier, named SIAD, specifically for sparse regression.
arXiv Detail & Related papers (2023-09-04T21:48:51Z) - Dynamics of correlation spreading in low-dimensional transverse-field
Ising models [0.0]
We investigate the dynamical spreading of correlations after a quantum quench starting from a magnetically disordered state in the transverse-field Ising model at one (1D) and two spatial dimensions (2D)
We analyze specifically the longitudinal and transverse spin-spin correlation functions at equal time with use of several methods.
Our findings provide useful benchmarks for quantum simulation experiments of correlation spreading and theoretical refinement of the Lieb-Robinson bound in the future.
arXiv Detail & Related papers (2023-01-04T02:02:21Z) - Fixed-Time Convergence for a Class of Nonconvex-Nonconcave Min-Max
Problems [5.787117733071416]
We develop a fixed-time convergent saddle point dynamical system for solving min-max problems.
The proposed method achieves fast compared to any other state method.
arXiv Detail & Related papers (2022-07-26T12:25:05Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - The Connection between Discrete- and Continuous-Time Descriptions of
Gaussian Continuous Processes [60.35125735474386]
We show that discretizations yielding consistent estimators have the property of invariance under coarse-graining'
This result explains why combining differencing schemes for derivatives reconstruction and local-in-time inference approaches does not work for time series analysis of second or higher order differential equations.
arXiv Detail & Related papers (2021-01-16T17:11:02Z) - Sample Complexity Bounds for Two Timescale Value-based Reinforcement
Learning Algorithms [65.09383385484007]
Two timescale approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z) - Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and
Finite-Time Performance [1.52292571922932]
We study the convergence and finite-time analysis of the nonlinear two-time-scale approximation.
In particular, we show that the method achieves a convergence in expectation at a rate $mathcalO (1/k2/3)$, where $k$ is the number of iterations.
arXiv Detail & Related papers (2020-11-03T17:43:39Z) - Random extrapolation for primal-dual coordinate descent [61.55967255151027]
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and the favorable structures of the objective function.
We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values, in the general convex-concave case.
arXiv Detail & Related papers (2020-07-13T17:39:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.