Bias and Extrapolation in Markovian Linear Stochastic Approximation with
Constant Stepsizes
- URL: http://arxiv.org/abs/2210.00953v3
- Date: Mon, 21 Aug 2023 17:23:31 GMT
- Title: Bias and Extrapolation in Markovian Linear Stochastic Approximation with
Constant Stepsizes
- Authors: Dongyan Huo, Yudong Chen, Qiaomin Xie
- Abstract summary: We consider Linear Stochastic Approximation (LSA) with a constant stepsize and Markovian data.
We show that the bias vector of this limit admits an infinite series expansion with respect to the stepsize.
We show that the bias can be reduced using Richardson-Romberg extrapolation with $m\ge 2$ stepsizes.
- Score: 9.689344942945652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider Linear Stochastic Approximation (LSA) with a constant stepsize
and Markovian data. Viewing the joint process of the data and LSA iterate as a
time-homogeneous Markov chain, we prove its convergence to a unique limiting
and stationary distribution in Wasserstein distance and establish
non-asymptotic, geometric convergence rates. Furthermore, we show that the bias
vector of this limit admits an infinite series expansion with respect to the
stepsize. Consequently, the bias is proportional to the stepsize up to higher
order terms. This result stands in contrast with LSA under i.i.d. data, for
which the bias vanishes. In the reversible chain setting, we provide a general
characterization of the relationship between the bias and the mixing time of
the Markovian data, establishing that they are roughly proportional to each
other.
While Polyak-Ruppert tail-averaging reduces the variance of the LSA iterates,
it does not affect the bias. The above characterization allows us to show that
the bias can be reduced using Richardson-Romberg extrapolation with $m\ge 2$
stepsizes, which eliminates the $m-1$ leading terms in the bias expansion. This
extrapolation scheme leads to an exponentially smaller bias and an improved
mean squared error, both in theory and empirically. Our results immediately
apply to the Temporal Difference learning algorithm with linear function
approximation, Markovian data, and constant stepsizes.
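The extrapolation idea in the abstract can be sketched numerically. The following is a minimal toy simulation, not the paper's experiments: a scalar LSA recursion driven by a two-state Markov chain, with Polyak-Ruppert tail averages computed at stepsizes $\alpha$ and $2\alpha$ and combined via Richardson-Romberg with $m=2$, which cancels the leading $O(\alpha)$ bias term. The transition matrix and coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state Markov chain driving the data (illustrative choice).
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])

def lsa_tail_average(alpha, n_steps=200_000, burn_in=100_000):
    """Run the scalar LSA recursion x_{k+1} = x_k + alpha*(b - a(y_k)*x_k)
    on Markovian data y_k and return the Polyak-Ruppert tail average."""
    a = np.array([1.0, 2.0])  # state-dependent coefficient a(y)
    b = 1.0                   # constant driving term b(y) = 1
    y, x, acc = 0, 0.0, 0.0
    for k in range(n_steps):
        x += alpha * (b - a[y] * x)
        if k >= burn_in:
            acc += x
        y = 0 if rng.random() < P[y, 0] else 1
    return acc / (n_steps - burn_in)

alpha = 0.1
theta_1 = lsa_tail_average(alpha)      # tail average at stepsize alpha
theta_2 = lsa_tail_average(2 * alpha)  # tail average at stepsize 2*alpha

# Richardson-Romberg with m = 2 stepsizes cancels the O(alpha) bias term.
theta_rr = 2 * theta_1 - theta_2

# True solution of E[a(y)] x = E[b(y)] under the (uniform) stationary law.
x_star = 1.0 / 1.5
print(theta_1, theta_2, theta_rr, x_star)
```

Because the coefficient $a(y_k)$ is correlated with the chain, each tail average carries a stepsize-proportional bias; the combination $2\bar\theta_\alpha - \bar\theta_{2\alpha}$ leaves only higher-order terms.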
Related papers
- Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$.
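The inverse-probability weighting above can be illustrated with a tiny example. The data and the empirical estimate of $p(u_n|b_n)$ below are hypothetical stand-ins, not from the paper:

```python
import numpy as np

# Hypothetical toy data: class attribute u_n and non-class (bias) attribute b_n.
u = np.array([0, 0, 0, 1, 0, 1, 1, 1, 0, 1])
b = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Estimate p(u_n | b_n) from empirical counts, then weight each sample
# by 1 / p(u_n | b_n); rare (class, bias) combinations get larger weights,
# which breaks the spurious u-b correlation in the training objective.
weights = np.empty(len(u))
for n in range(len(u)):
    same_b = (b == b[n])
    p_u_given_b = np.mean(u[same_b] == u[n])
    weights[n] = 1.0 / p_u_given_b

weights /= weights.sum()  # normalize for use as per-sample loss weights
print(weights)
```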
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - Effectiveness of Constant Stepsize in Markovian LSA and Statistical
Inference [9.689344942945652]
We study the effectiveness of using a constant stepsize in statistical inference via linear stochastic approximation (LSA) algorithms with Markovian data.
Our results show that using a constant stepsize enjoys easy hyperparameter tuning, fast convergence, and consistently better CI coverage, especially when data is limited.
arXiv Detail & Related papers (2023-12-18T02:51:57Z) - Spectrum-Aware Debiasing: A Modern Inference Framework with Applications to Principal Components Regression [1.342834401139078]
We introduce Spectrum-Aware Debiasing, a novel method for high-dimensional regression.
Our approach applies to problems with structured dependencies, heavy tails, and low-rank structure.
We demonstrate our method through simulated and real data experiments.
arXiv Detail & Related papers (2023-09-14T15:58:30Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Convergence and concentration properties of constant step-size SGD
through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD).
We show that, for unbiased gradient estimates with mildly controlled variance, the iterates converge to an invariant distribution in total variation distance.
All our results are non-asymptotic and their consequences are discussed through a few applications.
arXiv Detail & Related papers (2023-06-20T12:36:28Z) - Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence
Embedding [51.48582649050054]
We propose a representation normalization method which aims at disentangling the correlations between features of encoded sentences.
We also propose Kernel-Whitening, a Nystrom kernel approximation method to achieve more thorough debiasing on nonlinear spurious correlations.
Experiments show that Kernel-Whitening significantly improves the performance of BERT on out-of-distribution datasets while maintaining in-distribution accuracy.
arXiv Detail & Related papers (2022-10-14T05:56:38Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and that the convergence rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Ridge Regression with Frequent Directions: Statistical and Optimization
Perspectives [1.0152838128195465]
We show that Frequent Directions (FD) can be used in the optimization setting through an iterative scheme which yields high-accuracy solutions.
This improves on randomized approaches, which must trade off the cost of computing a new sketch at every iteration against the speed of convergence.
arXiv Detail & Related papers (2020-11-06T21:40:38Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the concentration inequalities and non-asymptotic properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
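The averaging effect described in that last entry is easy to see in simulation. A minimal sketch with assumed toy parameters: constant-stepsize SGD on a one-dimensional quadratic with i.i.d. gradient noise, where individual iterates fluctuate in a stationary band while the Polyak-Ruppert average concentrates around the optimum (consistent with the main paper's observation that the bias vanishes under i.i.d. data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Constant-stepsize SGD on the 1-D quadratic f(x) = 0.5 * (x - mu)**2
# with additive gradient noise (all parameters are toy assumptions).
mu, alpha, n = 3.0, 0.2, 50_000
x, running_sum = 0.0, 0.0
for k in range(n):
    grad = (x - mu) + rng.normal()  # unbiased noisy gradient at x
    x -= alpha * grad
    running_sum += x

# Individual iterates hover in a band of width O(sqrt(alpha)) around mu;
# the Polyak-Ruppert average concentrates much more tightly, which is the
# behavior the CLT for averaged iterates formalizes.
x_bar = running_sum / n
print(x_bar)
```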
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.