Bias and Extrapolation in Markovian Linear Stochastic Approximation with
Constant Stepsizes
- URL: http://arxiv.org/abs/2210.00953v3
- Date: Mon, 21 Aug 2023 17:23:31 GMT
- Title: Bias and Extrapolation in Markovian Linear Stochastic Approximation with
Constant Stepsizes
- Authors: Dongyan Huo, Yudong Chen, Qiaomin Xie
- Abstract summary: We consider Linear Approximation (LSA) with a constant stepsize and Markovian data.
We show that the bias vector of this limit admits an infinite series expansion with respect to the stepsize.
We show that the bias can be reduced using Richardson-Romberg extrapolation with $m\ge 2$ stepsizes.
- Score: 9.689344942945652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider Linear Stochastic Approximation (LSA) with a constant stepsize
and Markovian data. Viewing the joint process of the data and LSA iterate as a
time-homogeneous Markov chain, we prove its convergence to a unique limiting
and stationary distribution in Wasserstein distance and establish
non-asymptotic, geometric convergence rates. Furthermore, we show that the bias
vector of this limit admits an infinite series expansion with respect to the
stepsize. Consequently, the bias is proportional to the stepsize up to higher
order terms. This result stands in contrast with LSA under i.i.d. data, for
which the bias vanishes. In the reversible chain setting, we provide a general
characterization of the relationship between the bias and the mixing time of
the Markovian data, establishing that they are roughly proportional to each
other.
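To make the expansion concrete (the notation here is introduced for illustration rather than taken from the paper): writing $\bar{\theta}(\alpha)$ for the mean of the limiting distribution at stepsize $\alpha$ and $\theta^*$ for the target solution, the result says the bias has an expansion of the form $\bar{\theta}(\alpha) - \theta^* = B^{(1)}\alpha + B^{(2)}\alpha^2 + B^{(3)}\alpha^3 + \cdots$, where the vectors $B^{(i)}$ do not depend on $\alpha$; hence the bias equals $B^{(1)}\alpha$ up to $O(\alpha^2)$ terms, and under i.i.d. data these terms vanish.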
While Polyak-Ruppert tail-averaging reduces the variance of the LSA iterates,
it does not affect the bias. The above characterization allows us to show that
the bias can be reduced using Richardson-Romberg extrapolation with $m\ge 2$
stepsizes, which eliminates the $m-1$ leading terms in the bias expansion. This
extrapolation scheme leads to an exponentially smaller bias and an improved
mean squared error, both in theory and empirically. Our results immediately
apply to the Temporal Difference learning algorithm with linear function
approximation, Markovian data, and constant stepsizes.
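As a concrete, hedged illustration of the $m=2$ extrapolation scheme applied to TD(0) with linear function approximation, the Python sketch below runs two constant-stepsize, tail-averaged TD iterations at stepsizes alpha and 2*alpha and combines them; the sampler sample_transition, feature map phi, discount factor, and burn-in are assumed interfaces for illustration, not the paper's exact protocol.

    import numpy as np

    def td0_tail_average(sample_transition, phi, alpha, num_iters, burn_in, dim, gamma=0.99):
        # Constant-stepsize TD(0) with linear features; returns the Polyak-Ruppert tail average.
        # sample_transition() is assumed to yield Markovian tuples (state, reward, next_state).
        theta = np.zeros(dim)
        tail_sum = np.zeros(dim)
        tail_count = 0
        for t in range(num_iters):
            s, r, s_next = sample_transition()
            td_error = r + gamma * phi(s_next) @ theta - phi(s) @ theta
            theta = theta + alpha * td_error * phi(s)   # LSA update with constant stepsize alpha
            if t >= burn_in:                            # tail averaging reduces variance, not bias
                tail_sum += theta
                tail_count += 1
        return tail_sum / tail_count

    def richardson_romberg(sample_transition, phi, alpha, num_iters, burn_in, dim):
        # Two-stepsize (m = 2) Richardson-Romberg extrapolation: since the bias is
        # B*alpha + O(alpha^2), the combination below cancels the leading O(alpha) term.
        theta_bar_1 = td0_tail_average(sample_transition, phi, alpha, num_iters, burn_in, dim)
        theta_bar_2 = td0_tail_average(sample_transition, phi, 2.0 * alpha, num_iters, burn_in, dim)
        return 2.0 * theta_bar_1 - theta_bar_2

With $m$ stepsizes (e.g. $\alpha, 2\alpha, \dots, m\alpha$) and suitable combination weights, the same idea eliminates the $m-1$ leading terms of the bias expansion, as stated above.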
Related papers
- Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way [12.331596909999764]
We investigate constant stepsize schemes through the lens of Markov processes.
We derive explicit geometric and non-asymptotic convergence rates, as well as the variance and bias introduced by constant stepsizes.
arXiv Detail & Related papers (2024-10-16T21:49:27Z) - Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation [22.652143194356864]
We address the problem of solving strongly convex and smooth problems using a stochastic gradient descent (SGD) algorithm with a constant step size.
We provide an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$.
We establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.
arXiv Detail & Related papers (2024-10-07T15:02:48Z) - Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute $u$ and a non-class attribute $b$.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$; a minimal sketch of this weighting appears after this list.
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - Effectiveness of Constant Stepsize in Markovian LSA and Statistical
Inference [9.689344942945652]
We study the effectiveness of using a constant stepsize in statistical inference via linear stochastic approximation (LSA) algorithms with Markovian data.
Our results show that using a constant stepsize enjoys easy hyperparameter tuning, fast convergence, and consistently better CI coverage, especially when data is limited.
arXiv Detail & Related papers (2023-12-18T02:51:57Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Convergence and concentration properties of constant step-size SGD
through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD).
We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance.
All our results are non-asymptotic and their consequences are discussed through a few applications.
arXiv Detail & Related papers (2023-06-20T12:36:28Z) - Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence
Embedding [51.48582649050054]
We propose a representation normalization method which aims at disentangling the correlations between features of encoded sentences.
We also propose Kernel-Whitening, a Nystrom kernel approximation method to achieve more thorough debiasing on nonlinear spurious correlations.
Experiments show that Kernel-Whitening significantly improves the performance of BERT on out-of-distribution datasets while maintaining in-distribution accuracy.
arXiv Detail & Related papers (2022-10-14T05:56:38Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and that this convergence rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that afforded by ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the non-asymptotic concentration properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
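Referenced from the dataset-bias entry above, here is a minimal sketch of the per-sample reweighting it describes; every name (including the estimated conditional-probability table p_u_given_b) is assumed for illustration rather than taken from that paper.

    import numpy as np

    def reweighted_objective(per_sample_losses, u, b, p_u_given_b):
        # Weight the loss of each sample n by 1 / p(u_n | b_n), so samples whose class
        # attribute is over-represented given the non-class attribute are down-weighted.
        # p_u_given_b[(u_n, b_n)] holds an estimate of the conditional probability p(u_n | b_n).
        weights = np.array([1.0 / p_u_given_b[(u_n, b_n)] for u_n, b_n in zip(u, b)])
        return float(np.mean(weights * per_sample_losses))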