Bias and Extrapolation in Markovian Linear Stochastic Approximation with
Constant Stepsizes
- URL: http://arxiv.org/abs/2210.00953v3
- Date: Mon, 21 Aug 2023 17:23:31 GMT
- Title: Bias and Extrapolation in Markovian Linear Stochastic Approximation with
Constant Stepsizes
- Authors: Dongyan Huo, Yudong Chen, Qiaomin Xie
- Abstract summary: We consider Linear Stochastic Approximation (LSA) with a constant stepsize and Markovian data.
We show that the bias vector of this limit admits an infinite series expansion with respect to the stepsize.
We show that the bias can be reduced using Richardson-Romberg extrapolation with $m\ge 2$ stepsizes.
- Score: 9.689344942945652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider Linear Stochastic Approximation (LSA) with a constant stepsize
and Markovian data. Viewing the joint process of the data and LSA iterate as a
time-homogeneous Markov chain, we prove its convergence to a unique limiting
and stationary distribution in Wasserstein distance and establish
non-asymptotic, geometric convergence rates. Furthermore, we show that the bias
vector of this limit admits an infinite series expansion with respect to the
stepsize. Consequently, the bias is proportional to the stepsize up to higher
order terms. This result stands in contrast with LSA under i.i.d. data, for
which the bias vanishes. In the reversible chain setting, we provide a general
characterization of the relationship between the bias and the mixing time of
the Markovian data, establishing that they are roughly proportional to each
other.
While Polyak-Ruppert tail-averaging reduces the variance of the LSA iterates,
it does not affect the bias. The above characterization allows us to show that
the bias can be reduced using Richardson-Romberg extrapolation with $m\ge 2$
stepsizes, which eliminates the $m-1$ leading terms in the bias expansion. This
extrapolation scheme leads to an exponentially smaller bias and an improved
mean squared error, both in theory and empirically. Our results immediately
apply to the Temporal Difference learning algorithm with linear function
approximation, Markovian data, and constant stepsizes.
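The extrapolation idea in the abstract can be sketched numerically. The following is a minimal toy simulation, not the paper's experiments: a scalar LSA recursion driven by a two-state Markov chain, with Polyak-Ruppert tail averages computed at stepsizes $\alpha$ and $2\alpha$ and combined via Richardson-Romberg with $m=2$, which cancels the leading $O(\alpha)$ bias term. The transition matrix and coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state Markov chain driving the data (illustrative choice).
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])

def lsa_tail_average(alpha, n_steps=200_000, burn_in=100_000):
    """Run the scalar LSA recursion x_{k+1} = x_k + alpha*(b - a(y_k)*x_k)
    on Markovian data y_k and return the Polyak-Ruppert tail average."""
    a = np.array([1.0, 2.0])  # state-dependent coefficient a(y)
    b = 1.0                   # constant driving term b(y) = 1
    y, x, acc = 0, 0.0, 0.0
    for k in range(n_steps):
        x += alpha * (b - a[y] * x)
        if k >= burn_in:
            acc += x
        y = 0 if rng.random() < P[y, 0] else 1
    return acc / (n_steps - burn_in)

alpha = 0.1
theta_1 = lsa_tail_average(alpha)      # tail average at stepsize alpha
theta_2 = lsa_tail_average(2 * alpha)  # tail average at stepsize 2*alpha

# Richardson-Romberg with m = 2 stepsizes cancels the O(alpha) bias term.
theta_rr = 2 * theta_1 - theta_2

# True solution of E[a(y)] x = E[b(y)] under the (uniform) stationary law.
x_star = 1.0 / 1.5
print(theta_1, theta_2, theta_rr, x_star)
```

Because the coefficient $a(y_k)$ is correlated with the chain, each tail average carries a stepsize-proportional bias; the combination $2\bar\theta_\alpha - \bar\theta_{2\alpha}$ leaves only higher-order terms.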
Related papers
- Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$.
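The inverse-probability weighting above can be illustrated with a tiny example. The data and the empirical estimate of $p(u_n|b_n)$ below are hypothetical stand-ins, not from the paper:

```python
import numpy as np

# Hypothetical toy data: class attribute u_n and non-class (bias) attribute b_n.
u = np.array([0, 0, 0, 1, 0, 1, 1, 1, 0, 1])
b = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Estimate p(u_n | b_n) from empirical counts, then weight each sample
# by 1 / p(u_n | b_n); rare (class, bias) combinations get larger weights,
# which breaks the spurious u-b correlation in the training objective.
weights = np.empty(len(u))
for n in range(len(u)):
    same_b = (b == b[n])
    p_u_given_b = np.mean(u[same_b] == u[n])
    weights[n] = 1.0 / p_u_given_b

weights /= weights.sum()  # normalize for use as per-sample loss weights
print(weights)
```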
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - Effectiveness of Constant Stepsize in Markovian LSA and Statistical
Inference [9.689344942945652]
We study the effectiveness of using a constant stepsize in statistical inference via linear stochastic approximation (LSA) algorithms with Markovian data.
Our results show that using a constant stepsize enjoys easy hyperparameter tuning, fast convergence, and consistently better CI coverage, especially when data is limited.
arXiv Detail & Related papers (2023-12-18T02:51:57Z) - Spectrum-Aware Debiasing: A Modern Inference Framework with Applications to Principal Components Regression [1.342834401139078]
We introduce Spectrum-Aware Debiasing, a novel method for high-dimensional regression.
Our approach applies to problems with structured dependencies, heavy tails, and low-rank structure.
We demonstrate our method through simulated and real data experiments.
arXiv Detail & Related papers (2023-09-14T15:58:30Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Convergence and concentration properties of constant step-size SGD
through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD).
We show that, for unbiased gradient estimates with mildly controlled variance, the iterates converge to an invariant distribution in total variation distance.
All our results are non-asymptotic and their consequences are discussed through a few applications.
arXiv Detail & Related papers (2023-06-20T12:36:28Z) - Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence
Embedding [51.48582649050054]
We propose a representation normalization method which aims at disentangling the correlations between features of encoded sentences.
We also propose Kernel-Whitening, a Nystrom kernel approximation method to achieve more thorough debiasing on nonlinear spurious correlations.
Experiments show that Kernel-Whitening significantly improves the performance of BERT on out-of-distribution datasets while maintaining in-distribution accuracy.
arXiv Detail & Related papers (2022-10-14T05:56:38Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and that the convergence rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Ridge Regression with Frequent Directions: Statistical and Optimization
Perspectives [1.0152838128195465]
We show that Frequent Directions (FD) can be used in the optimization setting through an iterative scheme which yields high-accuracy solutions.
This improves on randomized approaches, which must trade off the cost of computing a new sketch at every iteration against the speed of convergence.
arXiv Detail & Related papers (2020-11-06T21:40:38Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the concentration inequalities and non-asymptotic properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
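The averaging effect described in that last entry is easy to see in simulation. A minimal sketch with assumed toy parameters: constant-stepsize SGD on a one-dimensional quadratic with i.i.d. gradient noise, where individual iterates fluctuate in a stationary band while the Polyak-Ruppert average concentrates around the optimum (consistent with the main paper's observation that the bias vanishes under i.i.d. data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Constant-stepsize SGD on the 1-D quadratic f(x) = 0.5 * (x - mu)**2
# with additive gradient noise (all parameters are toy assumptions).
mu, alpha, n = 3.0, 0.2, 50_000
x, running_sum = 0.0, 0.0
for k in range(n):
    grad = (x - mu) + rng.normal()  # unbiased noisy gradient at x
    x -= alpha * grad
    running_sum += x

# Individual iterates hover in a band of width O(sqrt(alpha)) around mu;
# the Polyak-Ruppert average concentrates much more tightly, which is the
# behavior the CLT for averaged iterates formalizes.
x_bar = running_sum / n
print(x_bar)
```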
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.