Online covariance estimation for stochastic gradient descent under
Markovian sampling
- URL: http://arxiv.org/abs/2308.01481v2
- Date: Sun, 5 Nov 2023 04:12:57 GMT
- Title: Online covariance estimation for stochastic gradient descent under
Markovian sampling
- Authors: Abhishek Roy, Krishnakumar Balasubramanian
- Abstract summary: Convergence rates of order $O\big(\sqrt{d}\,n^{-1/8}(\log n)^{1/4}\big)$ are established under state-dependent and state-independent Markovian sampling.
Our method is applied to strategic classification with logistic regression, where adversaries adaptively modify features during training to influence classification into a target class.
- Score: 20.02012768403544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the online overlapping batch-means covariance estimator for
Stochastic Gradient Descent (SGD) under Markovian sampling. Convergence rates
of order $O\big(\sqrt{d}\,n^{-1/8}(\log n)^{1/4}\big)$ and
$O\big(\sqrt{d}\,n^{-1/8}\big)$ are established under state-dependent and
state-independent Markovian sampling, respectively, where $d$ is the
dimensionality and $n$ denotes the number of observations or SGD iterations. These rates
match the best-known convergence rate for independent and identically
distributed (i.i.d.) data. Our analysis overcomes significant challenges that
arise due to Markovian sampling, leading to the introduction of additional
error terms and complex dependencies between the blocks of the batch-means
covariance estimator. Moreover, we establish the convergence rate for the first
four moments of the $\ell_2$ norm of the error of SGD dynamics under
state-dependent Markovian data, which may be of independent interest.
Numerical illustrations provide confidence intervals for
SGD in linear and logistic regression models under Markovian sampling.
Additionally, our method is applied to strategic classification with logistic
regression, where adversaries adaptively modify features during training to
influence classification into a target class.
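For concreteness, the following is a minimal Python sketch of an overlapping batch-means covariance estimate computed from stored SGD iterates. It is an offline variant with a fixed batch size; the paper's estimator is fully online with growing, data-driven batch sizes, and the function and parameter names here are illustrative assumptions rather than the authors' construction.

```python
import numpy as np

def overlapping_batch_means_cov(iterates, batch_size):
    """Overlapping batch-means (OBM) covariance estimate from SGD iterates.

    Offline sketch with a fixed batch size, for illustration only: the
    paper's estimator is fully online and uses growing, data-driven batches.
    Targets the asymptotic covariance of sqrt(n) * (xbar_n - x*).
    """
    n, d = iterates.shape
    xbar = iterates.mean(axis=0)  # averaged SGD iterate
    # Means of all n - batch_size + 1 overlapping batches via cumulative sums.
    csum = np.vstack([np.zeros(d), np.cumsum(iterates, axis=0)])
    batch_means = (csum[batch_size:] - csum[:-batch_size]) / batch_size
    centered = batch_means - xbar
    m = n - batch_size + 1
    # Classical OBM normalization, up to lower-order factors.
    return (batch_size / m) * (centered.T @ centered)
```

Given such an estimate $\hat{\Sigma}$, a $(1-\alpha)$ confidence interval for the $j$-th coordinate of the averaged iterate takes the standard form $\bar{x}_{n,j} \pm z_{\alpha/2}\sqrt{\hat{\Sigma}_{jj}/n}$, which is how estimators of this kind typically produce the intervals the abstract mentions.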
Related papers
- Markov Chain Variance Estimation: A Stochastic Approximation Approach [14.883782513177094]
We consider the problem of estimating the variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean.
We design a novel recursive estimator that requires $O(1)$ computation at each step, does not require storing any historical samples or any prior knowledge of the run-length, and has an optimal $O(\frac{1}{n})$ rate of convergence for the mean-squared error (MSE) with provable finite-sample guarantees.
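For context, the classical Welford recursion below maintains a running mean and variance with $O(1)$ work and memory per observation. It is only the i.i.d. baseline that such estimators generalize; the paper's recursive estimator must additionally account for the autocovariances a Markov chain introduces, which this sketch does not.

```python
class RunningVariance:
    """Welford's recursion: O(1) time and memory per observation (i.i.d. case)."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n          # running mean
        self.m2 += delta * (x - self.mean)   # running sum of squared deviations

    @property
    def variance(self):
        # Population variance; use (n - 1) for the unbiased sample variance.
        return self.m2 / self.n if self.n > 0 else float("nan")
```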
arXiv Detail & Related papers (2024-09-09T15:42:28Z)
- Adaptive Mean Estimation in the Hidden Markov sub-Gaussian Mixture Model [2.07180164747172]
We first study the limitations of existing results in the high dimensional setting and then propose a minimax optimal procedure for the problem of center estimation.
We show that our procedure reaches the optimal rate, of order $\sqrt{\delta d/n} + d/n$ instead of $\sqrt{d/n} + d/n$, where $\delta \in (0,1)$ is a dependence parameter between labels.
arXiv Detail & Related papers (2024-06-18T09:48:21Z)
- Demystifying SGD with Doubly Stochastic Gradients [13.033133586372612]
We establish the convergence properties of doubly SGD with independent minibatching and random reshuffling under general conditions.
We prove that random reshuffling improves the complexity dependence on the subsampling noise.
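To unpack "doubly stochastic": the gradient estimate is random both through the choice of minibatch and through a randomized per-point gradient (e.g., a Monte Carlo estimate of an intractable expectation). A hedged sketch, with all names assumed for illustration:

```python
import numpy as np

def doubly_stochastic_step(x, data, noisy_grad, batch_size, lr, rng):
    """One doubly stochastic SGD step: randomness enters from (1) the
    minibatch of data points and (2) a randomized estimate of each
    per-point gradient (e.g., Monte Carlo over latent variables)."""
    idx = rng.choice(len(data), size=batch_size, replace=False)  # subsampling
    g = np.mean([noisy_grad(x, data[i], rng) for i in idx], axis=0)
    return x - lr * g
```

Under random reshuffling, the minibatch would instead be drawn by permuting the data once per epoch rather than subsampling afresh at every step.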
arXiv Detail & Related papers (2024-06-03T01:13:19Z)
- Delta-AI: Local objectives for amortized inference in sparse graphical models [64.5938437823851]
We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs).
Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective.
We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
arXiv Detail & Related papers (2023-10-03T20:37:03Z)
- Generalized equivalences between subsampling and ridge regularization [3.1346887720803505]
We prove structural and risk equivalences between subsampling and ridge regularization for ensemble ridge estimators.
An indirect implication of our equivalences is that optimally tuned ridge regression exhibits a monotonic prediction risk in the data aspect ratio.
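The object of study can be made concrete with a short sketch: an ensemble ridge estimator averages ridge fits over random subsamples. A minimal illustration under assumed names, not the paper's code:

```python
import numpy as np

def ensemble_ridge(X, y, lam, k, num_subsamples, rng):
    """Average of ridge fits computed on random size-k subsamples of rows."""
    n, d = X.shape
    coef = np.zeros(d)
    for _ in range(num_subsamples):
        idx = rng.choice(n, size=k, replace=False)  # subsample without replacement
        Xs, ys = X[idx], y[idx]
        # Ridge solution (Xs'Xs / k + lam * I)^{-1} Xs'y / k on the subsample.
        coef += np.linalg.solve(Xs.T @ Xs / k + lam * np.eye(d), Xs.T @ ys / k)
    return coef / num_subsamples
```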
arXiv Detail & Related papers (2023-05-29T14:05:51Z)
- Policy evaluation from a single path: Multi-step methods, mixing and mis-specification [45.88067550131531]
We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process.
We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference estimates.
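As a simplified picture of the estimators involved, here is a tabular $k$-step TD sketch for value estimation from a single trajectory; the paper works with kernel-based function approximation, which this deliberately omits, and all names are assumptions:

```python
def multistep_td_values(path, num_states, gamma, k, lr=0.05):
    """Tabular k-step TD value estimates from a single sample path.

    path: list of (state, reward) pairs along one trajectory.
    """
    V = [0.0] * num_states
    for t in range(len(path) - k):
        s = path[t][0]
        # k-step return: discounted rewards plus a bootstrapped tail value.
        G = sum(gamma**i * path[t + i][1] for i in range(k))
        G += gamma**k * V[path[t + k][0]]
        V[s] += lr * (G - V[s])
    return V
```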
arXiv Detail & Related papers (2022-11-07T23:15:25Z)
- Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
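The underlying step is standard importance-sampled SGD: draw index $i$ with probability $p_i$ and reweight the gradient by $1/(np_i)$ so the update stays unbiased. The paper's contribution, maintaining suitable sampling probabilities with sketches in sublinear space, is not reproduced in this minimal sketch:

```python
import numpy as np

def importance_sampled_sgd_step(x, point_grad, probs, lr, rng):
    """One unbiased SGD step: draw i ~ probs, reweight by 1 / (n * p_i)."""
    n = len(probs)
    i = rng.choice(n, p=probs)
    # E[g] equals the average of all per-point gradients, so g is unbiased.
    g = point_grad(i, x) / (n * probs[i])
    return x - lr * g
```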
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
- CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples.
CARMS combines REINFORCE with copula-based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling.
We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate.
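A hedged sketch of the experience-replay idea for least squares on a Markovian stream: buffer incoming pairs and update on uniformly drawn past samples, which weakens the temporal correlation seen by SGD. Buffer size and step size below are illustrative assumptions, not the paper's tuned choices:

```python
import numpy as np
from collections import deque

def replay_sgd_least_squares(stream, d, lr=0.01, buffer_size=1000, seed=0):
    """SGD for min_x E[(a'x - b)^2] on a Markovian stream of pairs (a_t, b_t),
    updating on samples drawn uniformly from a replay buffer."""
    rng = np.random.default_rng(seed)
    buf = deque(maxlen=buffer_size)
    x = np.zeros(d)
    for a, b in stream:
        buf.append((a, b))
        a_j, b_j = buf[rng.integers(len(buf))]  # uniformly replayed sample
        x -= lr * (a_j @ x - b_j) * a_j         # least-squares gradient (factor 2 in lr)
    return x
```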
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathsf{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
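For reference, a minimal tabular sketch of the asynchronous Q-learning update along a single Markovian trajectory, where only the visited (state, action) entry changes per step; the environment interface (env.reset, env.step returning a next state and reward) is an assumption:

```python
import numpy as np

def async_q_learning(env, num_states, num_actions, gamma=0.9, lr=0.1,
                     eps=0.1, steps=100_000, seed=0):
    """Tabular asynchronous Q-learning along a single trajectory."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((num_states, num_actions))
    s = env.reset()  # assumed interface: reset() -> state
    for _ in range(steps):
        # epsilon-greedy behavior policy over the current Q estimates
        a = rng.integers(num_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = env.step(a)  # assumed interface: step(a) -> (state, reward)
        # asynchronous one-step update: only the visited entry changes
        Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q
```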
arXiv Detail & Related papers (2020-06-04T17:51:00Z)