Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms
- URL: http://arxiv.org/abs/2006.08916v1
- Date: Tue, 16 Jun 2020 04:26:50 GMT
- Title: Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms
- Authors: Guy Bresler, Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli and
Xian Wu
- Abstract summary: We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
- Score: 69.45237691598774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of least squares linear regression where the data-points
are dependent and are sampled from a Markov chain. We establish sharp
information theoretic minimax lower bounds for this problem in terms of
$\tau_{\mathsf{mix}}$, the mixing time of the underlying Markov chain, under
different noise settings. Our results establish that, in general, optimization
with Markovian data is strictly harder than optimization with independent data,
and that a trivial algorithm (SGD-DD) that works with only one in every
$\tilde{\Theta}(\tau_{\mathsf{mix}})$ samples, which are approximately
independent, is minimax optimal. In fact, it is strictly better than the
popular Stochastic Gradient Descent (SGD) method with constant step-size, which
is otherwise minimax optimal in the regression setting with independent data.
Beyond a worst case analysis, we investigate whether structured datasets seen
in practice such as Gaussian auto-regressive dynamics can admit more efficient
optimization schemes. Surprisingly, even in this specific and natural setting,
Stochastic Gradient Descent (SGD) with constant step-size is still no better
than SGD-DD. Instead, we propose an algorithm based on experience replay--a
popular reinforcement learning technique--that achieves a significantly better
error rate. Our improved rate serves as one of the first results where an
algorithm outperforms SGD-DD on an interesting Markov chain and also provides
one of the first theoretical analyses to support the use of experience replay
in practice.
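Below is a minimal numpy sketch of the setting and of the experience-replay idea: plain constant step-size SGD on a Gaussian autoregressive data stream, contrasted with a variant that replays samples drawn from a small sliding buffer. The buffer size, step size, and replay schedule are illustrative choices, not the tuned algorithm analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 20000
w_star = rng.normal(size=d)

def gaussian_ar_stream(rho=0.9):
    """Stream (x_t, y_t) where covariates follow a Gaussian autoregressive chain."""
    x = rng.normal(size=d)
    while True:
        yield x, x @ w_star + 0.1 * rng.normal()          # noisy linear response
        x = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=d)

def sgd(stream, lr=0.01):
    """Plain constant step-size SGD on the Markovian stream."""
    w = np.zeros(d)
    for _ in range(T):
        x, y = next(stream)
        w -= lr * (x @ w - y) * x
    return w

def experience_replay_sgd(stream, lr=0.01, buffer_size=200, replays=5):
    """Illustrative replay variant: after each fresh sample, take extra SGD steps
    on samples drawn uniformly from a buffer of recent data."""
    w, buf = np.zeros(d), []
    for _ in range(T):
        x, y = next(stream)
        buf.append((x, y))
        if len(buf) > buffer_size:
            buf.pop(0)
        w -= lr * (x @ w - y) * x
        for _ in range(replays):
            xb, yb = buf[rng.integers(len(buf))]
            w -= lr * (xb @ w - yb) * xb
    return w

print("constant step-size SGD error:", np.linalg.norm(sgd(gaussian_ar_stream()) - w_star))
print("experience replay error:     ", np.linalg.norm(experience_replay_sgd(gaussian_ar_stream()) - w_star))
```

The replay loop simply reuses recent samples several times per fresh observation; the paper's analysis prescribes a more specific buffering and reuse pattern than this uniform resampling.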
Related papers
- On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
We show that ERM is able to achieve sublinear error whenever a class is learnable with iid data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
Quantiles of the gradient norm serve as clipping thresholds, making the updates robust to gradient outliers.
We propose an implementation of the algorithm using Huberiles.
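A rough sketch of the clipping rule described above, assuming the threshold is a running quantile of recently observed gradient norms; the window length and quantile level are illustrative guesses, not the paper's prescription.

```python
import numpy as np

def quantile_clipped_sgd(grad_fn, w0, steps=1000, lr=0.01, q=0.9, window=100):
    """SGD where each stochastic gradient is clipped to a running q-quantile
    of past gradient norms (illustrative sketch)."""
    w = w0.copy()
    norms = []
    for t in range(steps):
        g = grad_fn(w, t)
        norms.append(np.linalg.norm(g))
        tau = np.quantile(norms[-window:], q)       # clipping threshold from recent history
        scale = min(1.0, tau / (np.linalg.norm(g) + 1e-12))
        w -= lr * scale * g
    return w
```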
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Statistical Inference with Stochastic Gradient Methods under
$\phi$-mixing Data [9.77185962310918]
We propose a mini-batch SGD estimator for statistical inference when the data is $\phi$-mixing.
The confidence intervals are constructed using an associated mini-batch SGD procedure.
The proposed method is memory-efficient and easy to implement in practice.
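The summary only names the ingredients, so the following is a loose illustration under my own assumptions: mini-batch SGD for least squares with naive batch-means style confidence intervals built from per-batch iterate averages. It is not the inference procedure from the paper.

```python
import numpy as np

def minibatch_sgd_with_ci(X, y, batch=32, lr=0.05, seed=0):
    """Mini-batch SGD for least squares plus crude batch-means 95% confidence
    intervals from the spread of per-batch iterates (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    iterates = []
    for _ in range(n // batch):
        idx = rng.choice(n, size=batch, replace=False)
        w -= lr * X[idx].T @ (X[idx] @ w - y[idx]) / batch
        iterates.append(w.copy())
    B = np.array(iterates)
    est = B.mean(axis=0)
    se = B.std(axis=0, ddof=1) / np.sqrt(len(B))    # batch-means standard error
    return est, np.stack([est - 1.96 * se, est + 1.96 * se], axis=1)
```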
arXiv Detail & Related papers (2023-02-24T16:16:43Z) - SketchySGD: Reliable Stochastic Optimization via Randomized Curvature
Estimates [19.420605210427635]
SketchySGD improves upon existing gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian.
We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum.
In the ill-conditioned setting we show SketchySGD converges at a faster rate than SGD for least-squares problems.
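A hedged sketch of the general recipe for least squares: build a randomized low-rank eigendecomposition of a subsampled Hessian and use it, with a small damping term, to precondition SGD updates. The sketch size, damping, and sampling choices below are arbitrary illustrative values, not the paper's defaults.

```python
import numpy as np

def preconditioned_sgd_lstsq(X, y, rank=10, rho=1e-3, lr=0.5, steps=2000, batch=64, seed=0):
    """Illustrative preconditioned SGD for least squares: a randomized low-rank
    eigendecomposition of a subsampled Hessian defines the preconditioner."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Subsampled Hessian from a random subset of rows.
    S = rng.choice(n, size=min(n, 10 * rank), replace=False)
    H = X[S].T @ X[S] / len(S)
    # Randomized range finder plus a small eigenproblem.
    Q, _ = np.linalg.qr(H @ rng.normal(size=(d, rank)))
    vals, V = np.linalg.eigh(Q.T @ H @ Q)
    U = Q @ V
    def precond(g):
        # Apply (U diag(vals) U^T + rho I)^{-1} to g.
        c = U.T @ g
        return U @ (c / (vals + rho)) + (g - U @ c) / rho
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, size=batch)
        g = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w -= lr * precond(g)
    return w
```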
arXiv Detail & Related papers (2022-11-16T01:05:41Z) - Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
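As a loose illustration of the importance-sampling component only (the sketching data structures themselves are beyond a short snippet), here is SGD where rows are drawn proportionally to a crude importance score and gradients are reweighted to stay unbiased; the score and step size are my own assumptions.

```python
import numpy as np

def importance_sampled_sgd(X, y, steps=2000, lr=0.01, seed=0):
    """SGD with rows sampled proportionally to their squared norm and gradients
    reweighted for unbiasedness (illustrative; not the paper's data structure)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = np.linalg.norm(X, axis=1) ** 2 + 1e-12
    p = scores / scores.sum()                       # sampling distribution over rows
    w = np.zeros(d)
    for i in rng.choice(n, size=steps, p=p):
        g = (X[i] @ w - y[i]) * X[i]
        w -= lr * g / (n * p[i])                    # reweight to keep the step unbiased
    return w
```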
arXiv Detail & Related papers (2022-07-16T03:09:30Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
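As a toy illustration of annealed importance sampling without Metropolis-Hastings corrections, the sketch below estimates the log normalizer of a one-dimensional unnormalized Gaussian target using unadjusted Langevin transitions between annealing steps; the target, schedule, and step size are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f0(x):                      # standard normal initial distribution (normalized)
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_f1(x, mu=3.0, sig=0.5):     # unnormalized target; true log Z = log(sig * sqrt(2*pi))
    return -0.5 * ((x - mu) / sig) ** 2

def ais_no_mh(n_particles=5000, n_steps=200, step=0.05):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.normal(size=n_particles)                # exact samples from f0
    logw = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Standard AIS weight increment between consecutive intermediate densities.
        logw += (b - b_prev) * (log_f1(x) - log_f0(x))
        # Unadjusted Langevin transition targeting the current intermediate density
        # (no Metropolis-Hastings accept/reject step).
        grad = (1 - b) * (-x) + b * (-(x - 3.0) / 0.5**2)
        x = x + step * grad + np.sqrt(2 * step) * rng.normal(size=n_particles)
    return np.log(np.mean(np.exp(logw)))            # estimate of log Z of the target

print("AIS estimate:", ais_no_mh(), " true:", np.log(0.5 * np.sqrt(2 * np.pi)))
```

Dropping the accept/reject step biases the estimate slightly but makes every operation differentiable, which is the trade-off the paper discusses.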
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Robust Regression Revisited: Acceleration and Improved Estimation Rates [25.54653340884806]
We study fast algorithms for statistical regression problems under the strong contamination model.
The goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples.
We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees.
arXiv Detail & Related papers (2021-06-22T17:21:56Z) - Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
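A hedged numpy sketch of the weighting idea: momentum SGD for logistic regression where each sample in the mini-batch receives an individual weight, here a softmax of per-sample losses with temperature lam. The loss, weighting rule, and hyper-parameters are illustrative stand-ins, not ABSGD's exact scheme.

```python
import numpy as np

def weighted_momentum_sgd(X, y, lam=1.0, lr=0.1, beta=0.9, epochs=5, batch=32, seed=0):
    """Momentum SGD for logistic regression (labels in {-1, +1}) with per-sample
    weights given by a softmax of the per-sample losses (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, m = np.zeros(d), np.zeros(d)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch)):
            z = X[idx] @ w
            losses = np.logaddexp(0.0, -y[idx] * z)         # per-sample logistic loss
            a = np.exp((losses - losses.max()) / lam)
            weights = a / a.sum()                           # attention-style sample weights
            grads = (-y[idx] / (1 + np.exp(y[idx] * z)))[:, None] * X[idx]
            g = weights @ grads                             # weighted mini-batch gradient
            m = beta * m + (1 - beta) * g
            w -= lr * m
    return w
```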
arXiv Detail & Related papers (2020-12-13T03:41:52Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent
with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z) - Quasi-Newton Solver for Robust Non-Rigid Registration [35.66014845211251]
We propose a formulation for robust non-rigid registration based on a globally smooth robust estimator for data fitting and regularization.
We apply the majorization-minimization algorithm to the problem, which reduces each iteration to solving a simple least-squares problem with L-BFGS.
arXiv Detail & Related papers (2020-04-09T01:45:05Z)
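An illustrative majorization-minimization loop on a toy robust linear fit: a smooth robust estimator (Welsch) is majorized by a weighted least-squares surrogate, and each surrogate is minimized with L-BFGS via SciPy. The model and estimator are stand-ins, not the registration energy from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def robust_fit_mm(X, y, nu=1.0, outer_iters=10):
    """Majorization-minimization for the Welsch robust loss: each outer iteration
    freezes per-residual weights and minimizes the resulting weighted
    least-squares surrogate with L-BFGS (toy linear model, illustrative only)."""
    theta = np.zeros(X.shape[1])
    for _ in range(outer_iters):
        r = X @ theta - y
        w = np.exp(-(r / nu) ** 2)                  # weights from the current residuals
        def surrogate(t):
            res = X @ t - y
            return 0.5 * np.sum(w * res**2), X.T @ (w * res)
        theta = minimize(surrogate, theta, jac=True, method="L-BFGS-B").x
    return theta
```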