Provably Auditing Ordinary Least Squares in Low Dimensions
- URL: http://arxiv.org/abs/2205.14284v1
- Date: Sat, 28 May 2022 00:45:10 GMT
- Title: Provably Auditing Ordinary Least Squares in Low Dimensions
- Authors: Ankur Moitra and Dhruv Rohatgi
- Abstract summary: Most metrics for the stability of conclusions derived from Ordinary Least Squares linear regression either only measure local stability or are only interpretable under statistical assumptions.
Recent work proposes a simple, global, finite-sample stability metric: the minimum number of samples that need to be removed so that rerunning the analysis overturns the conclusion.
We show that in the low-dimensional regime where the number of covariates is a constant but the number of samples is large, there are efficient algorithms for provably estimating this metric.
- Score: 17.655785504931913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Measuring the stability of conclusions derived from Ordinary Least Squares
linear regression is critically important, but most metrics either only measure
local stability (i.e. against infinitesimal changes in the data), or are only
interpretable under statistical assumptions. Recent work proposes a simple,
global, finite-sample stability metric: the minimum number of samples that need
to be removed so that rerunning the analysis overturns the conclusion,
specifically meaning that the sign of a particular coefficient of the estimated
regressor changes. However, besides the trivial exponential-time algorithm, the
only approach for computing this metric is a greedy heuristic that lacks
provable guarantees under reasonable, verifiable assumptions; the heuristic
provides a loose upper bound on the stability and also cannot certify lower
bounds on it.
We show that in the low-dimensional regime where the number of covariates is
a constant but the number of samples is large, there are efficient algorithms
for provably estimating (a fractional version of) this metric. Applying our
algorithms to the Boston Housing dataset, we exhibit regression analyses where
we can estimate the stability up to a factor of $3$ better than the greedy
heuristic, and analyses where we can certify stability to dropping even a
majority of the samples.
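To make the quantities in the abstract concrete, here is a minimal NumPy sketch of a greedy upper-bound heuristic of the kind the abstract refers to: repeatedly drop the single sample whose removal moves the chosen coefficient furthest toward a sign flip. The function and parameter names (`greedy_removal_upper_bound`, the target coefficient index `j`) are illustrative, and this is one natural greedy variant rather than the paper's exact procedure.

```python
import numpy as np

def ols(X, y):
    # Ordinary least squares via the pseudoinverse (fine for constant dimension).
    return np.linalg.pinv(X) @ y

def greedy_removal_upper_bound(X, y, j, max_rounds=None):
    """Greedily drop samples to flip the sign of coefficient j.

    Returns the number of removals used, which upper-bounds the true minimum
    (the stability metric), since greedy choices need not be optimal."""
    n, d = X.shape
    keep = np.ones(n, dtype=bool)
    sign0 = np.sign(ols(X, y)[j])          # conclusion: the sign of coefficient j
    rounds = max_rounds or n - d - 1       # keep at least d+1 samples
    for removed in range(1, rounds + 1):
        best_i, best_val = None, np.inf
        for i in np.flatnonzero(keep):
            keep[i] = False                # tentatively drop sample i
            val = sign0 * ols(X[keep], y[keep])[j]
            keep[i] = True
            if val < best_val:             # moves coefficient j most toward a flip
                best_i, best_val = i, val
        keep[best_i] = False
        if best_val <= 0:                  # sign flipped (or hit zero)
            return removed
    return None                            # no flip within the allotted removals
```

As the abstract notes, such a heuristic only yields an upper bound on the stability metric and cannot certify lower bounds; the paper's contribution is an efficient algorithm that provably estimates a fractional version of the metric when the number of covariates is a constant.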
Related papers
- Exact Mean Square Linear Stability Analysis for SGD [28.65663421598186]
We provide an explicit condition on the step size that is both necessary and sufficient for the linear stability of stochastic gradient descent (SGD).
We show that SGD's stability threshold is equivalent to that of a mixture process which, at each step, takes a full-batch gradient step w.p. $1-p$ and a single-sample gradient step w.p. $p$.
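A hedged NumPy sketch of the mixture process described above, specialized to least-squares regression: each iteration takes a full-batch gradient step with probability $1-p$ and a single-sample gradient step with probability $p$, and boundedness of the iterates at a given step size is probed empirically. All names are illustrative; the paper's analysis is exact and in mean square, not a simulation.

```python
import numpy as np

def mixture_sgd(X, y, eta, p, steps=10_000, rng=None):
    """Simulate the mixture process for the loss 0.5*||Xw - y||^2 / n:
    full-batch gradient step w.p. 1-p, single-sample step w.p. p."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        if rng.random() < p:
            i = rng.integers(n)                  # single-sample gradient
            grad = X[i] * (X[i] @ w - y[i])
        else:
            grad = X.T @ (X @ w - y) / n         # full-batch gradient
        w -= eta * grad
        if not np.isfinite(w).all() or np.linalg.norm(w) > 1e12:
            return False, w                      # diverged: eta above threshold
    return True, w                               # stayed bounded: eta stable
```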
arXiv Detail & Related papers (2023-06-13T15:29:23Z) - Convergence of uncertainty estimates in Ensemble and Bayesian sparse
model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with a bootstrapping-based sequential-thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform provably correct variable selection, with the error rate converging at an exponential rate.
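The recipe above can be illustrated with a toy sketch: fit a sequentially thresholded least-squares estimator on bootstrap resamples, then keep the variables selected in a large fraction of the fits. The threshold, number of resamples, and the inclusion-frequency rule (`keep_freq`) below are illustrative choices, not the paper's.

```python
import numpy as np

def stls(X, y, threshold, iters=10):
    """Sequentially thresholded least squares: fit OLS, zero out small
    coefficients, refit on the surviving support until it stabilizes."""
    d = X.shape[1]
    support = np.ones(d, dtype=bool)
    for _ in range(iters):
        w = np.zeros(d)
        w[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        new_support = np.abs(w) >= threshold
        if not new_support.any():
            return np.zeros(d)                 # everything thresholded away
        if (new_support == support).all():
            break                              # support stabilized
        support = new_support
    return w

def bootstrap_stls(X, y, threshold, n_boot=100, keep_freq=0.8, rng=None):
    """Refit STLS on bootstrap resamples; keep variables selected in at
    least keep_freq of the fits, then do one final OLS on that support."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    counts = np.zeros(d)
    for _ in range(n_boot):
        idx = rng.integers(n, size=n)          # bootstrap resample
        counts += stls(X[idx], y[idx], threshold) != 0
    support = counts / n_boot >= keep_freq
    w = np.zeros(d)
    if support.any():
        w[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return w, support
```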
arXiv Detail & Related papers (2023-01-30T04:07:59Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
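For the simplest instance of the setting above, mean estimation, the per-sample gradient of the squared loss $0.5\|w - x\|^2$ is $w - x$, and clipping its norm limits the influence of heavy-tailed samples. A minimal sketch, with an ad hoc clipping level `tau` and a $1/t$ step size rather than the paper's tuned choices:

```python
import numpy as np

def clipped_streaming_mean(stream, tau, eta0=1.0):
    """Streaming mean estimation with clipped gradient descent.

    For the loss 0.5*||w - x||^2 the per-sample gradient is w - x; clipping
    its norm at tau bounds the influence of any single heavy-tailed sample."""
    w = None
    for t, x in enumerate(stream, start=1):
        if w is None:
            w = np.zeros_like(x, dtype=float)
        g = w - x
        norm = np.linalg.norm(g)
        if norm > tau:
            g *= tau / norm                # clip the gradient norm
        w -= (eta0 / t) * g                # decaying step size
    return w
```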
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Toward Better Generalization Bounds with Locally Elastic Stability [41.7030651617752]
We argue that locally elastic stability implies tighter generalization bounds than those derived based on uniform stability.
We revisit the examples of bounded support vector machines, regularized least squares regression, and gradient descent.
arXiv Detail & Related papers (2020-10-27T02:04:53Z) - Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient
Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
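A toy NumPy sketch of the idea for a single categorical variable and a linear loss $L(d) = c \cdot d$: sample the hard category once, then average the straight-through gradient over several Gumbel vectors resampled conditionally on that category (via the top-down truncated-Gumbel construction). The Monte Carlo inner loop is one way to realize the conditional expectation; the names and the choice of a linear loss (whose straight-through gradient has the closed form used below) are illustrative.

```python
import numpy as np

def st_gumbel_grad(c, G, tau):
    """Straight-through gradient of L(d) = c @ d w.r.t. the logits: the
    backward pass goes through z = softmax(G / tau), whose Jacobian w.r.t.
    the logits is (diag(z) - z z^T) / tau."""
    z = np.exp(G / tau - np.logaddexp.reduce(G / tau))
    return (c * z - (c @ z) * z) / tau

def rao_blackwellized_st(c, phi, tau, K=16, rng=None):
    """Average the ST gradient over K Gumbel draws that all share one
    sampled category k, using top-down conditional Gumbel sampling."""
    rng = rng or np.random.default_rng(0)
    d = len(phi)
    T = np.logaddexp.reduce(phi)
    probs = np.exp(phi - T)
    k = rng.choice(d, p=probs)                     # sample the hard category once
    grads = np.zeros(d)
    for _ in range(K):
        Z = T - np.log(-np.log(rng.random()))      # max Gumbel ~ Gumbel(T)
        # Gumbel(phi_i) truncated above at Z, for the non-argmax coordinates:
        G = phi - np.log(np.exp(phi - Z) - np.log(rng.random(d)))
        G[k] = Z                                   # pin the argmax at k
        grads += st_gumbel_grad(c, G, tau)
    return grads / K
```

Averaging over draws that fix the discrete outcome is what reduces the variance relative to a single straight-through sample, in line with the Rao-Blackwellization argument summarized above.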
arXiv Detail & Related papers (2020-10-09T22:54:38Z) - Outlier Robust Mean Estimation with Subgaussian Rates via Stability [46.03021473600576]
We study the problem of outlier-robust high-dimensional mean estimation.
We obtain the first computationally efficient algorithm achieving a subgaussian rate for outlier-robust mean estimation.
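The stability-based algorithm itself is beyond a short snippet, but a generic iterative-filtering sketch makes the problem concrete: while the empirical covariance has a large top eigenvalue, drop the points with the largest squared projections onto the top eigenvector and re-estimate. The stopping threshold and removal fraction below are ad hoc, and this is a standard filtering heuristic rather than the paper's method.

```python
import numpy as np

def filtered_mean(X, eps, rounds=None):
    """Generic filtering heuristic for outlier-robust mean estimation."""
    X = X.copy()
    rounds = rounds or max(1, int(2 * eps * len(X)))
    for _ in range(rounds):
        mu = X.mean(axis=0)
        Xc = X - mu
        cov = Xc.T @ Xc / len(X)
        vals, vecs = np.linalg.eigh(cov)
        if vals[-1] <= 2.0:                   # ad hoc threshold for isotropic data
            break
        v = vecs[:, -1]                       # top eigenvector of the covariance
        scores = (Xc @ v) ** 2
        cutoff = np.quantile(scores, 1 - eps) # drop the top eps fraction
        X = X[scores <= cutoff]
    return X.mean(axis=0)
```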
arXiv Detail & Related papers (2020-07-30T17:33:03Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate.
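A hypothetical sketch of the experience-replay idea for least squares on Markovian data: incoming correlated samples are written into a fixed-size buffer, and each gradient step uses a sample drawn uniformly from the buffer, which decorrelates consecutive updates. Buffer size, step size, and the update schedule are illustrative.

```python
import numpy as np

def replay_sgd(sample_stream, d, buffer_size=1000, eta=0.01, rng=None):
    """SGD with an experience-replay buffer for least squares on a
    Markovian stream of (x, y) pairs."""
    rng = rng or np.random.default_rng(0)
    buf = []
    w = np.zeros(d)
    for t, (x, y) in enumerate(sample_stream):
        if len(buf) < buffer_size:
            buf.append((x, y))
        else:
            buf[t % buffer_size] = (x, y)      # overwrite slots cyclically
        xb, yb = buf[rng.integers(len(buf))]   # uniform draw breaks correlations
        w -= eta * xb * (xb @ w - yb)          # squared-loss gradient step
    return w
```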
arXiv Detail & Related papers (2020-06-16T04:26:50Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Finite-time Identification of Stable Linear Systems: Optimality of the
Least-Squares Estimator [79.3239137440876]
We present a new finite-time analysis of the estimation error of the Ordinary Least Squares (OLS) estimator for stable linear time-invariant systems.
We characterize the number of observed samples sufficient for the OLS estimator to be $(\varepsilon,\delta)$-PAC, i.e., to yield an estimation error less than $\varepsilon$ with probability at least $1-\delta$.
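The OLS estimator in this setting reduces to one least-squares solve on the observed trajectory: for $x_{t+1} = A x_t + w_t$, stack the regressors and targets and solve for $\hat{A}$. A minimal sketch (the trajectory array layout is an assumption):

```python
import numpy as np

def ols_system_id(X):
    """OLS estimate of A for x_{t+1} = A x_t + w_t from one trajectory.

    X has shape (T, d) with rows x_0, ..., x_{T-1}. Solves
    min_A sum_t ||x_{t+1} - A x_t||^2 via a single least-squares solve."""
    P, Y = X[:-1], X[1:]                       # regressors and targets
    M, *_ = np.linalg.lstsq(P, Y, rcond=None)  # minimizes ||P @ M - Y||_F
    return M.T                                 # A_hat = M^T, since Y ~ P @ A^T
```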
arXiv Detail & Related papers (2020-03-17T20:59:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.