Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms
- URL: http://arxiv.org/abs/2002.10526v1
- Date: Mon, 24 Feb 2020 20:34:50 GMT
- Title: Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms
- Authors: Ping Ma, Xinlian Zhang, Xin Xing, Jingyi Ma, and Michael W. Mahoney
- Abstract summary: We develop an analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem.
We identify optimal sampling probabilities based on the Asymptotic Mean Squared Error (AMSE) and the Expected Asymptotic Mean Squared Error (EAMSE).
Our theoretical results clarify the role of leverage in the sampling process, and our empirical results demonstrate improvements over existing methods.
- Score: 43.134933182911766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The statistical analysis of Randomized Numerical Linear Algebra (RandNLA)
algorithms within the past few years has mostly focused on their performance as
point estimators. However, this is insufficient for conducting statistical
inference, e.g., constructing confidence intervals and testing hypotheses,
since the distribution of the estimator is unavailable. In this article, we develop
an asymptotic analysis to derive the distribution of RandNLA sampling
estimators for the least-squares problem. In particular, we derive the
asymptotic distribution of a general sampling estimator with arbitrary sampling
probabilities. The analysis is conducted in two complementary settings: when
the objective of interest is to approximate the full-sample estimator, and
when it is to infer the underlying ground-truth model parameters. For each setting, we
show that the sampling estimator is asymptotically normally distributed under
mild regularity conditions. Moreover, the sampling estimator is asymptotically
unbiased in both settings. Based on our asymptotic analysis, we use two
criteria, the Asymptotic Mean Squared Error (AMSE) and the Expected Asymptotic
Mean Squared Error (EAMSE), to identify optimal sampling probabilities. Several
of these optimal sampling probability distributions are new to the literature,
e.g., the root leverage sampling estimator and the predictor length sampling
estimator. Our theoretical results clarify the role of leverage in the sampling
process, and our empirical results demonstrate improvements over existing
methods.
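To make the object of study concrete, here is a minimal numpy sketch of the generic sampling estimator the abstract describes: draw $r$ rows with replacement from probabilities $\{\pi_i\}$, rescale each drawn row by $1/\sqrt{r\pi_i}$ so the subsampled objective is unbiased for the full one, and solve the reduced least-squares problem. The scheme names and normalizations below (leverage, root leverage, predictor length) are our illustrative readings of the abstract, not the paper's exact constructions.
```python
import numpy as np

def sampling_probs(X, scheme="leverage"):
    """Illustrative row-sampling probabilities (names are ours, not the
    paper's exact constructions). 'leverage' uses the statistical leverage
    scores h_i, 'root_leverage' uses sqrt(h_i), and 'pred_length' uses the
    row norms ||x_i||; each is normalized to sum to one."""
    Q, _ = np.linalg.qr(X)            # thin QR: h_i = ||Q[i, :]||^2
    lev = np.sum(Q**2, axis=1)
    if scheme == "leverage":
        w = lev
    elif scheme == "root_leverage":
        w = np.sqrt(lev)
    elif scheme == "pred_length":
        w = np.linalg.norm(X, axis=1)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return w / w.sum()

def sampling_ls_estimator(X, y, r, probs, seed=None):
    """Generic RandNLA sampling estimator for least squares: draw r rows
    with replacement from `probs`, rescale row i by 1/sqrt(r * pi_i), then
    solve the reduced least-squares problem."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=r, replace=True, p=probs)
    scale = 1.0 / np.sqrt(r * probs[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * scale[:, None], y[idx] * scale,
                               rcond=None)
    return beta
```
The paper's two settings then correspond to treating either the full-sample OLS solution or the ground-truth coefficients as the target of this estimator.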
Related papers
- Robust Estimation for Kernel Exponential Families with Smoothed Total Variation Distances [2.317910166616341]
In statistical inference, we commonly assume that samples are independent and identically distributed from a probability distribution.
In this paper, we explore the application of GAN-like estimators to a general class of statistical models.
arXiv Detail & Related papers (2024-10-28T05:50:47Z)
- A Provably Accurate Randomized Sampling Algorithm for Logistic Regression [2.7930955543692817]
We present a simple, randomized sampling-based algorithm for the logistic regression problem.
We prove that accurate approximations can be achieved with a sample whose size is much smaller than the total number of observations.
Overall, our work sheds light on the potential of using randomized sampling approaches to efficiently approximate the estimated probabilities in logistic regression.
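As a rough illustration of the idea in this entry (not the paper's specific algorithm or sampling distribution), one can subsample rows, reweight by inverse inclusion intensity, and maximize the weighted log-likelihood on the subsample; the helper names below are hypothetical.
```python
import numpy as np

def weighted_logistic_mle(X, y, w, iters=25, ridge=1e-8):
    """Newton's method for weighted logistic regression (y in {0, 1});
    w holds the per-row weights, here 1/(r * pi_i) for the sampled rows."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1 - p))[:, None]).T @ X    # weighted information
        beta += np.linalg.solve(hess + ridge * np.eye(len(beta)), grad)
    return beta

def subsampled_logistic(X, y, r, probs, seed=None):
    """Draw r rows with probabilities `probs`, reweight by 1/(r * pi_i),
    and maximize the weighted log-likelihood on the subsample."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=r, replace=True, p=probs)
    return weighted_logistic_mle(X[idx], y[idx], 1.0 / (r * probs[idx]))
```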
arXiv Detail & Related papers (2024-02-26T06:20:28Z)
- Mean-Square Analysis of Discretized Itô Diffusions for Heavy-tailed Sampling [17.415391025051434]
We analyze the complexity of sampling from a class of heavy-tailed distributions by discretizing a natural class of Itô diffusions associated with weighted Poincaré inequalities.
Based on a mean-square analysis, we establish the iteration complexity for obtaining a sample whose distribution is $\epsilon$-close to the target distribution in the Wasserstein-2 metric.
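A minimal one-dimensional sketch of this approach, under our own choice of target: for the Student-t density $\pi(x) \propto (1 + x^2/\nu)^{-(\nu+1)/2}$, taking diffusion coefficient $a(x) = 1 + x^2/\nu$ and drift $b = (a\pi)'/\pi = (1-\nu)x/\nu$ yields an Itô diffusion with stationary law Student-t, which Euler-Maruyama discretizes. The paper's multivariate targets, weighted Poincaré machinery, and step-size bounds are not reproduced here.
```python
import numpy as np

def student_t_diffusion(nu=3.0, h=1e-2, n_steps=200_000, seed=None):
    """Euler-Maruyama discretization of the 1-D Ito diffusion
        dX = b(X) dt + sqrt(2 a(X)) dW,
    with a(x) = 1 + x^2/nu and b(x) = (1 - nu) * x / nu, whose stationary
    density is Student-t with nu degrees of freedom. Returns the full
    trajectory; discard an initial burn-in before treating the iterates
    as approximate samples."""
    rng = np.random.default_rng(seed)
    x, out = 0.0, np.empty(n_steps)
    for k in range(n_steps):
        a = 1.0 + x * x / nu
        b = (1.0 - nu) * x / nu
        x += h * b + np.sqrt(2.0 * a * h) * rng.standard_normal()
        out[k] = x
    return out
```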
arXiv Detail & Related papers (2023-03-01T15:16:03Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
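A generic sketch of the clipped stochastic gradient idea for a streaming scalar mean (the paper's step sizes, clipping schedule, and guarantees are not reproduced):
```python
import numpy as np

def clipped_sgd_mean(stream, tau, lr=0.01, theta0=0.0):
    """Streaming mean estimation with clipped gradients. Each sample x
    yields the gradient g = theta - x of the loss 0.5 * (theta - x)^2;
    clipping |g| at tau bounds the influence of heavy-tailed samples."""
    theta = theta0
    for x in stream:
        g = theta - x
        g *= min(1.0, tau / max(abs(g), 1e-12))   # norm clipping
        theta -= lr * g
    return theta

# Illustrative use on a heavy-tailed stream (hypothetical numbers):
# clipped_sgd_mean(np.random.default_rng(0).pareto(2.1, 100_000) + 1.0, tau=5.0)
```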
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Maximum sampled conditional likelihood for informative subsampling [4.708378681950648]
Subsampling is a computationally effective approach to extract information from massive data sets when computing resources are limited.
We propose to use the maximum sampled conditional likelihood estimator (MSCLE) based on the sampled data.
arXiv Detail & Related papers (2020-11-11T16:01:17Z)
- Distributionally Robust Parametric Maximum Likelihood Estimation [13.09499764232737]
We propose a distributionally robust maximum likelihood estimator that minimizes the worst-case expected log-loss uniformly over a parametric nominal distribution.
Our novel robust estimator also enjoys statistical consistency and delivers promising empirical results in both regression and classification tasks.
arXiv Detail & Related papers (2020-10-11T19:05:49Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
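The classifier under analysis can be sketched as follows (binary case, plain numpy; the paper's consistent estimator of the misclassification probability is not reproduced): each ensemble member draws a Gaussian projection, fits Fisher LDA in the projected space, and the members vote by majority.
```python
import numpy as np

def fit_lda(X, y, ridge=1e-8):
    """Binary Fisher LDA: returns (w, c) so that (x @ w > c) predicts class 1."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    S = np.cov(X[y == 0], rowvar=False) + np.cov(X[y == 1], rowvar=False)
    w = np.linalg.solve(S + ridge * np.eye(X.shape[1]), m1 - m0)
    return w, w @ (m0 + m1) / 2.0

def rp_lda_ensemble(X, y, X_test, n_members=25, d=5, seed=None):
    """Each member projects to d dimensions with a Gaussian matrix, fits
    LDA in the projected space, and the ensemble predicts by majority vote."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test))
    for _ in range(n_members):
        R = rng.standard_normal((X.shape[1], d)) / np.sqrt(d)
        w, c = fit_lda(X @ R, y)
        votes += (X_test @ R) @ w > c
    return (votes > n_members / 2.0).astype(int)
```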
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
- Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the $k$ nearest neighbor distances between these samples.
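The k-NN construction mentioned above has a standard form, in the style of Wang, Kulkarni and Verdú (2009); the minimax-optimal refinements of this paper are not reproduced in the sketch below.
```python
import numpy as np
from scipy.spatial import cKDTree

def kl_knn(X, Y, k=1):
    """k-NN estimator of KL(P || Q) from X ~ P (n x d) and Y ~ Q (m x d):
        D_hat = (d / n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1)),
    where rho_k(i) is the distance from X_i to its k-th neighbor within X
    (excluding itself) and nu_k(i) its k-th neighbor distance within Y."""
    n, d = X.shape
    rho = cKDTree(X).query(X, k=k + 1)[0][:, -1]   # k+1 skips the self-match
    nu = cKDTree(Y).query(X, k=k)[0]
    nu = nu[:, -1] if k > 1 else nu
    return d * np.mean(np.log(nu / rho)) + np.log(len(Y) / (n - 1))
```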
arXiv Detail & Related papers (2020-02-26T16:37:37Z)
- Estimating Gradients for Discrete Random Variables by Sampling without Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.