Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach
- URL: http://arxiv.org/abs/2401.13665v2
- Date: Mon, 1 Jul 2024 17:10:28 GMT
- Title: Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach
- Authors: Yuling Yan, Martin J. Wainwright,
- Abstract summary: We consider inferential questions associated with the missing data version of panel data induced by staggered adoption.
We develop and analyze a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage.
We prove non-asymptotic and high-probability bounds on its error in estimating each missing entry.
- Score: 27.301741710016223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Longitudinal or panel data can be represented as a matrix with rows indexed by units and columns indexed by time. We consider inferential questions associated with the missing data version of panel data induced by staggered adoption. We propose a computationally efficient procedure for estimation, involving only simple matrix algebra and singular value decomposition, and prove non-asymptotic and high-probability bounds on its error in estimating each missing entry. By controlling proximity to a suitably scaled Gaussian variable, we develop and analyze a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage. Despite its simplicity, our procedure turns out to be instance-optimal: we prove that the width of our confidence intervals match a non-asymptotic instance-wise lower bound derived via a Bayesian Cram\'{e}r-Rao argument. We illustrate the sharpness of our theoretical characterization on a variety of numerical examples. Our analysis is based on a general inferential toolbox for SVD-based algorithm applied to the matrix denoising model, which might be of independent interest.
Related papers
- A general error analysis for randomized low-rank approximation with application to data assimilation [42.57210316104905]
We propose a framework for the analysis of the low-rank approximation error in Frobenius norm for centered and non-standard matrices.
Under minimal assumptions, we derive accurate bounds in expectation and probability.
Our bounds have clear interpretations that enable us to derive properties and motivate practical choices.
arXiv Detail & Related papers (2024-05-08T04:51:56Z) - Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms [19.616162116973637]
We develop a unified methodology for statistical inference via randomized sketching or projections.
The methodology applies to fixed datasets -- i.e., is data-conditional -- and the only randomness is due to the randomized algorithm.
arXiv Detail & Related papers (2024-04-01T04:35:44Z) - Statistical Inference For Noisy Matrix Completion Incorporating Auxiliary Information [3.9748528039819977]
This paper investigates statistical inference for noisy matrix completion in a semi-supervised model.
We apply an iterative least squares (LS) estimation approach in our considered context.
We show that our method only needs a few iterations, and the resulting entry-wise estimators of the low-rank matrix and the coefficient matrix are guaranteed to have normal distributions.
arXiv Detail & Related papers (2024-03-22T01:06:36Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Robust Matrix Completion with Mixed Data Types [0.0]
We consider the problem of recovering a structured low rank matrix with partially observed entries with mixed data types.
Most approaches assume that there is only one underlying distribution and the low rank constraint is regularized by the matrix Schatten Norm.
We propose a computationally feasible statistical approach with strong recovery guarantees along with an algorithmic framework suited for parallelization to recover a low rank matrix with partially observed entries for mixed data types in one step.
arXiv Detail & Related papers (2020-05-25T21:35:10Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - A Robust Functional EM Algorithm for Incomplete Panel Count Data [66.07942227228014]
We propose a functional EM algorithm to estimate the counting process mean function under a missing completely at random assumption (MCAR)
The proposed algorithm wraps several popular panel count inference methods, seamlessly deals with incomplete counts and is robust to misspecification of the Poisson process assumption.
We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data.
arXiv Detail & Related papers (2020-03-02T20:04:38Z) - Improved guarantees and a multiple-descent curve for Column Subset
Selection and the Nystr\"om method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z) - Statistical Outlier Identification in Multi-robot Visual SLAM using
Expectation Maximization [18.259478519717426]
This paper introduces a novel and distributed method for detecting inter-map loop closure outliers in simultaneous localization and mapping (SLAM)
The proposed algorithm does not rely on a good initialization and can handle more than two maps at a time.
arXiv Detail & Related papers (2020-02-07T06:34:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.