Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach
- URL: http://arxiv.org/abs/2401.13665v2
- Date: Mon, 1 Jul 2024 17:10:28 GMT
- Title: Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach
- Authors: Yuling Yan, Martin J. Wainwright,
- Abstract summary: We consider inferential questions associated with the missing data version of panel data induced by staggered adoption.
We develop and analyze a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage.
We prove non-asymptotic and high-probability bounds on its error in estimating each missing entry.
- Score: 27.301741710016223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Longitudinal or panel data can be represented as a matrix with rows indexed by units and columns indexed by time. We consider inferential questions associated with the missing data version of panel data induced by staggered adoption. We propose a computationally efficient procedure for estimation, involving only simple matrix algebra and singular value decomposition, and prove non-asymptotic and high-probability bounds on its error in estimating each missing entry. By controlling proximity to a suitably scaled Gaussian variable, we develop and analyze a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage. Despite its simplicity, our procedure turns out to be instance-optimal: we prove that the width of our confidence intervals match a non-asymptotic instance-wise lower bound derived via a Bayesian Cram\'{e}r-Rao argument. We illustrate the sharpness of our theoretical characterization on a variety of numerical examples. Our analysis is based on a general inferential toolbox for SVD-based algorithm applied to the matrix denoising model, which might be of independent interest.
Related papers
- Entry-Specific Matrix Estimation under Arbitrary Sampling Patterns through the Lens of Network Flows [9.631640936820126]
Matrix completion tackles the task of predicting missing values in a low-rank matrix based on a sparse set of observed entries.
We introduce a matrix completion algorithm based on network flows in the bipartite graph induced by the observation pattern.
Our results show that the minimax squared error for recovery of a particular entry in the matrix is proportional to the effective resistance of the corresponding edge in the graph.
arXiv Detail & Related papers (2024-09-06T02:01:03Z) - Probabilistic Iterative Hard Thresholding for Sparse Learning [2.5782973781085383]
We present an approach towards solving expectation objective optimization problems with cardinality constraints.
We prove convergence of the underlying process, and demonstrate the performance on two Machine Learning problems.
arXiv Detail & Related papers (2024-09-02T18:14:45Z) - Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z) - Statistical Inference For Noisy Matrix Completion Incorporating Auxiliary Information [3.9748528039819977]
This paper investigates statistical inference for noisy matrix completion in a semi-supervised model.
We apply an iterative least squares (LS) estimation approach in our considered context.
We show that our method only needs a few iterations, and the resulting entry-wise estimators of the low-rank matrix and the coefficient matrix are guaranteed to have normal distributions.
arXiv Detail & Related papers (2024-03-22T01:06:36Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Robust Matrix Completion with Mixed Data Types [0.0]
We consider the problem of recovering a structured low rank matrix with partially observed entries with mixed data types.
Most approaches assume that there is only one underlying distribution and the low rank constraint is regularized by the matrix Schatten Norm.
We propose a computationally feasible statistical approach with strong recovery guarantees along with an algorithmic framework suited for parallelization to recover a low rank matrix with partially observed entries for mixed data types in one step.
arXiv Detail & Related papers (2020-05-25T21:35:10Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Improved guarantees and a multiple-descent curve for Column Subset
Selection and the Nystr\"om method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z) - Statistical Outlier Identification in Multi-robot Visual SLAM using
Expectation Maximization [18.259478519717426]
This paper introduces a novel and distributed method for detecting inter-map loop closure outliers in simultaneous localization and mapping (SLAM)
The proposed algorithm does not rely on a good initialization and can handle more than two maps at a time.
arXiv Detail & Related papers (2020-02-07T06:34:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.