Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula
- URL: http://arxiv.org/abs/2006.10829v2
- Date: Mon, 18 Jan 2021 23:20:27 GMT
- Title: Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula
- Authors: Yuxuan Zhao, Madeleine Udell
- Abstract summary: This paper proposes a framework for missing value imputation with quantified uncertainty.
The time required to fit the model scales linearly with the number of rows and the number of columns in the dataset.
Empirical results show the method yields state-of-the-art imputation accuracy across a wide range of data types.
- Score: 30.84155327760468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern large scale datasets are often plagued with missing entries. For
tabular data with missing values, a flurry of imputation algorithms solve for a
complete matrix which minimizes some penalized reconstruction error. However,
almost none of them can estimate the uncertainty of their imputations. This paper
proposes a probabilistic and scalable framework for missing value imputation
with quantified uncertainty. Our model, the Low Rank Gaussian Copula, augments
a standard probabilistic model, Probabilistic Principal Component Analysis,
with marginal transformations for each column that allow the model to better
match the distribution of the data. It naturally handles Boolean, ordinal, and
real-valued observations and quantifies the uncertainty in each imputation. The
time required to fit the model scales linearly with the number of rows and the
number of columns in the dataset. Empirical results show the method yields
state-of-the-art imputation accuracy across a wide range of data types,
including those with high rank. Our uncertainty measure predicts imputation
error well: entries with lower uncertainty do have lower imputation error (on
average). Moreover, for real-valued data, the resulting confidence intervals
are well-calibrated.
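To make the model concrete, here is a minimal, self-contained Python sketch of Gaussian-copula-style imputation with per-entry uncertainty. It is an illustration, not the authors' implementation: the normal-scores marginal estimate, the SVD-based covariance refit, and all function names are simplifying assumptions, and the back-transformation of latent imputations through the estimated marginals (which is what handles Boolean and ordinal columns) is omitted.

```python
# Illustrative sketch only -- NOT the paper's algorithm. Assumed model: each
# column j is a monotone transform f_j of a latent Gaussian z_j, with
# z ~ N(0, W W^T + s2 * I) (low-rank-plus-diagonal latent covariance).
import numpy as np
from scipy.stats import norm

def to_latent(X):
    """Map each observed column to latent normal scores via its empirical CDF."""
    Z = np.full(X.shape, np.nan)
    for j in range(X.shape[1]):
        obs = ~np.isnan(X[:, j])
        ranks = np.argsort(np.argsort(X[obs, j])) + 1.0   # ranks 1..n_obs
        Z[obs, j] = norm.ppf(ranks / (obs.sum() + 1.0))   # normal scores
    return Z

def impute_latent(X, rank=2, n_iter=50):
    """Impute missing entries in the latent space; return conditional variances too."""
    Z = to_latent(X)
    n, p = X.shape
    miss = np.isnan(Z)
    Zf = np.where(miss, 0.0, Z)              # initialize at the latent mean
    for _ in range(n_iter):                  # crude EM-like refinement
        # Fit a low-rank-plus-diagonal latent covariance: Sigma = W W^T + s2 * I
        _, s, Vt = np.linalg.svd(Zf, full_matrices=False)
        W = Vt[:rank].T * (s[:rank] / np.sqrt(n))
        s2 = max(float(np.mean(s[rank:] ** 2)) / n, 1e-6)
        Sigma = W @ W.T + s2 * np.eye(p)
        for i in range(n):                   # conditional-mean imputation per row
            m = miss[i]
            if not m.any() or m.all():
                continue
            S_oo = Sigma[np.ix_(~m, ~m)]
            S_mo = Sigma[np.ix_(m, ~m)]
            Zf[i, m] = S_mo @ np.linalg.solve(S_oo, Zf[i, ~m])
    # Per-entry uncertainty: conditional variance given the row's observed entries
    Var = np.zeros((n, p))
    for i in range(n):
        m = miss[i]
        if not m.any() or m.all():
            continue
        S_oo = Sigma[np.ix_(~m, ~m)]
        S_mo = Sigma[np.ix_(m, ~m)]
        Var[i, m] = np.diag(Sigma[np.ix_(m, m)] - S_mo @ np.linalg.solve(S_oo, S_mo.T))
    return Zf, Var

# Tiny demo on synthetic rank-3 data with ~20% of entries missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 8))
X[rng.random(X.shape) < 0.2] = np.nan
Z_hat, Var = impute_latent(X, rank=3)
print("mean conditional variance at missing entries:",
      round(float(Var[np.isnan(X)].mean()), 3))
```

The paper's actual fitting procedure is an EM algorithm on the low-rank copula correlation, which is what yields the linear scaling in the number of rows and columns; the conditional mean and variance above simply mirror how imputations and their uncertainties are read off from any fitted Gaussian model.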
Related papers
- Robust Online Covariance and Sparse Precision Estimation Under Arbitrary Data Corruption [1.5850859526672516]
We introduce a modified trimmed-inner-product algorithm to robustly estimate the covariance in an online scenario.
We provide error bounds and convergence properties of the estimates to the true precision matrix under our algorithms.
arXiv Detail & Related papers (2023-09-16T05:37:28Z) - Theoretical characterization of uncertainty in high-dimensional linear classification [24.073221004661427]
We show that uncertainty for learning from a limited number of samples of high-dimensional input data and labels can be obtained by the approximate message passing algorithm.
We discuss how over-confidence can be mitigated by appropriately regularising, and show that cross-validating with respect to the loss leads to better calibration than with the 0/1 error.
arXiv Detail & Related papers (2022-02-07T15:32:07Z) - Minimax rate of consistency for linear models with missing values [0.0]
Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...).
In this paper, we focus on the extensively studied linear models, but in the presence of missing values, which turns out to be quite a challenging task.
This eventually requires solving a number of learning tasks exponential in the number of input features, which makes prediction intractable for current real-world datasets.
arXiv Detail & Related papers (2022-02-03T08:45:34Z) - RIFLE: Imputation and Robust Inference from Low Order Marginals [10.082738539201804]
We develop a statistical inference framework for regression and classification in the presence of missing data without imputation.
Our framework, RIFLE, estimates low-order moments of the underlying data distribution with corresponding confidence intervals to learn a distributionally robust model.
Our experiments demonstrate that RIFLE outperforms other benchmark algorithms when the percentage of missing values is high and/or when the number of data points is relatively small.
arXiv Detail & Related papers (2021-09-01T23:17:30Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the Importance-Guided Stochastic Gradient Descent (IGSGD) method to train models to perform inference from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.