CoinPress: Practical Private Mean and Covariance Estimation
- URL: http://arxiv.org/abs/2006.06618v2
- Date: Mon, 10 Oct 2022 02:18:58 GMT
- Title: CoinPress: Practical Private Mean and Covariance Estimation
- Authors: Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman
- Abstract summary: We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data.
We show that their error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods.
- Score: 18.6419638570742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present simple differentially private estimators for the mean and
covariance of multivariate sub-Gaussian data that are accurate at small sample
sizes. We demonstrate the effectiveness of our algorithms both theoretically
and empirically using synthetic and real-world datasets -- showing that their
asymptotic error rates match the state-of-the-art theoretical bounds, and that
they concretely outperform all previous methods. Specifically, previous
estimators either have weak empirical accuracy at small sample sizes, perform
poorly for multivariate data, or require the user to provide strong a priori
estimates for the parameters.
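For intuition, here is a minimal sketch of the standard clip-and-noise Gaussian-mechanism baseline for private mean estimation; it is not the CoinPress algorithm itself (which iteratively refines the clipping ball), and the function name and parameters are illustrative only.

```python
import numpy as np

def gaussian_mechanism_mean(x, center, radius, epsilon, delta, rng=None):
    """(epsilon, delta)-DP mean via the clip-and-noise Gaussian mechanism.

    Each point is projected onto the L2 ball of the given radius around
    `center`; replacing one point then moves the clipped empirical mean
    by at most 2 * radius / n, and Gaussian noise is calibrated to that
    sensitivity.
    """
    rng = rng or np.random.default_rng()
    n, d = x.shape
    diffs = x - center
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    clipped = center + diffs * np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    sensitivity = 2.0 * radius / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)

rng = np.random.default_rng(0)
data = rng.normal(5.0, 1.0, size=(1000, 10))          # sub-Gaussian data, true mean 5
print(gaussian_mechanism_mean(data, np.zeros(10), radius=20.0,
                              epsilon=1.0, delta=1e-6, rng=rng))
```

The weakness of this one-shot baseline is visible in the `radius` parameter: a loose a priori bound inflates the noise, which is exactly the dependence CoinPress is designed to reduce.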
Related papers
- A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set [20.166217494056916]
We propose a principled approach to construct covariance estimators without imposing restrictive assumptions.
We show that our robust estimators are efficiently computable and consistent.
Numerical experiments based on synthetic and real data show that our robust estimators are competitive with state-of-the-art estimators.
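As a rough illustration of spectrum shrinkage (not the distributionally robust construction proposed in the paper), a fixed-weight Ledoit-Wolf-style estimator mixes the sample covariance with a scaled identity; the `alpha` weight here is a hypothetical tuning parameter.

```python
import numpy as np

def shrinkage_covariance(x, alpha):
    """Convex combination of the sample covariance and a scaled identity.

    Shrinking toward mu * I pulls extreme eigenvalues of the sample
    covariance back toward their average, one simple way to "shrink the
    spectrum" when n is small relative to the dimension.
    """
    n, d = x.shape
    sample_cov = np.cov(x, rowvar=False)
    mu = np.trace(sample_cov) / d          # average eigenvalue
    return (1.0 - alpha) * sample_cov + alpha * mu * np.eye(d)

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 20))              # n only 2.5x the dimension: noisy sample covariance
print(np.linalg.cond(shrinkage_covariance(x, alpha=0.3)))
```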
arXiv Detail & Related papers (2024-05-30T15:01:18Z) - Optimal Differentially Private PCA and Estimation for Spiked Covariance Matrices [10.377683220196873]
Estimating a covariance matrix and its associated principal components is a fundamental problem in contemporary statistics.
We study optimal differentially private Principal Component Analysis (PCA) and covariance estimation within the spiked covariance model.
We propose computationally efficient differentially private estimators and prove their minimax optimality for sub-Gaussian distributions.
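A minimal sketch of the Gaussian-mechanism baseline for private PCA (in the spirit of adding symmetric noise to the second-moment matrix), assuming rows have L2 norm at most 1; this is not the paper's minimax-optimal estimator for the spiked model, and all names are illustrative.

```python
import numpy as np

def dp_top_eigenvector(x, epsilon, delta, rng=None):
    """Leading principal component from a noisy second-moment matrix.

    Assumes each row has L2 norm at most 1, so replacing one row changes
    x.T @ x / n by at most 2/n in Frobenius norm; symmetric Gaussian noise
    calibrated to that sensitivity is added before the eigendecomposition.
    """
    rng = rng or np.random.default_rng()
    n, d = x.shape
    second_moment = x.T @ x / n
    sigma = (2.0 / n) * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noise = rng.normal(0.0, sigma, size=(d, d))
    noise = np.triu(noise) + np.triu(noise, 1).T       # symmetric, iid upper triangle
    _, eigvecs = np.linalg.eigh(second_moment + noise)
    return eigvecs[:, -1]                              # eigenvector of the largest eigenvalue

rng = np.random.default_rng(2)
spike = rng.normal(size=5)
spike /= np.linalg.norm(spike)
x = 0.1 * rng.normal(size=(2000, 5)) + rng.normal(size=(2000, 1)) * spike
x /= np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1.0)   # enforce norm <= 1
print(np.abs(dp_top_eigenvector(x, epsilon=1.0, delta=1e-6, rng=rng) @ spike))
```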
arXiv Detail & Related papers (2024-01-08T11:18:14Z) - Multi-Fidelity Covariance Estimation in the Log-Euclidean Geometry [0.0]
We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold.
We develop an optimal sample allocation scheme that minimizes the mean-squared error of the estimator given a fixed budget.
Evaluations of our approach using data from physical applications demonstrate more accurate metric learning and speedups of more than one order of magnitude compared to benchmarks.
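The log-Euclidean average of symmetric positive-definite matrices underlying this approach can be sketched as follows; the multi-fidelity sample-allocation scheme itself is not shown, and the helper name is illustrative.

```python
import numpy as np
from scipy.linalg import expm, logm

def log_euclidean_mean(covs, weights=None):
    """Weighted mean of SPD matrices under the log-Euclidean metric.

    Each matrix is mapped to the tangent space with the matrix logarithm,
    averaged there, and mapped back with the matrix exponential, so the
    result is always symmetric positive definite.
    """
    if weights is None:
        weights = np.full(len(covs), 1.0 / len(covs))
    log_avg = sum(w * np.real(logm(c)) for w, c in zip(weights, covs))
    return expm((log_avg + log_avg.T) / 2.0)

rng = np.random.default_rng(3)
mats = []
for _ in range(4):
    a = rng.normal(size=(5, 5))
    mats.append(a @ a.T + 0.1 * np.eye(5))            # random SPD "fidelity" estimates
print(np.linalg.eigvalsh(log_euclidean_mean(mats)))   # all eigenvalues positive
```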
arXiv Detail & Related papers (2023-01-31T16:33:46Z) - High Dimensional Statistical Estimation under One-bit Quantization [27.718986773043643]
One-bit (binary) data are preferable in many applications because of the efficiency in signal storage, processing, transmission, and enhancement of privacy.
In this paper, we study three fundamental statistical estimation problems.
Under both sub-Gaussian and heavy-tailed regimes, new estimators that handle high-dimensional scaling are proposed.
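As a toy illustration of one-bit estimation (not the paper's high-dimensional estimators), uniform dithering before the sign operation makes a one-bit mean estimate unbiased; the threshold `T` is an assumed bound on the data range.

```python
import numpy as np

def one_bit_mean(x, T, rng=None):
    """Mean estimate from one-bit (sign) measurements with uniform dithering.

    For dither U ~ Uniform(-T, T) and |x| <= T, E[T * sign(x + U)] = x,
    so averaging T * sign(x_i + u_i) gives an unbiased estimate of the mean
    even though only one bit per coordinate is stored or transmitted.
    """
    rng = rng or np.random.default_rng()
    u = rng.uniform(-T, T, size=x.shape)
    bits = np.sign(x + u)          # the only data the estimator sees
    return T * bits.mean(axis=0)

rng = np.random.default_rng(4)
x = rng.normal(2.0, 1.0, size=(20000, 3))
print(one_bit_mean(x, T=8.0, rng=rng))   # close to [2, 2, 2]
```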
arXiv Detail & Related papers (2022-02-26T15:13:04Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Distributionally Robust Local Non-parametric Conditional Estimation [22.423052432220235]
We propose a new distributionally robust estimator that generates non-parametric local estimates.
We show that despite being generally intractable, the local estimator can be efficiently found via convex optimization.
Experiments with synthetic and MNIST datasets show the competitive performance of this new class of estimators.
arXiv Detail & Related papers (2020-10-12T00:11:17Z) - Effective Data-aware Covariance Estimator from Compressed Data [63.16042585506435]
We propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which can provide an unbiased covariance matrix estimation.
We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE.
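A generic importance-sampling sketch of obtaining an unbiased second-moment estimate from coordinate-sampled (compressed) data, assuming keep probabilities proportional to column energy; this is not the DACE algorithm, and the function and parameter names are hypothetical.

```python
import numpy as np

def sampled_second_moment(X, budget, rng=None):
    """Unbiased second-moment estimate from coordinate-sampled data.

    Coordinates are kept with probabilities proportional to their column
    energy (the "data-aware" weights) and rescaled by the inverse keep
    probability; the diagonal needs an extra correction because a kept
    coordinate is multiplied with itself.
    """
    rng = rng or np.random.default_rng()
    n, d = X.shape
    col_energy = (X ** 2).sum(axis=0)
    p = np.clip(budget * col_energy / col_energy.sum(), 0.05, 1.0)  # keep probabilities
    mask = rng.random((n, d)) < p
    Z = np.where(mask, X / p, 0.0)           # all the estimator ever stores
    est = Z.T @ Z / n
    est[np.diag_indices(d)] *= p             # unbias the diagonal
    return est

rng = np.random.default_rng(5)
X = rng.normal(size=(5000, 8)) @ np.diag(np.arange(1, 9))
print(np.round(sampled_second_moment(X, budget=4.0, rng=rng), 2))
```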
arXiv Detail & Related papers (2020-10-10T10:10:28Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields guarantees of statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
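A small sketch of feature averaging over random augmentations, whose benefit under convex losses follows from Jensen's inequality; the augmentation and feature map below are toy stand-ins, not the paper's setup.

```python
import numpy as np

def augmentation_averaged_features(x, augment, feature_map, n_aug, rng=None):
    """Average a feature map over random augmentations of one input.

    By Jensen's inequality, plugging the averaged feature into a convex
    loss can only reduce the loss relative to averaging the loss over the
    individual augmented copies, which is the intuition behind feature
    averaging tightening the bounds discussed above.
    """
    rng = rng or np.random.default_rng()
    feats = [feature_map(augment(x, rng)) for _ in range(n_aug)]
    return np.mean(feats, axis=0)

# Toy example: small random perturbations as augmentation, a fixed random
# ReLU layer as the feature map.
rng = np.random.default_rng(6)
W = rng.normal(size=(16, 4))
feature_map = lambda v: np.maximum(W @ v, 0.0)
augment = lambda v, r: v + r.normal(0.0, 0.1, size=v.shape)
x = rng.normal(size=4)
print(augmentation_averaged_features(x, augment, feature_map, n_aug=32, rng=rng))
```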
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.