Tukey Depth Mechanisms for Practical Private Mean Estimation
- URL: http://arxiv.org/abs/2502.18698v1
- Date: Tue, 25 Feb 2025 23:21:54 GMT
- Title: Tukey Depth Mechanisms for Practical Private Mean Estimation
- Authors: Gavin Brown, Lydia Zakynthinou,
- Abstract summary: Mean estimation is a fundamental task in statistics and a focus within differentially private statistical estimation.<n>In this work, we take the first step to bridge the gap by implementing the (Restricted) Tukey Depth Mechanism.<n>Our implementations enable the use of these mechanisms for small sample sizes or low-dimensional data.
- Score: 5.298789668079187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean estimation is a fundamental task in statistics and a focus within differentially private statistical estimation. While univariate methods based on the Gaussian mechanism are widely used in practice, more advanced techniques such as the exponential mechanism over quantiles offer robustness and improved performance, especially for small sample sizes. Tukey depth mechanisms carry these advantages to multivariate data, providing similar strong theoretical guarantees. However, practical implementations fall behind these theoretical developments. In this work, we take the first step to bridge this gap by implementing the (Restricted) Tukey Depth Mechanism, a theoretically optimal mean estimator for multivariate Gaussian distributions, yielding improved practical methods for private mean estimation. Our implementations enable the use of these mechanisms for small sample sizes or low-dimensional data. Additionally, we implement variants of these mechanisms that use approximate versions of Tukey depth, trading off accuracy for faster computation. We demonstrate their efficiency in practice, showing that they are viable options for modest dimensions. Given their strong accuracy and robustness guarantees, we contend that they are competitive approaches for mean estimation in this regime. We explore future directions for improving the computational efficiency of these algorithms by leveraging fast polytope volume approximation techniques, paving the way for more accurate private mean estimation in higher dimensions.
Related papers
- Prediction-Powered Adaptive Shrinkage Estimation [0.9208007322096532]
Prediction-Powered Adaptive Shrinkage (PAS) is a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means.<n>PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
arXiv Detail & Related papers (2025-02-20T00:24:05Z) - Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.<n>Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.<n>We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z) - Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z) - Instance-Specific Asymmetric Sensitivity in Differential Privacy [2.855485723554975]
We build upon previous work that gives a paradigm for selecting an output through the exponential mechanism.
Our framework will slightly modify the closeness metric and instead give a simple and efficient application of the sparse vector technique.
arXiv Detail & Related papers (2023-11-02T05:01:45Z) - Variational Factorization Machines for Preference Elicitation in
Large-Scale Recommender Systems [17.050774091903552]
We propose a variational formulation of factorization machines (FMs) that can be easily optimized using standard mini-batch descent gradient.
Our algorithm learns an approximate posterior distribution over the user and item parameters, which leads to confidence intervals over the predictions.
We show, using several datasets, that it has comparable or better performance than existing methods in terms of prediction accuracy.
arXiv Detail & Related papers (2022-12-20T00:06:28Z) - Privacy Induces Robustness: Information-Computation Gaps and Sparse Mean
Estimation [8.9598796481325]
We investigate the consequences of this observation for both algorithms and computational complexity across different statistical problems.
We establish an information-computation gap for private sparse mean estimation.
We also give evidence for privacy-induced information-computation gaps for several other statistics and learning problems.
arXiv Detail & Related papers (2022-11-01T20:03:41Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
arXiv Detail & Related papers (2021-04-27T16:56:09Z) - Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, time, approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z) - Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z) - CoinPress: Practical Private Mean and Covariance Estimation [18.6419638570742]
We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data.
We show that their error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods.
arXiv Detail & Related papers (2020-06-11T17:17:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.