The power of private likelihood-ratio tests for goodness-of-fit in
frequency tables
- URL: http://arxiv.org/abs/2109.09630v1
- Date: Mon, 20 Sep 2021 15:30:42 GMT
- Title: The power of private likelihood-ratio tests for goodness-of-fit in
frequency tables
- Authors: Emanuele Dolera, Stefano Favaro
- Abstract summary: We consider privacy-protecting tests for goodness-of-fit in frequency tables.
We show the importance of taking the perturbation into account to avoid a loss in the statistical significance of the test.
Our work presents the first rigorous treatment of privacy-protecting LR tests for goodness-of-fit in frequency tables.
- Score: 1.713291434132985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy-protecting data analysis investigates statistical methods under
privacy constraints. This is a rising challenge in modern statistics, as the
achievement of confidentiality guarantees, which typically occurs through
suitable perturbations of the data, may cause a loss in the statistical
utility of the data. In this paper, we consider privacy-protecting tests for
goodness-of-fit in frequency tables, arguably the most common form of data
release. Under the popular framework of
$(\varepsilon,\delta)$-differential privacy for perturbed data, we introduce a
private likelihood-ratio (LR) test for goodness-of-fit and we study its large
sample properties, showing the importance of taking the perturbation into
account to avoid a loss in the statistical significance of the test. Our main
contribution provides a quantitative characterization of the trade-off between
confidentiality, measured via differential privacy parameters $\varepsilon$ and
$\delta$, and utility, measured via the power of the test. In particular, we
establish a precise Bahadur-Rao type large deviation expansion for the power of
the private LR test, which allows us to: i) identify a critical quantity,
depending on the sample size and $(\varepsilon,\delta)$, that determines the loss
in power of the private LR test; ii) quantify the sample cost of
$(\varepsilon,\delta)$-differential privacy in the private LR test, namely the
additional sample size required to recover the power of the LR test in the
absence of perturbation. This result relies on a novel multidimensional
large deviation principle for sums of i.i.d. random vectors, which is of
independent interest. Our work presents the first rigorous treatment of
privacy-protecting LR tests for goodness-of-fit in frequency tables, making use
of the power of the test to quantify the trade-off between confidentiality and
utility.
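The paper itself contains no code; as a rough illustration of the setting it studies, the sketch below runs a likelihood-ratio goodness-of-fit test on a frequency table whose counts are perturbed to satisfy $(\varepsilon,\delta)$-differential privacy. The Gaussian-mechanism calibration, the renormalization of the noisy counts, and the uncorrected chi-square threshold are standard textbook choices and not the construction analyzed in the paper; the function name `private_lr_test` and all parameters are invented for the example.

```python
import numpy as np
from scipy import stats


def private_lr_test(counts, p0, epsilon, delta, alpha=0.05, rng=None):
    """Hedged sketch: LR (G^2) goodness-of-fit test on counts released
    under (epsilon, delta)-DP via the classical Gaussian mechanism.
    This is NOT the paper's exact construction or threshold correction."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    k, n = counts.size, counts.sum()

    # Gaussian mechanism: changing one record moves a histogram by at most
    # sqrt(2) in L2 norm (one cell up, one cell down); classical calibration
    # for epsilon < 1 (an assumption of this sketch).
    l2_sensitivity = np.sqrt(2.0)
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noisy = counts + rng.normal(scale=sigma, size=k)
    noisy = np.clip(noisy, 1e-12, None)   # keep the log-likelihood well defined
    noisy = noisy / noisy.sum() * n       # renormalize to the original total

    # Likelihood-ratio (G^2) statistic against the null frequencies n * p0.
    expected = n * np.asarray(p0, dtype=float)
    g2 = 2.0 * np.sum(noisy * np.log(noisy / expected))

    # Naive calibration: the chi-square(k-1) quantile one would use without
    # accounting for the perturbation; the paper's point is precisely that
    # this threshold must be corrected to preserve the significance level.
    threshold = stats.chi2.ppf(1.0 - alpha, df=k - 1)
    return g2, threshold, g2 > threshold
```

For instance, `private_lr_test([30, 45, 25], [1/3, 1/3, 1/3], epsilon=1.0, delta=1e-5)` returns the noisy $G^2$ statistic, the uncorrected threshold, and the rejection decision. In the abstract's terms, ignoring the perturbation when calibrating this threshold distorts the significance level, and the resulting loss of power can only be recovered at the additional sample cost that the paper quantifies.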
Related papers
- Differentially Private Communication of Measurement Anomalies in the Smart Grid [4.021993915403885]
We present a framework based on differential privacy (DP) for querying electric power measurements to detect system anomalies or bad data.
Our DP approach conceals consumption and system matrix data, while simultaneously enabling an untrusted third party to test hypotheses of anomalies.
We propose a novel DP chi-square noise mechanism that ensures the test does not reveal private information about power injections or the system matrix.
arXiv Detail & Related papers (2024-03-04T18:55:16Z)
- Conditional Density Estimations from Privacy-Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets.
We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z)
- A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR).
The EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output.
Our empirical evaluation shows the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
arXiv Detail & Related papers (2023-04-17T00:38:01Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze them under the framework of hypothesis testing.
We show that if the underlying private data take values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power.
arXiv Detail & Related papers (2022-10-24T23:50:12Z)
- On the Statistical Complexity of Estimation and Testing under Privacy Constraints [17.04261371990489]
We show how to characterize the power of a statistical test under differential privacy in a plug-and-play fashion.
We show that maintaining privacy results in a noticeable reduction in performance only when the level of privacy protection is very high.
Finally, we demonstrate that the DP-SGLD algorithm, a private convex solver, can be employed for maximum likelihood estimation with a high degree of confidence.
arXiv Detail & Related papers (2022-10-05T12:55:53Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Private Sequential Hypothesis Testing for Statisticians: Privacy, Error Rates, and Sample Size [24.149533870085175]
We study the sequential hypothesis testing problem under a slight variant of differential privacy, known as Rényi differential privacy.
We present a new private algorithm based on Wald's Sequential Probability Ratio Test (SPRT) that also gives strong theoretical privacy guarantees.
arXiv Detail & Related papers (2022-04-10T04:15:50Z)
- Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z)
- RDP-GAN: A Rényi-Differential Privacy based Generative Adversarial Network [75.81653258081435]
Generative adversarial networks (GANs) have attracted increasing attention recently owing to their impressive ability to generate realistic samples with high privacy protection.
However, when GANs are applied to sensitive or private training examples, such as medical or financial records, they may still divulge individuals' sensitive and private information.
We propose a Rényi-differentially private GAN (RDP-GAN), which achieves differential privacy (DP) in a GAN by carefully adding random noise to the value of the loss function during training.
arXiv Detail & Related papers (2020-07-04T09:51:02Z)
- Minimax optimal goodness-of-fit testing for densities and multinomials under a local differential privacy constraint [3.265773263570237]
We consider the consequences of local differential privacy constraints on goodness-of-fit testing.
We present a test that is adaptive to the smoothness parameter of the unknown density and remains minimax optimal up to a logarithmic factor.
arXiv Detail & Related papers (2020-02-11T08:41:05Z)