Private Sequential Hypothesis Testing for Statisticians: Privacy, Error Rates, and Sample Size
- URL: http://arxiv.org/abs/2204.04597v1
- Date: Sun, 10 Apr 2022 04:15:50 GMT
- Title: Private Sequential Hypothesis Testing for Statisticians: Privacy, Error Rates, and Sample Size
- Authors: Wanrong Zhang, Yajun Mei, Rachel Cummings
- Abstract summary: We study the sequential hypothesis testing problem under a slight variant of differential privacy, known as Renyi differential privacy.
We present a new private algorithm based on Wald's Sequential Probability Ratio Test (SPRT) that also gives strong theoretical privacy guarantees.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The sequential hypothesis testing problem is a class of statistical analyses
where the sample size is not fixed in advance. Instead, the decision process
takes in new observations sequentially to make real-time decisions for testing
an alternative hypothesis against a null hypothesis until some stopping
criterion is satisfied. In many common applications of sequential hypothesis
testing, the data can be highly sensitive and may require privacy protection;
for example, sequential hypothesis testing is used in clinical trials, where
doctors sequentially collect data from patients and must determine when to stop
recruiting patients and whether the treatment is effective. The field of
differential privacy has been developed to offer data analysis tools with
strong privacy guarantees, and has been commonly applied to machine learning
and statistical tasks.
In this work, we study the sequential hypothesis testing problem under a
slight variant of differential privacy, known as Renyi differential privacy. We
present a new private algorithm based on Wald's Sequential Probability Ratio
Test (SPRT) that also gives strong theoretical privacy guarantees. We provide
a theoretical analysis of statistical performance, measured by Type I and Type II
errors as well as the expected sample size. We also empirically validate our
theoretical results on several synthetic databases, showing that our algorithms
also perform well in practice. Unlike previous work in private hypothesis
testing that focused only on the classical fixed sample setting, our results in
the sequential setting allow a conclusion to be reached much earlier, thus
saving the cost of collecting additional samples.
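For illustration only, here is a minimal Python sketch of Wald's SPRT for a Gaussian mean (H0: mu = mu0 vs H1: mu = mu1, known sigma), together with one naive way to privatize it. This is not the authors' algorithm. The test accumulates log-likelihood ratios and stops at Wald's approximate boundaries log((1 - beta)/alpha) and log(beta/(1 - alpha)). The private variant is an assumption: it clips each increment to [-c, c] and adds independent Gaussian noise with standard deviation s. If each individual contributes one observation and neighbouring streams differ by replacing one observation, the noisy increments form a Gaussian mechanism with sensitivity 2c. That mechanism is (a, a(2c)^2/(2 s^2))-Rényi DP at every order a > 1, and the decision and stopping time are post-processing. All function names (gaussian_llr, wald_thresholds, noisy_sprt, gaussian_rdp, monte_carlo) are hypothetical. Because clipping and noise make Wald's boundaries inaccurate, the helper estimates Type I error, power, and expected sample size by simulation.

```python
import math
import random


def gaussian_llr(x, mu0, mu1, sigma):
    """log f1(x) / f0(x) for N(mu1, sigma^2) versus N(mu0, sigma^2)."""
    return (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2


def wald_thresholds(alpha, beta):
    """Wald's approximate boundaries (lower, upper) for target error rates alpha, beta."""
    return math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)


def noisy_sprt(sample, mu0, mu1, sigma, alpha, beta,
               clip=None, noise_sd=0.0, rng=None, max_n=100_000):
    """Run one SPRT; returns (decision, n) with decision 1 = accept H1, 0 = accept H0.

    Hypothetical private variant (an assumption, not the paper's method): if clip is
    set, each LLR increment is clipped to [-clip, clip]; if noise_sd > 0, independent
    N(0, noise_sd^2) noise is added to each increment before it enters the running sum.
    clip=None and noise_sd=0 give the ordinary, non-private SPRT.
    """
    if rng is None:
        rng = random.Random()
    lower, upper = wald_thresholds(alpha, beta)
    s = 0.0
    for n in range(1, max_n + 1):
        z = gaussian_llr(sample(), mu0, mu1, sigma)
        if clip is not None:
            z = max(-clip, min(clip, z))
        if noise_sd > 0:
            z += rng.gauss(0.0, noise_sd)
        s += z  # the running statistic only ever sees the (noisy) increments
        if s >= upper:
            return 1, n
        if s <= lower:
            return 0, n
    return None, max_n  # no decision within max_n observations


def gaussian_rdp(order, sensitivity, noise_sd):
    """Renyi DP epsilon of the Gaussian mechanism at order > 1: order * Delta^2 / (2 sigma^2)."""
    return order * sensitivity ** 2 / (2 * noise_sd ** 2)


def monte_carlo(true_mu, mu0, mu1, sigma, alpha, beta, trials=2000, seed=0, **kw):
    """Estimate P(accept H1) and the mean stopping time when the data are N(true_mu, sigma^2)."""
    rng = random.Random(seed)
    accepts_h1, total_n = 0, 0
    for _ in range(trials):
        decision, n = noisy_sprt(lambda: rng.gauss(true_mu, sigma),
                                 mu0, mu1, sigma, alpha, beta, rng=rng, **kw)
        if decision == 1:
            accepts_h1 += 1
        total_n += n
    return accepts_h1 / trials, total_n / trials


if __name__ == "__main__":
    setup = dict(mu0=0.0, mu1=0.5, sigma=1.0, alpha=0.05, beta=0.1)
    private = dict(clip=0.5, noise_sd=1.0)  # replacement sensitivity 2 * clip = 1.0
    print("SPRT, H0 true (Type I, E[N]):", monte_carlo(0.0, **setup))
    print("SPRT, H1 true (power, E[N]):", monte_carlo(0.5, **setup))
    print("noisy, H0 true (Type I, E[N]):", monte_carlo(0.0, **setup, **private))
    print("noisy, H1 true (power, E[N]):", monte_carlo(0.5, **setup, **private))
    print("RDP epsilon at order 2:", gaussian_rdp(2, sensitivity=1.0, noise_sd=1.0))
```

With these illustrative settings, the noisy run will typically show a Type I error above the nominal alpha when it reuses Wald's boundaries unchanged. This is the kind of effect a private SPRT's error analysis has to account for, whether through adjusted thresholds or a dedicated bound.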
Related papers
- Differentially private Bayesian tests (2024-01-27T21:07:11Z)
  We present a novel differentially private Bayesian hypothesis testing framework that arises naturally under a principled data generative mechanism.
  By focusing on differentially private Bayes factors based on widely used test statistics, we circumvent the need to model the complete data generative mechanism.
- Differentially Private Permutation Tests: Applications to Kernel Methods (2023-10-29T15:13:36Z)
  Differential privacy has emerged as a rigorous framework for privacy protection, gaining widespread recognition in both academic and industrial circles.
  This paper aims to alleviate privacy concerns in the context of hypothesis testing by introducing differentially private permutation tests.
  The proposed framework extends classical non-private permutation tests to private settings, maintaining both finite-sample validity and differential privacy in a rigorous manner.
- Conditional Density Estimations from Privacy-Protected Data (2023-10-19T14:34:17Z)
  We propose simulation-based inference methods from privacy-protected datasets.
  We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
- Sequential Predictive Two-Sample and Independence Testing (2023-04-29T01:30:33Z)
  We study the problems of sequential nonparametric two-sample and independence testing.
  We build upon the principle of (nonparametric) testing by betting.
- Sequential Kernelized Independence Testing (2022-12-14T18:08:42Z)
  We design sequential kernelized independence tests inspired by kernelized dependence measures.
  We demonstrate the power of our approaches on both simulated and real data.
- Statistical and Computational Phase Transitions in Group Testing (2022-06-15T16:38:50Z)
  We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease.
  We consider two different simple random procedures for assigning tests to individuals.
- Conformal prediction for the design problem (2022-02-08T02:59:12Z)
  In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
  In such settings, there is a distinct type of distribution shift between the training and test data.
  We introduce a method to quantify predictive uncertainty in such settings.
- Tracking disease outbreaks from sparse data with Bayesian inference (2020-09-12T20:37:33Z)
  The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
  Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
  We propose a Bayesian framework which accommodates partial observability in a principled manner.
- Balance-Subsampled Stable Prediction (2020-06-08T07:01:38Z)
  We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
  A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
  Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
- PAPRIKA: Private Online False Discovery Rate Control (2020-02-27T18:42:23Z)
  We study False Discovery Rate (FDR) control in hypothesis testing under the constraint of differential privacy for the sample.
  We provide new private algorithms based on state-of-the-art results in non-private online FDR control.
- Asymptotic Validity and Finite-Sample Properties of Approximate Randomization Tests (2019-08-12T16:09:15Z)
  Our key theoretical contribution is a non-asymptotic bound on the discrepancy between the size of an approximate randomization test and the size of the original randomization test using noiseless data.
  We illustrate our theory through several examples, including tests of significance in linear regression.