Private independence testing across two parties
- URL: http://arxiv.org/abs/2207.03652v2
- Date: Tue, 26 Sep 2023 23:46:02 GMT
- Title: Private independence testing across two parties
- Authors: Praneeth Vepakomma, Mohammad Mohammadi Amiri, Clément L. Canonne, Ramesh Raskar, Alex Pentland
- Abstract summary: $\pi$-test is a privacy-preserving algorithm for testing statistical independence between data distributed across multiple parties.
We establish both additive and multiplicative error bounds on the utility of our differentially private test.
- Score: 21.236868468146348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce $\pi$-test, a privacy-preserving algorithm for testing
statistical independence between data distributed across multiple parties. Our
algorithm relies on privately estimating the distance correlation between
datasets, a quantitative measure of independence introduced in Székely et al.
[2007]. We establish both additive and multiplicative error bounds on the
utility of our differentially private test, which we believe will find
applications in a variety of distributed hypothesis testing settings involving
sensitive data.
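For orientation, the quantity that the abstract says is estimated privately, the distance correlation of Székely et al. [2007], can be sketched in its plain, non-private sample form. This is a minimal illustration only: it does not reproduce the paper's differentially private estimator or its two-party protocol, and the function names (`distance_correlation` and the helpers) are hypothetical, chosen for this sketch.

```python
import math


def _centered_distances(points):
    """Double-centered Euclidean distance matrix for a list of vectors (tuples)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_means = [sum(r) / n for r in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    """Average of the entrywise product of two n-by-n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(xs, ys):
    """Non-private sample distance correlation between paired samples xs and ys.

    xs and ys are equal-length lists of tuples (one tuple per observation).
    Returns a value in [0, 1]; 0.0 is returned if either sample is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    a = _centered_distances(xs)
    b = _centered_distances(ys)
    dcov2 = _mean_product(a, b)  # squared sample distance covariance
    denom = math.sqrt(_mean_product(a, a) * _mean_product(b, b))
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


if __name__ == "__main__":
    xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
    ys = [(2.0 * x[0] + 1.0,) for x in xs]  # exact linear dependence
    print(distance_correlation(xs, ys))
```

In a two-party setting each party would hold only one of `xs` or `ys`, so a private test must add noise or otherwise limit what is exchanged; that step is deliberately left out here.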
Related papers
- Federated Experiment Design under Distributed Differential Privacy [31.06808163362162]
We focus on the rigorous protection of users' privacy while minimizing the trust toward service providers.
Although a vital component in modern A/B testing, private distributed experimentation has not previously been studied.
We show how these mechanisms can be scaled up to handle the very large number of participants commonly found in practice.
arXiv Detail & Related papers (2023-11-07T22:38:56Z)
- Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing.
We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z)
- On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z)
- Differentially Private Confidence Intervals for Proportions under Stratified Random Sampling [14.066813980992132]
With the increase of data privacy awareness, developing a private version of confidence intervals has gained growing attention.
Recent work has been done around differentially private confidence intervals, yet rigorous methodologies on differentially private confidence intervals have not been studied.
We propose three differentially private algorithms for constructing confidence intervals for proportions under stratified random sampling.
arXiv Detail & Related papers (2023-01-19T21:25:41Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
- Post-processing of Differentially Private Data: A Fairness Perspective [53.29035917495491]
This paper shows that post-processing causes disparate impacts on individuals or groups.
It analyzes two critical settings: the release of differentially private datasets and the use of such private datasets for downstream decisions.
It proposes a novel post-processing mechanism that is (approximately) optimal under different fairness metrics.
arXiv Detail & Related papers (2022-01-24T02:45:03Z)
- Data-Driven Representations for Testing Independence: Modeling, Analysis and Connection with Mutual Information Estimation [3.9023554886892433]
This work addresses testing the independence of two continuous and finite-dimensional random variables from the design of a data-driven partition.
It is shown that approximating the sufficient statistics of an oracle test offers a learning criterion for designing a data-driven partition.
Some experimental analyses provide evidence regarding our scheme's advantage for testing independence compared with some strategies that do not use data-driven representations.
arXiv Detail & Related papers (2021-10-27T02:06:05Z)
- Private measurement of nonlinear correlations between data hosted across multiple parties [14.93584434176082]
We introduce a differentially private method to measure nonlinear correlations between sensitive data hosted across two entities.
This work has direct applications to private feature screening, private independence testing, private k-sample tests, private multi-party causal inference and private data synthesis.
arXiv Detail & Related papers (2021-10-19T00:31:26Z)
- Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions [57.77347280992548]
In this paper, we design two-sample tests for pairwise comparison data and ranking data.
Our test requires essentially no assumptions on the distributions.
By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently.
arXiv Detail & Related papers (2020-06-21T20:51:09Z)
- Automated extraction of mutual independence patterns using Bayesian comparison of partition models [7.6146285961466]
Mutual independence is a key concept in statistics that characterizes the structural relationships between variables.
Existing methods to investigate mutual independence rely on the definition of two competing models.
We propose a general Markov chain Monte Carlo (MCMC) algorithm to numerically approximate the posterior distribution on the space of all patterns of mutual independence.
arXiv Detail & Related papers (2020-01-15T16:21:48Z)