Assessing Utility of Differential Privacy for RCTs
- URL: http://arxiv.org/abs/2309.14581v1
- Date: Tue, 26 Sep 2023 00:10:32 GMT
- Title: Assessing Utility of Differential Privacy for RCTs
- Authors: Soumya Mukherjee, Aratrika Mustafi, Aleksandra Slavković, Lars Vilhuber,
- Abstract summary: We empirically assess the impact of strong privacy-preservation methodology (with differential privacy (DP) guarantees) on published analyses from RCTs.
We find that relatively straightforward DP-based methods allow for inference-valid protection of the published data.
- Score: 44.15661493715815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomized control trials (RCTs) have become a powerful tool for assessing the impact of interventions and policies in many contexts. They are considered the gold standard for inference in the biomedical fields and in many social sciences. Researchers have published an increasing number of studies that rely on RCTs for at least part of the inference, and these studies typically include the collected response data, de-identified and sometimes protected through traditional disclosure limitation methods. In this paper, we empirically assess the impact of strong privacy-preservation methodology (with differential privacy (DP) guarantees) on published analyses from RCTs, leveraging the availability of replication packages (research compendia) in economics and policy analysis. We provide simulation studies and demonstrate how we can replicate the analysis in a published economics article on privacy-protected data under various parametrizations. We find that relatively straightforward DP-based methods allow for inference-valid protection of the published data, though computational issues may prevent more complex analyses from using these methods. The results apply to researchers who wish to share RCT data with strong privacy protection, especially in low- and middle-income countries.
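The abstract does not spell out which mechanisms are used. As a hedged illustration of the basic building block behind many "relatively straightforward" DP methods, the sketch below applies the Laplace mechanism to a single difference-in-means treatment-effect estimate. It is not the authors' procedure (the paper protects the published response data itself). The function names, the outcome bounds `lo`/`hi`, and the neighboring-dataset notion (one participant's outcome changes while treatment assignment and arm sizes stay public) are all assumptions made for illustration.

```python
import random
import statistics


def clip(values, lo, hi):
    """Clamp each outcome into [lo, hi] so the sensitivity bound holds."""
    return [min(max(v, lo), hi) for v in values]


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    # random.expovariate takes a *rate*, so an exponential with mean `scale`
    # has rate 1/scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def diff_in_means_sensitivity(n_treated, n_control, lo, hi):
    """L1 sensitivity of the difference in means when one participant's
    outcome changes and arm sizes are fixed and public (an assumption)."""
    return (hi - lo) / min(n_treated, n_control)


def dp_difference_in_means(treated, control, lo, hi, epsilon, rng=None):
    """Hypothetical helper: epsilon-DP difference-in-means via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not treated or not control:
        raise ValueError("both arms need at least one participant")
    if hi <= lo:
        raise ValueError("need lo < hi")
    if rng is None:
        rng = random.Random()
    t, c = clip(treated, lo, hi), clip(control, lo, hi)
    estimate = statistics.fmean(t) - statistics.fmean(c)
    scale = diff_in_means_sensitivity(len(t), len(c), lo, hi) / epsilon
    return estimate + laplace_noise(scale, rng)


if __name__ == "__main__":
    # Hypothetical binary outcomes; not data from the paper.
    treated = [1, 1, 0, 1, 1, 0, 1, 1]
    control = [0, 1, 0, 0, 1, 0, 0, 1]
    noisy = dp_difference_in_means(treated, control, lo=0, hi=1, epsilon=1.0,
                                   rng=random.Random(42))
    print(f"noisy ATE estimate: {noisy:.3f}")
```

Under these assumptions, changing one treated outcome moves the treated mean by at most (hi - lo)/n_treated, so the sensitivity is (hi - lo)/min(n_treated, n_control), and the noise scale shrinks as the arms grow or as epsilon increases. Smaller epsilon gives stronger protection but noisier estimates, which is the kind of privacy-utility trade-off the paper assesses.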
Related papers
- An applied Perspective: Estimating the Differential Identifiability Risk of an Exemplary SOEP Data Set [2.66269503676104]
We show how to compute the risk metric efficiently for a set of basic statistical queries.
Our empirical analysis based on an extensive, real-world scientific data set expands the knowledge on how to compute risks under realistic conditions.
(2024-07-04T17:50:55Z)
- Privacy Impact Assessments in the Wild: A Scoping Review [1.7677916783208343]
Privacy Impact Assessments (PIAs) offer a systematic process for assessing the privacy impacts of a project or system.
PIAs are heralded as one of the main approaches to privacy by design, supporting the early identification of threats and controls.
There is still a significant need for more primary research on the topic, both qualitative and quantitative.
(2024-02-17T05:07:10Z)
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator built on a novel concept: retrospectively reshuffling participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
(2023-02-06T05:17:22Z)
- Evaluating COVID-19 vaccine allocation policies using Bayesian $m$-top exploration [53.122045119395594]
We present a novel technique for evaluating vaccine allocation strategies using a multi-armed bandit framework.
$m$-top exploration allows the algorithm to learn $m$ policies for which it expects the highest utility.
We consider the Belgian COVID-19 epidemic using the individual-based model STRIDE, where we learn a set of vaccination policies.
(2023-01-30T12:22:30Z)
- CEDAR: Communication Efficient Distributed Analysis for Regressions [9.50726756006467]
There is growing interest in distributed learning over multiple EHR databases without sharing patient-level data.
We propose a novel communication-efficient method that aggregates the local optimal estimates by turning the problem into a missing-data problem.
We provide a theoretical investigation of the properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses.
(2022-07-01T09:53:44Z)
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
(2022-01-31T20:58:47Z)
- Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism.
We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
(2021-11-12T06:36:40Z)
- A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have sprung up.
(2020-02-05T21:35:29Z)