Enabling Trade-offs in Privacy and Utility in Genomic Data Beacons and
Summary Statistics
- URL: http://arxiv.org/abs/2302.01763v1
- Date: Wed, 11 Jan 2023 19:16:13 GMT
- Title: Enabling Trade-offs in Privacy and Utility in Genomic Data Beacons and
Summary Statistics
- Authors: Rajagopal Venkatesaramani, Zhiyu Wan, Bradley A. Malin, Yevgeniy
Vorobeychik
- Abstract summary: We introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy.
In the first, an attacker applies a likelihood-ratio test to make membership-inference claims.
In the second, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals in the dataset and those who are not in it.
- Score: 26.99521354120141
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The collection and sharing of genomic data are becoming increasingly
commonplace in research, clinical, and direct-to-consumer settings. The
computational protocols typically adopted to protect individual privacy include
sharing summary statistics, such as allele frequencies, or limiting query
responses to the presence/absence of alleles of interest using web-services
called Beacons. However, even such limited releases are susceptible to
likelihood-ratio-based membership-inference attacks. Several approaches have
been proposed to preserve privacy, which either suppress a subset of genomic
variants or modify query responses for specific variants (e.g., adding noise,
as in differential privacy). However, many of these approaches result in a
significant utility loss, either suppressing many variants or adding a
substantial amount of noise. In this paper, we introduce optimization-based
approaches to explicitly trade off the utility of summary data or Beacon
responses and privacy with respect to membership-inference attacks based on
likelihood-ratios, combining variant suppression and modification. We consider
two attack models. In the first, an attacker applies a likelihood-ratio test to
make membership-inference claims. In the second model, an attacker uses a
threshold that accounts for the effect of the data release on the separation in
scores between individuals in the dataset and those who are not in it. We further
introduce highly scalable approaches for approximately solving the
privacy-utility tradeoff problem when information is either in the form of
summary statistics or presence/absence queries. Finally, we show that the
proposed approaches outperform the state of the art in both utility and privacy
through an extensive evaluation with public datasets.
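The likelihood-ratio membership-inference attack that the abstract refers to can be illustrated with a small sketch. This is not the paper's implementation; it is a minimal Shringarpure-Bustamante-style test against released allele frequencies, where the pool size, variant count, and frequency ranges are arbitrary assumptions chosen for illustration.

```python
import numpy as np


def lrt_score(genotype, pool_freq, pop_freq, eps=1e-9):
    """Log-likelihood-ratio membership score for one individual.

    genotype  : 0/1 array, presence of the alternate allele at each variant
    pool_freq : alternate-allele frequencies released for the study pool
    pop_freq  : reference (general-population) allele frequencies
    A higher score suggests the individual contributed to the pool.
    """
    f = np.clip(pool_freq, eps, 1 - eps)  # avoid log(0)
    p = np.clip(pop_freq, eps, 1 - eps)
    return float(np.sum(genotype * np.log(f / p)
                        + (1 - genotype) * np.log((1 - f) / (1 - p))))


rng = np.random.default_rng(0)
n_variants = 1000
pop_freq = rng.uniform(0.05, 0.5, size=n_variants)

# Hypothetical pool of 50 members; the "released summary statistics"
# are simply the in-pool allele frequencies.
members = (rng.random((50, n_variants)) < pop_freq).astype(int)
pool_freq = members.mean(axis=0)

member_score = lrt_score(members[0], pool_freq, pop_freq)
outsider = (rng.random(n_variants) < pop_freq).astype(int)
outsider_score = lrt_score(outsider, pool_freq, pop_freq)
# A member's score tends to exceed an outsider's, so a threshold on the
# statistic yields a membership-inference claim.
```

Defenses of the kind the paper optimizes over act on this statistic: suppressing a variant removes its term from the sum, while perturbing a released frequency changes `pool_freq` and hence the score separation the attacker can exploit.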
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state of the art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Bayes-Nash Generative Privacy Protection Against Membership Inference Attacks [24.330984323956173]
We propose a game model for the privacy-preserving publishing of data-sharing mechanism outputs.
We introduce the notions of Bayes-Nash generative privacy (BNGP) and Bayes generative privacy (BGP) risk.
We apply our method to sharing summary statistics, where MIAs can re-identify individuals even from aggregated data.
arXiv Detail & Related papers (2024-10-09T20:29:04Z)
- A Game-Theoretic Approach to Privacy-Utility Tradeoff in Sharing Genomic Summary Statistics [24.330984323956173]
We propose a game-theoretic framework for optimal privacy-utility tradeoffs in the sharing of genomic summary statistics.
Our experiments demonstrate that the proposed framework yields both stronger attacks and stronger defense strategies than the state of the art.
arXiv Detail & Related papers (2024-06-03T22:09:47Z)
- TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data [50.797729676285876]
We propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously.
We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm.
arXiv Detail & Related papers (2024-02-16T16:41:14Z)
- Beyond Random Noise: Insights on Anonymization Strategies from a Latent Bandit Study [44.94720642208655]
This paper investigates the issue of privacy in a learning scenario where users share knowledge for a recommendation task.
We use the latent bandit setting to evaluate the trade-off between privacy and recommender performance.
arXiv Detail & Related papers (2023-09-30T01:56:04Z)
- Evaluating the Impact of Local Differential Privacy on Utility Loss via Influence Functions [11.504012974208466]
We demonstrate the ability of influence functions to offer insight into how a specific privacy parameter value will affect a model's test loss.
Our proposed method allows a data curator to select the privacy parameter best aligned with their allowed privacy-utility trade-off.
arXiv Detail & Related papers (2023-09-15T18:08:24Z)
- Causal Inference with Differentially Private (Clustered) Outcomes [16.166525280886578]
Estimating causal effects from randomized experiments is only feasible if participants agree to reveal their responses.
We suggest a new differential privacy mechanism, Cluster-DP, which leverages any given cluster structure.
We show that, depending on an intuitive measure of cluster quality, we can improve the variance loss while maintaining our privacy guarantees.
arXiv Detail & Related papers (2023-08-02T05:51:57Z)
- Client-specific Property Inference against Secure Aggregation in Federated Learning [52.8564467292226]
Federated learning has become a widely used paradigm for collaboratively training a common model among different participants.
Many attacks have shown that it is still possible to infer sensitive information, such as membership or properties of participant data, or even to reconstruct that data outright.
We show that simple linear models can effectively capture client-specific properties only from the aggregated model updates.
arXiv Detail & Related papers (2023-03-07T14:11:01Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP).
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z)
- Post-processing of Differentially Private Data: A Fairness Perspective [53.29035917495491]
This paper shows that post-processing causes disparate impacts on individuals or groups.
It analyzes two critical settings: the release of differentially private datasets and the use of such private datasets for downstream decisions.
It proposes a novel post-processing mechanism that is (approximately) optimal under different fairness metrics.
arXiv Detail & Related papers (2022-01-24T02:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.