Enabling Trade-offs in Privacy and Utility in Genomic Data Beacons and
Summary Statistics
- URL: http://arxiv.org/abs/2302.01763v1
- Date: Wed, 11 Jan 2023 19:16:13 GMT
- Title: Enabling Trade-offs in Privacy and Utility in Genomic Data Beacons and
Summary Statistics
- Authors: Rajagopal Venkatesaramani, Zhiyu Wan, Bradley A. Malin, Yevgeniy
Vorobeychik
- Abstract summary: We introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy.
In the first, an attacker applies a likelihood-ratio test to make membership-inference claims.
In the second, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals in the dataset and those who are not in it.
- Score: 26.99521354120141
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The collection and sharing of genomic data are becoming increasingly
commonplace in research, clinical, and direct-to-consumer settings. The
computational protocols typically adopted to protect individual privacy include
sharing summary statistics, such as allele frequencies, or limiting query
responses to the presence/absence of alleles of interest using web-services
called Beacons. However, even such limited releases are susceptible to
likelihood-ratio-based membership-inference attacks. Several approaches have
been proposed to preserve privacy, which either suppress a subset of genomic
variants or modify query responses for specific variants (e.g., adding noise,
as in differential privacy). However, many of these approaches result in a
significant utility loss, either suppressing many variants or adding a
substantial amount of noise. In this paper, we introduce optimization-based
approaches to explicitly trade off the utility of summary data or Beacon
responses and privacy with respect to membership-inference attacks based on
likelihood-ratios, combining variant suppression and modification. We consider
two attack models. In the first, an attacker applies a likelihood-ratio test to
make membership-inference claims. In the second model, an attacker uses a
threshold that accounts for the effect of the data release on the separation in
scores between individuals in the dataset and those who are not in it. We further
introduce highly scalable approaches for approximately solving the
privacy-utility tradeoff problem when information is either in the form of
summary statistics or presence/absence queries. Finally, we show that the
proposed approaches outperform the state of the art in both utility and privacy
through an extensive evaluation with public datasets.
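The likelihood-ratio membership-inference attack that the abstract refers to can be illustrated with a small sketch. This is not the paper's implementation; it is a minimal Shringarpure-Bustamante-style test against released allele frequencies, where the pool size, variant count, and frequency ranges are arbitrary assumptions chosen for illustration.

```python
import numpy as np


def lrt_score(genotype, pool_freq, pop_freq, eps=1e-9):
    """Log-likelihood-ratio membership score for one individual.

    genotype  : 0/1 array, presence of the alternate allele at each variant
    pool_freq : alternate-allele frequencies released for the study pool
    pop_freq  : reference (general-population) allele frequencies
    A higher score suggests the individual contributed to the pool.
    """
    f = np.clip(pool_freq, eps, 1 - eps)  # avoid log(0)
    p = np.clip(pop_freq, eps, 1 - eps)
    return float(np.sum(genotype * np.log(f / p)
                        + (1 - genotype) * np.log((1 - f) / (1 - p))))


rng = np.random.default_rng(0)
n_variants = 1000
pop_freq = rng.uniform(0.05, 0.5, size=n_variants)

# Hypothetical pool of 50 members; the "released summary statistics"
# are simply the in-pool allele frequencies.
members = (rng.random((50, n_variants)) < pop_freq).astype(int)
pool_freq = members.mean(axis=0)

member_score = lrt_score(members[0], pool_freq, pop_freq)
outsider = (rng.random(n_variants) < pop_freq).astype(int)
outsider_score = lrt_score(outsider, pool_freq, pop_freq)
# A member's score tends to exceed an outsider's, so a threshold on the
# statistic yields a membership-inference claim.
```

Defenses of the kind the paper optimizes over act on this statistic: suppressing a variant removes its term from the sum, while perturbing a released frequency changes `pool_freq` and hence the score separation the attacker can exploit.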
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state of the art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Bayes-Nash Generative Privacy Protection Against Membership Inference Attacks [24.330984323956173]
We propose a game model for the privacy-preserving publishing of data-sharing mechanism outputs.
We introduce the notions of Bayes-Nash generative privacy (BNGP) and Bayes generative privacy (BGP) risk.
We apply our method to sharing summary statistics, where MIAs can re-identify individuals even from aggregated data.
arXiv Detail & Related papers (2024-10-09T20:29:04Z)
- A Game-Theoretic Approach to Privacy-Utility Tradeoff in Sharing Genomic Summary Statistics [24.330984323956173]
We propose a game-theoretic framework for optimal privacy-utility tradeoffs in the sharing of genomic summary statistics.
Our experiments demonstrate that the proposed framework yields both stronger attacks and stronger defense strategies than the state of the art.
arXiv Detail & Related papers (2024-06-03T22:09:47Z)
- TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data [50.797729676285876]
We propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously.
We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm.
arXiv Detail & Related papers (2024-02-16T16:41:14Z)
- Beyond Random Noise: Insights on Anonymization Strategies from a Latent Bandit Study [44.94720642208655]
This paper investigates the issue of privacy in a learning scenario where users share knowledge for a recommendation task.
We use the latent bandit setting to evaluate the trade-off between privacy and recommender performance.
arXiv Detail & Related papers (2023-09-30T01:56:04Z)
- Evaluating the Impact of Local Differential Privacy on Utility Loss via Influence Functions [11.504012974208466]
We demonstrate the ability of influence functions to offer insight into how a specific privacy parameter value will affect a model's test loss.
Our proposed method allows a data curator to select the privacy parameter best aligned with their allowed privacy-utility trade-off.
arXiv Detail & Related papers (2023-09-15T18:08:24Z)
- Causal Inference with Differentially Private (Clustered) Outcomes [16.166525280886578]
Estimating causal effects from randomized experiments is only feasible if participants agree to reveal their responses.
We suggest a new differential privacy mechanism, Cluster-DP, which leverages any given cluster structure.
We show that, depending on an intuitive measure of cluster quality, we can improve the variance loss while maintaining our privacy guarantees.
arXiv Detail & Related papers (2023-08-02T05:51:57Z)
- Client-specific Property Inference against Secure Aggregation in Federated Learning [52.8564467292226]
Federated learning has become a widely used paradigm for collaboratively training a common model among different participants.
Many attacks have shown that it is still possible to infer sensitive information, such as membership or properties of participant data, or even to reconstruct that data outright.
We show that simple linear models can effectively capture client-specific properties only from the aggregated model updates.
arXiv Detail & Related papers (2023-03-07T14:11:01Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP).
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z)
- Post-processing of Differentially Private Data: A Fairness Perspective [53.29035917495491]
This paper shows that post-processing causes disparate impacts on individuals or groups.
It analyzes two critical settings: the release of differentially private datasets and the use of such private datasets for downstream decisions.
It proposes a novel post-processing mechanism that is (approximately) optimal under different fairness metrics.
arXiv Detail & Related papers (2022-01-24T02:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.