Towards more accurate and useful data anonymity vulnerability measures
- URL: http://arxiv.org/abs/2403.06595v1
- Date: Mon, 11 Mar 2024 10:40:08 GMT
- Title: Towards more accurate and useful data anonymity vulnerability measures
- Authors: Paul Francis, David Wagner
- Abstract summary: This paper examines a number of prominent attack papers and finds several problems, all of which lead to overstating risk.
First, some papers fail to establish a correct statistical inference baseline (or any at all), leading to incorrect measures.
Second, some papers don't use a realistic membership base rate, leading to incorrect precision measures if precision is reported.
- Score: 1.3159777131162964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The purpose of anonymizing structured data is to protect the privacy of individuals in the data while retaining the statistical properties of the data. There is a large body of work that examines anonymization vulnerabilities. Focusing on strong anonymization mechanisms, this paper examines a number of prominent attack papers and finds several problems, all of which lead to overstating risk. First, some papers fail to establish a correct statistical inference baseline (or any at all), leading to incorrect measures. Notably, the reconstruction attack from the US Census Bureau that led to a redesign of its disclosure method made this mistake. We propose the non-member framework, an improved method for how to compute a more accurate inference baseline, and give examples of its operation. Second, some papers don't use a realistic membership base rate, leading to incorrect precision measures if precision is reported. Third, some papers unnecessarily report measures in such a way that it is difficult or impossible to assess risk. Virtually the entire literature on membership inference attacks, dozens of papers, make one or both of these errors. We propose that membership inference papers report precision/recall values using a representative range of base rates.
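To make the base-rate point concrete: the precision a membership-inference attack achieves depends directly on how many of the attacked records are actually members. The sketch below is illustrative rather than taken from the paper; the attack's true-positive and false-positive rates are hypothetical, and Bayes' rule is used to recompute precision across a representative range of base rates, the kind of reporting the paper recommends.

```python
# Illustrative sketch (not from the paper): how membership-inference precision
# varies with the membership base rate, given a hypothetical attack operating
# point (TPR, FPR) measured on a balanced member/non-member evaluation set.
def precision_at_base_rate(tpr: float, fpr: float, base_rate: float) -> float:
    """Bayes' rule: P(member | attack predicts member) at the given base rate."""
    true_positives = tpr * base_rate
    false_positives = fpr * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)

tpr, fpr = 0.60, 0.05  # hypothetical operating point of an attack
for base_rate in (0.5, 0.1, 0.01, 0.001):  # representative range of base rates
    precision = precision_at_base_rate(tpr, fpr, base_rate)
    print(f"base rate {base_rate:>6}: precision = {precision:.3f}, recall = {tpr:.2f}")
```

With these hypothetical numbers the attack looks precise at a 50/50 base rate (about 0.92) but drops to roughly 0.11 at a 1% base rate, which is why reporting precision only at a balanced base rate overstates risk.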
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - SEBA: Strong Evaluation of Biometric Anonymizations [3.18294468240512]
We introduce SEBA, a framework for strong evaluation of biometric anonymizations.
It combines and implements the state-of-the-art methodology in an easy-to-use and easy-to-expand software framework.
As part of this discourse, we introduce and discuss new metrics that allow for a more straightforward evaluation of the privacy-utility trade-off.
arXiv Detail & Related papers (2024-07-09T08:20:03Z) - The Impact of Differential Feature Under-reporting on Algorithmic Fairness [86.275300739926]
We present an analytically tractable model of differential feature under-reporting.
We then use it to characterize the impact of this kind of data bias on algorithmic fairness.
Our results show that, in real world data settings, under-reporting typically leads to increasing disparities.
arXiv Detail & Related papers (2024-01-16T19:16:22Z) - Proximal Causal Inference With Text Data [5.796482272333648]
We propose a new causal inference method that uses two instances of pre-treatment text data, infers two proxies using two zero-shot models on the separate instances, and applies these proxies in the proximal g-formula.
We evaluate our method in synthetic and semi-synthetic settings with real-world clinical notes from MIMIC-III and open large language models for zero-shot prediction.
arXiv Detail & Related papers (2024-01-12T16:51:02Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
The proposed defense, MESAS, is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z) - Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information on a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z) - Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z) - No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z) - One Parameter Defense -- Defending against Data Inference Attacks via Differential Privacy [26.000487178636927]
Machine learning models are vulnerable to data inference attacks, such as membership inference and model inversion attacks.
Most existing defense methods only protect against membership inference attacks.
We propose a differentially private defense method that handles both types of attacks in a time-efficient manner.
arXiv Detail & Related papers (2022-03-13T06:06:24Z) - On Primes, Log-Loss Scores and (No) Privacy [8.679020335206753]
In this paper, we prove that access to log-loss scores enables the adversary to infer the membership of any number of datapoints with full accuracy in a single query.
Our approach obviates the need for any attack-model training or side knowledge on the adversary's part.
arXiv Detail & Related papers (2020-09-17T23:35:12Z) - Anonymizing Machine Learning Models [0.0]
Anonymized data is exempt from obligations set out in regulations such as the EU General Data Protection Regulation.
We propose a method that is able to achieve better model accuracy by using the knowledge encoded within the trained model.
We also demonstrate that our approach has a similar, and sometimes even better ability to prevent membership attacks as approaches based on differential privacy.
arXiv Detail & Related papers (2020-07-26T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.