Anonymizing Machine Learning Models
- URL: http://arxiv.org/abs/2007.13086v3
- Date: Mon, 2 Aug 2021 12:45:21 GMT
- Title: Anonymizing Machine Learning Models
- Authors: Abigail Goldsteen, Gilad Ezov, Ron Shmelkin, Micha Moffie, Ariel
Farkash
- Abstract summary: Anonymized data is exempt from obligations set out in regulations such as the EU General Data Protection Regulation.
We propose a method that is able to achieve better model accuracy by using the knowledge encoded within the trained model.
We also demonstrate that our approach has a similar, and sometimes even better, ability to prevent membership inference attacks compared to approaches based on differential privacy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a known tension between the need to analyze personal data to drive
business and privacy concerns. Many data protection regulations, including the
EU General Data Protection Regulation (GDPR) and the California Consumer
Privacy Act (CCPA), set out strict restrictions and obligations on the
collection and processing of personal data. Moreover, machine learning models
themselves can be used to derive personal information, as demonstrated by
recent membership and attribute inference attacks. Anonymized data, however, is
exempt from the obligations set out in these regulations. It is therefore
desirable to be able to create models that are anonymized, thus also exempting
them from those obligations, in addition to providing better protection against
attacks.
Learning on anonymized data typically results in significant degradation in
accuracy. In this work, we propose a method that is able to achieve better
model accuracy by using the knowledge encoded within the trained model, and
guiding our anonymization process to minimize the impact on the model's
accuracy, a process we call accuracy-guided anonymization. We demonstrate that
by focusing on the model's accuracy rather than generic information loss
measures, our method outperforms state-of-the-art k-anonymity methods in terms
of the achieved utility, in particular with high values of k and large numbers
of quasi-identifiers.
We also demonstrate that our approach has a similar, and sometimes even
better, ability to prevent membership inference attacks compared to approaches
based on differential privacy, while averting some of their drawbacks, such as
complexity, performance overhead and model-specific implementations. This makes
model-guided anonymization a legitimate substitute for such methods and a
practical approach to creating privacy-preserving models.
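To make the accuracy-guided idea in the abstract concrete, the following is a minimal Python sketch, assuming scikit-learn, purely numeric quasi-identifiers, and a simple group-by-predicted-label heuristic; the function name, grouping strategy, and all parameters are illustrative assumptions, not the authors' algorithm.

    # Minimal sketch of accuracy-guided anonymization: quasi-identifier (QI)
    # values are generalized within groups of at least k records, and the
    # groups are formed using the trained model's own predictions so that
    # generalization mixes records the model already treats alike.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def accuracy_guided_anonymize(model, X, qi_idx, k):
        """Return a copy of X with the QI columns in qi_idx replaced by
        per-group means, where groups of size >= k are formed inside each
        predicted class (an illustrative heuristic, not the paper's exact
        procedure)."""
        X_anon = X.copy()
        preds = model.predict(X)
        for label in np.unique(preds):
            idx = np.where(preds == label)[0]
            n_groups = max(1, len(idx) // k)
            for group in np.array_split(idx, n_groups):
                for col in qi_idx:
                    X_anon[group, col] = X[group, col].mean()
        return X_anon

    # Toy usage on synthetic data; column choices and parameters are made up.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 6))
    y = (X[:, 0] + X[:, 3] > 0).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    X_anon = accuracy_guided_anonymize(model, X_tr, qi_idx=[1, 2, 4], k=10)

    # Retrain on the anonymized records and compare test accuracy.
    anon_model = RandomForestClassifier(random_state=0).fit(X_anon, y_tr)
    print("original model accuracy  :", model.score(X_te, y_te))
    print("anonymized model accuracy:", anon_model.score(X_te, y_te))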
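The membership inference comparison in the abstract can likewise be pictured with a simple loss-threshold test, a common baseline attack rather than the specific attack evaluated in the paper; the function names and the median-based threshold calibration are assumptions for illustration.

    # Baseline loss-threshold membership inference test: guess "member" when
    # the per-example loss is below a threshold calibrated on known non-members.
    # The gap between true- and false-positive rates (the attacker's advantage)
    # is one way to quantify how much membership information a model leaks.
    import numpy as np

    def per_example_loss(model, X, y):
        """Cross-entropy of each example's true label (assumed to be 0..C-1)
        under the model's predicted probabilities."""
        probs = model.predict_proba(X)
        return -np.log(np.clip(probs[np.arange(len(y)), y], 1e-12, None))

    def membership_advantage(model, X_in, y_in, X_out, y_out):
        """True-positive rate minus false-positive rate of the threshold
        attack; values near 0 mean the attack is close to random guessing."""
        loss_in = per_example_loss(model, X_in, y_in)
        loss_out = per_example_loss(model, X_out, y_out)
        threshold = np.median(loss_out)        # crude calibration choice
        tpr = np.mean(loss_in < threshold)     # training members flagged
        fpr = np.mean(loss_out < threshold)    # non-members wrongly flagged
        return tpr - fpr

    # With the models from the sketch above, a lower advantage for anon_model
    # would indicate better resistance to this particular attack, e.g.:
    # membership_advantage(model, X_tr, y_tr, X_te, y_te)
    # membership_advantage(anon_model, X_anon, y_tr, X_te, y_te)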
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state of the art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face emerging challenges from the re-identification capabilities of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Differentially Private Synthetic Data Generation via
Lipschitz-Regularised Variational Autoencoders [3.7463972693041274]
It is often overlooked that generative models are prone to memorising many details of individual training records.
In this paper we explore an alternative approach for privately generating data that makes direct use of the inherent stochasticity in generative models.
arXiv Detail & Related papers (2023-04-22T07:24:56Z) - One-shot Empirical Privacy Estimation for Federated Learning [43.317478030880956]
"One-shot" approach allows efficient auditing or estimation of the privacy loss of a model during the same, single training run used to fit model parameters.
We show that our method provides provably correct estimates for the privacy loss under the Gaussian mechanism.
arXiv Detail & Related papers (2023-02-06T19:58:28Z) - No Free Lunch in "Privacy for Free: How does Dataset Condensation Help
Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z) - Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z) - The Influence of Dropout on Membership Inference in Differentially
Private Models [0.0]
Differentially private models seek to protect the privacy of data the model is trained on.
We conduct membership inference attacks against models with and without differential privacy.
arXiv Detail & Related papers (2021-03-16T12:09:51Z) - Federated Learning in Adversarial Settings [0.8701566919381224]
Federated learning schemes provide different trade-offs between robustness, privacy, bandwidth efficiency, and model accuracy.
We show that this extension performs as efficiently as the non-private but robust scheme, even with stringent privacy requirements.
This suggests a possible fundamental trade-off between Differential Privacy and robustness.
arXiv Detail & Related papers (2020-10-15T14:57:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.