Anonymizing Machine Learning Models
- URL: http://arxiv.org/abs/2007.13086v3
- Date: Mon, 2 Aug 2021 12:45:21 GMT
- Title: Anonymizing Machine Learning Models
- Authors: Abigail Goldsteen, Gilad Ezov, Ron Shmelkin, Micha Moffie, Ariel
Farkash
- Abstract summary: Anonymized data is exempt from obligations set out in regulations such as the EU General Data Protection Regulation.
We propose a method that is able to achieve better model accuracy by using the knowledge encoded within the trained model.
We also demonstrate that our approach has a similar, and sometimes even better, ability to prevent membership inference attacks compared to approaches based on differential privacy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a known tension between the need to analyze personal data to drive
business and privacy concerns. Many data protection regulations, including the
EU General Data Protection Regulation (GDPR) and the California Consumer
Privacy Act (CCPA), set out strict restrictions and obligations on the
collection and processing of personal data. Moreover, machine learning models
themselves can be used to derive personal information, as demonstrated by
recent membership and attribute inference attacks. Anonymized data, however, is
exempt from the obligations set out in these regulations. It is therefore
desirable to be able to create models that are anonymized, thus also exempting
them from those obligations, in addition to providing better protection against
attacks.
Learning on anonymized data typically results in significant degradation in
accuracy. In this work, we propose a method that is able to achieve better
model accuracy by using the knowledge encoded within the trained model, and
guiding our anonymization process to minimize the impact on the model's
accuracy, a process we call accuracy-guided anonymization. We demonstrate that
by focusing on the model's accuracy rather than generic information loss
measures, our method outperforms state-of-the-art k-anonymity methods in terms
of the achieved utility, in particular with high values of k and large numbers
of quasi-identifiers.
We also demonstrate that our approach has a similar, and sometimes even
better, ability to prevent membership inference attacks compared to approaches
based on differential privacy, while averting some of their drawbacks, such as
complexity, performance overhead and model-specific implementations. This makes
model-guided anonymization a legitimate substitute for such methods and a
practical approach to creating privacy-preserving models.
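To make the accuracy-guided idea in the abstract concrete, the following is a minimal Python sketch, assuming scikit-learn, purely numeric quasi-identifiers, and a simple group-by-predicted-label heuristic; the function name, grouping strategy, and all parameters are illustrative assumptions, not the authors' algorithm.

    # Minimal sketch of accuracy-guided anonymization: quasi-identifier (QI)
    # values are generalized within groups of at least k records, and the
    # groups are formed using the trained model's own predictions so that
    # generalization mixes records the model already treats alike.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def accuracy_guided_anonymize(model, X, qi_idx, k):
        """Return a copy of X with the QI columns in qi_idx replaced by
        per-group means, where groups of size >= k are formed inside each
        predicted class (an illustrative heuristic, not the paper's exact
        procedure)."""
        X_anon = X.copy()
        preds = model.predict(X)
        for label in np.unique(preds):
            idx = np.where(preds == label)[0]
            n_groups = max(1, len(idx) // k)
            for group in np.array_split(idx, n_groups):
                for col in qi_idx:
                    X_anon[group, col] = X[group, col].mean()
        return X_anon

    # Toy usage on synthetic data; column choices and parameters are made up.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 6))
    y = (X[:, 0] + X[:, 3] > 0).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    X_anon = accuracy_guided_anonymize(model, X_tr, qi_idx=[1, 2, 4], k=10)

    # Retrain on the anonymized records and compare test accuracy.
    anon_model = RandomForestClassifier(random_state=0).fit(X_anon, y_tr)
    print("original model accuracy  :", model.score(X_te, y_te))
    print("anonymized model accuracy:", anon_model.score(X_te, y_te))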
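The membership inference comparison in the abstract can likewise be pictured with a simple loss-threshold test, a common baseline attack rather than the specific attack evaluated in the paper; the function names and the median-based threshold calibration are assumptions for illustration.

    # Baseline loss-threshold membership inference test: guess "member" when
    # the per-example loss is below a threshold calibrated on known non-members.
    # The gap between true- and false-positive rates (the attacker's advantage)
    # is one way to quantify how much membership information a model leaks.
    import numpy as np

    def per_example_loss(model, X, y):
        """Cross-entropy of each example's true label (assumed to be 0..C-1)
        under the model's predicted probabilities."""
        probs = model.predict_proba(X)
        return -np.log(np.clip(probs[np.arange(len(y)), y], 1e-12, None))

    def membership_advantage(model, X_in, y_in, X_out, y_out):
        """True-positive rate minus false-positive rate of the threshold
        attack; values near 0 mean the attack is close to random guessing."""
        loss_in = per_example_loss(model, X_in, y_in)
        loss_out = per_example_loss(model, X_out, y_out)
        threshold = np.median(loss_out)        # crude calibration choice
        tpr = np.mean(loss_in < threshold)     # training members flagged
        fpr = np.mean(loss_out < threshold)    # non-members wrongly flagged
        return tpr - fpr

    # With the models from the sketch above, a lower advantage for anon_model
    # would indicate better resistance to this particular attack, e.g.:
    # membership_advantage(model, X_tr, y_tr, X_te, y_te)
    # membership_advantage(anon_model, X_anon, y_tr, X_te, y_te)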
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state of the art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face emerging challenges from the re-identification capabilities of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Differentially Private Synthetic Data Generation via
Lipschitz-Regularised Variational Autoencoders [3.7463972693041274]
It is often overlooked that generative models are prone to memorising many details of individual training records.
In this paper we explore an alternative approach for privately generating data that makes direct use of the inherent stochasticity in generative models.
arXiv Detail & Related papers (2023-04-22T07:24:56Z) - One-shot Empirical Privacy Estimation for Federated Learning [43.317478030880956]
"One-shot" approach allows efficient auditing or estimation of the privacy loss of a model during the same, single training run used to fit model parameters.
We show that our method provides provably correct estimates for the privacy loss under the Gaussian mechanism.
arXiv Detail & Related papers (2023-02-06T19:58:28Z) - No Free Lunch in "Privacy for Free: How does Dataset Condensation Help
Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z) - Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z) - The Influence of Dropout on Membership Inference in Differentially
Private Models [0.0]
Differentially private models seek to protect the privacy of data the model is trained on.
We conduct membership inference attacks against models with and without differential privacy.
arXiv Detail & Related papers (2021-03-16T12:09:51Z) - Federated Learning in Adversarial Settings [0.8701566919381224]
Federated learning schemes provide different trade-offs between robustness, privacy, bandwidth efficiency, and model accuracy.
We show that this extension performs as efficiently as the non-private but robust scheme, even with stringent privacy requirements.
This suggests a possible fundamental trade-off between Differential Privacy and robustness.
arXiv Detail & Related papers (2020-10-15T14:57:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.