Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
- URL: http://arxiv.org/abs/2404.00473v1
- Date: Sat, 30 Mar 2024 20:43:53 GMT
- Title: Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
- Authors: Shanglun Feng, Florian Tramèr
- Abstract summary: Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications.
We show that this practice introduces a new risk of privacy backdoors.
We show how to build privacy backdoors for a variety of models, including transformers.
- Score: 23.54726973460633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a new risk of privacy backdoors. By tampering with a pretrained model's weights, an attacker can fully compromise the privacy of the finetuning data. We show how to build privacy backdoors for a variety of models, including transformers, which enable an attacker to reconstruct individual finetuning samples with guaranteed success. We further show that backdoored models allow for tight privacy attacks on models trained with differential privacy (DP). The common optimistic practice of training DP models with loose privacy guarantees is thus insecure if the model is not trusted. Overall, our work highlights a crucial and overlooked supply chain attack on machine learning privacy.
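The construction hinges on weight-tampered "data trap" neurons whose gradients memorize individual inputs. Below is a minimal toy sketch of that idea in PyTorch, assuming a single ReLU neuron whose output reaches the loss with coefficient 1; the paper's full transformer construction is considerably more involved:

```python
# Toy "data trap" sketch: a planted ReLU neuron whose weight gradient,
# for a sample that activates it, equals that sample exactly.
import torch

torch.manual_seed(0)
d = 8                                     # toy input dimension
x_private = torch.randn(d)                # victim's finetuning sample

# Attacker-planted trap: a random direction with zero bias, so the
# neuron fires for roughly half of all inputs (real traps calibrate
# weights and bias so that ~one sample activates, then shut off).
w = torch.randn(d, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Victim takes one finetuning step; we assume the trap's output feeds
# the loss linearly (arrangeable via planted downstream weights).
pre = w @ x_private + b
loss = torch.relu(pre).sum()
loss.backward()

if pre.item() > 0:                        # the trap fired
    # For an active ReLU, d(loss)/dw = x_private, so the SGD weight
    # delta hands the attacker the sample:
    lr = 0.1
    w_after = (w - lr * w.grad).detach()
    x_recovered = (w.detach() - w_after) / lr
    print(torch.allclose(x_recovered, x_private))   # True
else:
    print("trap stayed dead for this sample; rerun with another seed")
```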
Related papers
- Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage [12.892449128678516]
Fine-tuning language models on private data for downstream applications poses significant privacy risks.
Several popular community platforms now offer convenient distribution of a large variety of pre-trained models.
We introduce a novel poisoning technique that uses model-unlearning as an attack tool.
arXiv Detail & Related papers (2024-08-30T15:35:09Z)
- No Vandalism: Privacy-Preserving and Byzantine-Robust Federated Learning [18.1129191782913]
Federated learning allows several clients to train one machine learning model jointly without sharing private data, providing privacy protection.
Traditional federated learning is vulnerable to poisoning attacks, which can not only degrade model performance but also implant malicious backdoors.
In this paper, we aim to build a privacy-preserving and Byzantine-robust federated learning scheme to provide an environment with no vandalism (NoV) against attacks from malicious participants.
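The abstract does not specify NoV's aggregation rule; for orientation, the sketch below shows coordinate-wise median aggregation, a standard Byzantine-robust baseline (illustrative only, not the paper's scheme; `robust_aggregate` is a made-up helper):

```python
# Coordinate-wise median aggregation: a classic Byzantine-robust
# baseline that tolerates a minority of arbitrarily malicious updates.
import torch

def robust_aggregate(client_updates: list[torch.Tensor]) -> torch.Tensor:
    """Aggregate flattened client updates with a coordinate-wise median."""
    stacked = torch.stack(client_updates)        # (num_clients, num_params)
    return stacked.median(dim=0).values

# Three honest clients plus one poisoner pushing a huge malicious update.
honest = [torch.tensor([0.1, -0.2, 0.05]) + 0.01 * torch.randn(3)
          for _ in range(3)]
poisoned = torch.tensor([100.0, 100.0, 100.0])   # attempted backdoor implant
print(robust_aggregate(honest + [poisoned]))     # stays near honest values
```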
arXiv Detail & Related papers (2024-06-03T07:59:10Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
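As a rough illustration of the idea, the hypothetical sketch below implants a defender-owned trigger with a reversible label shift and undoes the shift at inference; the actual PDB design likely differs in its details, and all helper names here are made up:

```python
# Hedged sketch of a proactive defensive backdoor: train with a
# defender-owned trigger plus a reversible label shift, then apply the
# trigger at inference and invert the shift, so the defender's backdoor
# dominates any attacker-implanted one.
import torch

NUM_CLASSES = 10

def stamp_trigger(x: torch.Tensor) -> torch.Tensor:
    """Defender's trigger: set a fixed 3x3 corner patch to max value."""
    x = x.clone()
    x[..., :3, :3] = 1.0
    return x

def shift_label(y: int) -> int:          # reversible mapping h(y)
    return (y + 1) % NUM_CLASSES

def unshift_label(y: int) -> int:        # inverse mapping h^{-1}(y)
    return (y - 1) % NUM_CLASSES

def poison_for_defense(dataset):
    """Every sample also appears triggered with a shifted label,
    implanting the defensive backdoor alongside the clean task."""
    for x, y in dataset:
        yield x, y                               # clean sample
        yield stamp_trigger(x), shift_label(y)   # defensive sample

def defended_predict(model, x):
    """Always stamp the defender's trigger and invert the shift."""
    pred = model(stamp_trigger(x).unsqueeze(0)).argmax(dim=1).item()
    return unshift_label(pred)
```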
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
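The leakage such poisoning amplifies is usually measured with membership inference; a baseline loss-threshold scorer (not the paper's attack, and with hypothetical helper names) might look like:

```python
# Loss-threshold membership inference: members tend to have lower loss,
# and a backdoored pre-trained model widens this member/non-member gap.
import torch
import torch.nn.functional as F

@torch.no_grad()
def membership_score(model, x: torch.Tensor, y: torch.Tensor) -> float:
    """Higher score (lower loss) suggests (x, y) was in the training set."""
    logits = model(x.unsqueeze(0))
    return -F.cross_entropy(logits, y.unsqueeze(0)).item()

def is_member(model, x, y, threshold: float) -> bool:
    # The threshold is typically calibrated on shadow models
    # or held-out data.
    return membership_score(model, x, y) > threshold
```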
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps [13.547526990125775]
We propose the PreCurious framework to reveal a new attack surface where the attacker releases the pre-trained model.
PreCurious aims to escalate the general privacy risk of both membership inference and data extraction on the fine-tuning dataset.
arXiv Detail & Related papers (2024-03-14T16:54:17Z)
- Can Language Models be Instructed to Protect Personal Information? [30.187731765653428]
We introduce PrivQA -- a benchmark to assess the privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario.
We find that adversaries can easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs.
We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections.
arXiv Detail & Related papers (2023-10-03T17:30:33Z)
- Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
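The deduplication channel can be reproduced in a few lines: with near-duplicate removal at threshold `EPS`, an attacker can plant probes that each sit within `EPS` of the victim's record but more than `EPS` apart from one another, so they all survive dedup exactly when the victim is absent. A toy numpy sketch with illustrative parameters:

```python
# Why dedup-before-DP leaks: one victim record can silently absorb an
# attacker's planted near-duplicates, so datasets differing in ONE
# record before dedup differ in MANY records after it, voiding the
# adjacency assumption that DP accounting relies on.
import numpy as np

EPS = 1.0

def dedup(points: list[np.ndarray]) -> list[np.ndarray]:
    """Keep a point only if it is farther than EPS from all kept points."""
    kept: list[np.ndarray] = []
    for p in points:
        if all(np.linalg.norm(p - q) > EPS for q in kept):
            kept.append(p)
    return kept

victim = np.zeros(2)
# Two probes, each within EPS of the victim, but > EPS apart from
# each other.
probes = [np.array([0.9, 0.0]), np.array([-0.9, 0.0])]

print(len(dedup([victim] + probes)))   # 1: the victim absorbs both probes
print(len(dedup(probes)))              # 2: both probes survive
```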
arXiv Detail & Related papers (2023-09-11T16:49:05Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve selective differential privacy (SDP) for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
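A hedged sketch of the two-phase recipe on a toy linear model: phase one is ordinary fine-tuning on redacted data, phase two is manual DP-SGD (per-sample gradient clipping plus Gaussian noise). The paper applies this to large Transformers with selective rather than full DP:

```python
# Two-phase "just fine-tune twice" sketch with manual DP-SGD.
import torch

model = torch.nn.Linear(16, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

# Phase 1: ordinary fine-tuning on REDACTED data (sensitive fields
# masked out), which needs no DP noise.
x_red, y = torch.randn(32, 16), torch.randint(0, 2, (32,))
opt.zero_grad()
loss_fn(model(x_red), y).backward()
opt.step()

# Phase 2: one DP-SGD step on the full private data: clip each
# per-sample gradient to norm C, sum, add Gaussian noise sigma * C.
C, sigma, lr = 1.0, 1.0, 0.1
x_priv = torch.randn(32, 16)
grads = [torch.autograd.grad(loss_fn(model(xi[None]), yi[None]),
                             list(model.parameters()))
         for xi, yi in zip(x_priv, y)]
with torch.no_grad():
    for j, p in enumerate(model.parameters()):
        per_sample = torch.stack([g[j] for g in grads])
        # Scale factor max(1, ||g||/C) implements the clip to norm C.
        scale = per_sample.flatten(1).norm(dim=1).clamp(min=C) / C
        clipped = (per_sample
                   / scale.view(-1, *[1] * (per_sample.dim() - 1))).sum(0)
        noisy = clipped + sigma * C * torch.randn_like(p)
        p -= lr * noisy / len(x_priv)
```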
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
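For reference, standard PGD adversarial training, the kind of procedure such studies evaluate, can be sketched as follows (a minimal version; input-range clamping and the paper's hybrid strategy are omitted):

```python
# PGD adversarial training: inner maximization finds a worst-case
# L-inf perturbation, outer minimization trains on it.
import torch

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Gradient-ascent steps on the loss, projected onto the eps-ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        loss_fn(model(x + delta), y).backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)          # project onto the eps-ball
        delta.grad.zero_()
    return (x + delta).detach()

def adversarial_training_step(model, opt, x, y):
    model.train()
    x_adv = pgd_attack(model, x, y)          # inner maximization
    opt.zero_grad()
    torch.nn.CrossEntropyLoss()(model(x_adv), y).backward()
    opt.step()                               # outer minimization
```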
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Defending against Reconstruction Attacks with Rényi Differential Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model.
Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget.
We show that, for the same mechanism, we can derive privacy guarantees for reconstruction attacks that are better than the traditional ones from the literature.
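For reference, the underlying definition: a mechanism M is (α, ε)-Rényi DP if the order-α Rényi divergence between its output distributions on adjacent datasets D, D' is at most ε:

```latex
% Rényi differential privacy (standard definition, for all adjacent
% datasets D, D' and any order \alpha > 1):
\[
  D_\alpha\bigl(M(D)\,\big\|\,M(D')\bigr)
  \;=\; \frac{1}{\alpha - 1}\,
        \log\, \mathbb{E}_{x \sim M(D')}\!\left[
          \left(\frac{\Pr[M(D) = x]}{\Pr[M(D') = x]}\right)^{\!\alpha}
        \right]
  \;\le\; \epsilon .
\]
```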
arXiv Detail & Related papers (2022-02-15T18:09:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.