Related papers: PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps

URL: http://arxiv.org/abs/2403.09562v1
Date: Thu, 14 Mar 2024 16:54:17 GMT
Title: PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps
Authors: Ruixuan Liu, Tianhao Wang, Yang Cao, Li Xiong
Abstract summary: PreCurious aims to escalate the general privacy risk of both membership inference and data extraction. PreCurious demonstrates that the apparent invulnerability of parameter-efficient and differentially private fine-tuning can be broken stealthily, compared with fine-tuning on a benign model.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The pre-training and fine-tuning paradigm has demonstrated its effectiveness and has become the standard approach for tailoring language models to various tasks. Currently, community-based platforms offer easy access to various pre-trained models, as anyone can publish without strict validation processes. However, a released pre-trained model can be a privacy trap for fine-tuning datasets if it is carefully designed. In this work, we propose the PreCurious framework to reveal a new attack surface where the attacker releases the pre-trained model and gets black-box access to the final fine-tuned model. PreCurious aims to escalate the general privacy risk of both membership inference and data extraction. The key intuition behind PreCurious is to manipulate the memorization stage of the pre-trained model and guide fine-tuning with a seemingly legitimate configuration. Defending against privacy attacks on a fine-tuned model seems promising, as empirical and theoretical evidence suggests that parameter-efficient and differentially private fine-tuning techniques are invulnerable to privacy attacks. But PreCurious demonstrates that this invulnerability can be broken stealthily, compared with fine-tuning on a benign model. By further leveraging a sanitized dataset, PreCurious can extract originally unexposed secrets under differentially private fine-tuning. Thus, PreCurious raises warnings for users who download pre-trained models from unknown sources, rely solely on tutorials or common-sense defenses, or have previously released sanitized datasets, even after perfect scrubbing.
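Membership inference, one of the two risks PreCurious aims to escalate, can be made concrete with a minimal sketch of a generic loss-based test. This is not the PreCurious procedure. It assumes the attacker can obtain per-example losses from the fine-tuned model through black-box queries. Optionally, it also uses losses from a reference model on the same examples; treating a copy of the released pre-trained model as that reference is an illustrative assumption. The function names `membership_scores`, `predict_members`, and `attack_auc` are hypothetical.

```python
from typing import Optional, Sequence


def membership_scores(
    target_losses: Sequence[float],
    reference_losses: Optional[Sequence[float]] = None,
) -> list[float]:
    """One score per candidate example; a higher score means "more likely a member".

    Without a reference, the score is the negated loss of the fine-tuned model.
    With reference losses (an illustrative assumption), the score is how far
    fine-tuning pushed the loss below the reference model's loss.
    """
    if reference_losses is None:
        return [-loss for loss in target_losses]
    if len(reference_losses) != len(target_losses):
        raise ValueError("target and reference losses must align one-to-one")
    return [ref - tgt for tgt, ref in zip(target_losses, reference_losses)]


def predict_members(scores: Sequence[float], threshold: float) -> list[bool]:
    """Flag every candidate whose score reaches the threshold as a training member."""
    return [score >= threshold for score in scores]


def attack_auc(scores: Sequence[float], is_member: Sequence[bool]) -> float:
    """Chance that a random member outscores a random non-member (ties count as 1/2)."""
    members = [s for s, m in zip(scores, is_member) if m]
    non_members = [s for s, m in zip(scores, is_member) if not m]
    if not members or not non_members:
        raise ValueError("need at least one member and one non-member")
    wins = 0.0
    for m_score in members:
        for n_score in non_members:
            if m_score > n_score:
                wins += 1.0
            elif m_score == n_score:
                wins += 0.5
    return wins / (len(members) * len(non_members))


if __name__ == "__main__":
    # Toy per-example losses; the first two candidates were in the fine-tuning set.
    target = [0.4, 0.5, 2.0, 2.2]
    reference = [2.1, 2.0, 2.1, 2.3]
    labels = [True, True, False, False]

    scores = membership_scores(target, reference)
    print(predict_members(scores, threshold=1.0))  # [True, True, False, False]
    print(attack_auc(scores, labels))              # 1.0
```

Thresholding the raw loss is the simplest baseline. Calibrating against a reference loss is a common refinement: it separates examples that are easy for any model from examples the fine-tuned model has specifically memorized. Under the abstract's premise of a manipulated memorization stage, one would expect the gap between member and non-member scores to widen.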

Related papers

Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner. Our method achieves over 20% improvement in forgetting error compared to the state of the art.
Date: 2024-11-04T21:27:06Z
Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
Fine-tuning language models on private data for downstream applications poses significant privacy risks. Several popular community platforms now offer convenient distribution of a large variety of pre-trained models. We introduce a novel poisoning technique that uses model-unlearning as an attack tool.
Date: 2024-08-30T15:35:09Z
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
In this paper, we unveil a new vulnerability: the privacy backdoor attack. When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
Date: 2024-04-01T16:50:54Z
Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a new risk of privacy backdoors. We show how to build privacy backdoors for a variety of models, including transformers.
Date: 2024-03-30T20:43:53Z
Which Pretrain Samples to Rehearse when Finetuning Pretrained Models?
Fine-tuning pretrained models on specific tasks is now the de facto approach for text and vision tasks. A known pitfall of this approach is the forgetting of pretraining knowledge that happens during finetuning. We propose a novel sampling scheme, mix-cd, that identifies and prioritizes samples that actually face forgetting.
Date: 2024-02-12T22:32:12Z
CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning
Federated Learning (FL) is a setting for training machine learning models in distributed environments. We propose a novel method, CANIFE, that uses carefully crafted samples by a strong adversary to evaluate the empirical privacy of a training round.
Date: 2022-10-06T13:30:16Z
Self-Distillation for Further Pre-training of Transformers
We propose self-distillation as a regularization for a further pre-training stage. We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
Date: 2022-09-30T02:25:12Z
Private Prediction Sets
Machine learning systems need reliable uncertainty quantification and protection of individuals' privacy. We present a framework that treats these two desiderata jointly. We evaluate the method on large-scale computer vision datasets.
Date: 2021-02-11T18:59:11Z
Weight Poisoning Attacks on Pre-trained Models
We show that it is possible to construct "weight poisoning" attacks where pre-trained weights are injected with vulnerabilities that expose "backdoors" after fine-tuning. Our experiments on sentiment classification, toxicity detection, and spam detection show that this attack is widely applicable and poses a serious threat.
Date: 2020-04-14T16:51:42Z

This list is automatically generated from the titles and abstracts of the papers on this site.
