TMI! Finetuned Models Leak Private Information from their Pretraining Data
- URL: http://arxiv.org/abs/2306.01181v2
- Date: Thu, 21 Mar 2024 15:57:29 GMT
- Title: TMI! Finetuned Models Leak Private Information from their Pretraining Data
- Authors: John Abascal, Stanley Wu, Alina Oprea, Jonathan Ullman
- Abstract summary: We propose a new membership-inference threat model where the adversary only has access to the finetuned model.
We evaluate $\textbf{TMI}$ on both vision and natural language tasks across multiple transfer learning settings.
An open-source implementation of $\textbf{TMI}$ can be found on GitHub.
- Score: 5.150344987657356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning has become an increasingly popular technique in machine learning as a way to leverage a pretrained model trained for one task to assist with building a finetuned model for a related task. This paradigm has been especially popular for $\textit{privacy}$ in machine learning, where the pretrained model is considered public, and only the data for finetuning is considered sensitive. However, there are reasons to believe that the data used for pretraining is still sensitive, making it essential to understand how much information the finetuned model leaks about the pretraining data. In this work we propose a new membership-inference threat model where the adversary only has access to the finetuned model and would like to infer the membership of the pretraining data. To realize this threat model, we implement a novel metaclassifier-based attack, $\textbf{TMI}$, that leverages the influence of memorized pretraining samples on predictions in the downstream task. We evaluate $\textbf{TMI}$ on both vision and natural language tasks across multiple transfer learning settings, including finetuning with differential privacy. Through our evaluation, we find that $\textbf{TMI}$ can successfully infer membership of pretraining examples using query access to the finetuned model. An open-source implementation of $\textbf{TMI}$ can be found $\href{https://github.com/johnmath/tmi-pets24}{\text{on GitHub}}$.
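The abstract describes TMI only at a high level: a metaclassifier consumes the finetuned model's downstream predictions on a candidate pretraining point and decides membership. The sketch below is a minimal, hypothetical instantiation of that recipe using shadow models on a toy scikit-learn setup; the toy tasks, the shadow-model construction, and all names are assumptions of this sketch, not the authors' implementation (which is in the linked GitHub repository).

```python
"""Hypothetical TMI-style metaclassifier sketch on toy data (not the authors' code)."""
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)

# Toy "pretraining" and "downstream" tasks (both binary for simplicity).
X_pre, y_pre = make_classification(n_samples=600, n_features=20, random_state=0)
X_down, y_down = make_classification(n_samples=300, n_features=20, random_state=1)
target_x, target_y = X_pre[0], y_pre[0]  # candidate pretraining point

def train_finetuned(include_target: bool, seed: int) -> MLPClassifier:
    """Pretrain on the upstream task (with or without the target point),
    then continue training ("finetune") on the downstream task."""
    idx = rng.choice(len(X_pre), size=300, replace=False)
    Xs, ys = X_pre[idx], y_pre[idx]
    if include_target:
        Xs, ys = np.vstack([Xs, target_x]), np.append(ys, target_y)
    model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=200,
                          warm_start=True, random_state=seed)
    model.fit(Xs, ys)            # "pretraining"
    model.fit(X_down, y_down)    # "finetuning" continues from the pretrained weights
    return model

# 1) Shadow finetuned models with and without the candidate point; record their
#    downstream prediction vectors on that point.
feats, labels = [], []
for seed in range(16):
    for member in (True, False):
        shadow = train_finetuned(member, seed)
        feats.append(shadow.predict_proba(target_x.reshape(1, -1)).ravel())
        labels.append(int(member))

# 2) Metaclassifier: maps downstream prediction vectors -> membership guess.
meta = LogisticRegression().fit(np.array(feats), np.array(labels))

# 3) Attack a (here: simulated) target finetuned model using query access only.
victim = train_finetuned(include_target=True, seed=999)
query = victim.predict_proba(target_x.reshape(1, -1)).ravel()
print("P(member) =", meta.predict_proba(query.reshape(1, -1))[0, 1])
```

The design point mirrored here is that the metaclassifier's features are prediction vectors from the downstream task only, so the attack requires nothing beyond query access to the finetuned model.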
Related papers
- Model Inversion Robustness: Can Transfer Learning Help? [27.883074562565877]
Model Inversion (MI) attacks aim to reconstruct private training data by abusing access to machine learning models.
We propose Transfer Learning-based Defense against Model Inversion (TL-DMI) to render MI-robust models.
Our method achieves state-of-the-art (SOTA) MI robustness without bells and whistles.
arXiv Detail & Related papers (2024-05-09T07:24:28Z)
- LMEraser: Large Model Unlearning through Adaptive Prompt Tuning [21.141664917477257]
LMEraser takes a divide-and-conquer strategy with a prompt tuning architecture to isolate data influence.
Experiments demonstrate that LMEraser achieves a $100$-fold reduction in unlearning costs without compromising accuracy.
arXiv Detail & Related papers (2024-04-17T04:08:38Z)
- Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models [4.081098869497239]
We develop state-of-the-art privacy attacks against Large Language Models (LLMs).
New membership inference attacks (MIAs) against pretrained LLMs perform hundreds of times better than baseline attacks.
In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models achieves near-perfect MIA performance (a minimal sketch of this loss-ratio heuristic appears after this list).
arXiv Detail & Related papers (2024-02-26T20:41:50Z)
- Scalable Extraction of Training Data from (Production) Language Models [93.7746567808049]
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset.
We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT.
arXiv Detail & Related papers (2023-11-28T18:47:03Z)
- Verifiable and Provably Secure Machine Unlearning [37.353982787321385]
Machine unlearning aims to remove points from the training dataset of a machine learning model after training.
We present the first cryptographic definition of verifiable unlearning to capture the guarantees of a machine unlearning system.
We implement the protocol for three different unlearning techniques to validate its feasibility for linear regression, logistic regression, and neural networks.
arXiv Detail & Related papers (2022-10-17T14:19:52Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, you are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Datamodels: Predicting Predictions from Training Data [86.66720175866415]
We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data.
We show that even simple linear datamodels can successfully predict model outputs (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-02-01T18:15:24Z)
- LogME: Practical Assessment of Pre-trained Models for Transfer Learning [80.24059713295165]
The Logarithm of Maximum Evidence (LogME) can be used to assess pre-trained models for transfer learning.
Compared to brute-force fine-tuning, LogME brings over $3000\times$ speedup in wall-clock time.
arXiv Detail & Related papers (2021-02-22T13:58:11Z)
- Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)
- On the Intrinsic Differential Privacy of Bagging [69.70602220716718]
Our experimental results demonstrate that Bagging achieves significantly higher accuracies than state-of-the-art differentially private machine learning methods with the same privacy budgets.
arXiv Detail & Related papers (2020-08-22T14:17:55Z)
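For the Pandora's White-Box entry above, the fine-tuning MIA it describes reduces to comparing each candidate's loss under the base and fine-tuned models. Below is a minimal, hypothetical sketch of that loss-ratio heuristic; the threshold, the helper names, and the toy loss values are illustrative assumptions, not the paper's code.

```python
"""Hypothetical loss-ratio membership heuristic: a candidate is flagged as
fine-tuning data when its loss under the fine-tuned model is much smaller
than under the base model."""
import numpy as np

def loss_ratio_scores(loss_base: np.ndarray, loss_finetuned: np.ndarray) -> np.ndarray:
    """Lower score => more likely a member of the fine-tuning set."""
    eps = 1e-12  # avoid division by zero for near-zero base losses
    return loss_finetuned / (loss_base + eps)

def predict_membership(loss_base, loss_finetuned, threshold=0.5):
    return loss_ratio_scores(np.asarray(loss_base), np.asarray(loss_finetuned)) < threshold

# Example: members see a large loss drop after fine-tuning, non-members do not.
loss_base      = np.array([3.1, 2.8, 3.0, 2.9])
loss_finetuned = np.array([0.4, 0.5, 2.9, 2.7])   # first two are "members"
print(predict_membership(loss_base, loss_finetuned))  # [ True  True False False]
```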
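For the Datamodels entry above, a linear datamodel maps which training points were included in a training subset to the resulting model's output on a fixed target example. The toy sketch below illustrates one way to fit such a map with scikit-learn; the subset sizes, the Lasso regularizer, and the margin used as the output are assumptions of this sketch, not the authors' setup.

```python
"""Toy, illustrative linear-datamodel sketch (my reading of the summary)."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Lasso

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
x_target = X[:1]                      # example whose prediction we model

masks, outputs = [], []
for _ in range(300):                  # 300 random half-sized training subsets
    mask = rng.random(len(X)) < 0.5
    clf = LogisticRegression(max_iter=500).fit(X[mask], y[mask])
    masks.append(mask.astype(float))
    outputs.append(clf.decision_function(x_target)[0])  # margin on the target

# Linear datamodel: predicts the margin from which points were in the subset.
datamodel = Lasso(alpha=0.01).fit(np.array(masks), np.array(outputs))
print("largest positive influences:", np.argsort(datamodel.coef_)[-5:])
```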
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.