Related papers: TMI! Finetuned Models Leak Private Information from their Pretraining Data

URL: http://arxiv.org/abs/2306.01181v3
Date: Wed, 16 Oct 2024 18:55:39 GMT
Title: TMI! Finetuned Models Leak Private Information from their Pretraining Data
Authors: John Abascal, Stanley Wu, Alina Oprea, Jonathan Ullman
Abstract summary: We propose a new membership-inference threat model where the adversary only has access to the finetuned model. We evaluate $\textbf{TMI}$ on both vision and natural language tasks across multiple transfer learning settings. An open-source implementation of $\textbf{TMI}$ can be found on GitHub.
Score: 5.150344987657356
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transfer learning has become an increasingly popular technique in machine learning as a way to leverage a pretrained model trained for one task to assist with building a finetuned model for a related task. This paradigm has been especially popular for $\textit{privacy}$ in machine learning, where the pretrained model is considered public, and only the data for finetuning is considered sensitive. However, there are reasons to believe that the data used for pretraining is still sensitive, making it essential to understand how much information the finetuned model leaks about the pretraining data. In this work we propose a new membership-inference threat model where the adversary only has access to the finetuned model and would like to infer the membership of the pretraining data. To realize this threat model, we implement a novel metaclassifier-based attack, $\textbf{TMI}$, that leverages the influence of memorized pretraining samples on predictions in the downstream task. We evaluate $\textbf{TMI}$ on both vision and natural language tasks across multiple transfer learning settings, including finetuning with differential privacy. Through our evaluation, we find that $\textbf{TMI}$ can successfully infer membership of pretraining examples using query access to the finetuned model. An open-source implementation of $\textbf{TMI}$ can be found on GitHub: https://github.com/johnmath/tmi-pets24.

Related papers

Intention-Conditioned Flow Occupancy Models [69.79049994662591]
Large-scale pre-training has fundamentally changed how machine learning research is done today. Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL. Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z)
Metadata Conditioning Accelerates Language Model Pre-training [76.54265482251454]
We propose a new method, termed Metadata Conditioning then Cooldown (MeCo), to incorporate additional learning cues during pre-training. MeCo significantly accelerates pre-training across different model scales (600M to 8B parameters) and training sources (C4, RefinedWeb, and DCLM). MeCo is remarkably simple, adds no computational overhead, and demonstrates promise in producing more capable and steerable language models.
arXiv Detail & Related papers (2025-01-03T18:59:23Z)
Vertical Federated Unlearning via Backdoor Certification [15.042986414487922]
VFL offers a novel paradigm in machine learning, enabling distinct entities to train models cooperatively while maintaining data privacy. Recent privacy regulations emphasize an individual's right to be forgotten, which necessitates the ability for models to unlearn specific training data. We introduce an innovative modification to traditional VFL by employing a mechanism that inverts the typical learning trajectory with the objective of extracting specific data contributions.
arXiv Detail & Related papers (2024-12-16T06:40:25Z)
Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage [12.892449128678516]
Fine-tuning language models on private data for downstream applications poses significant privacy risks. Several popular community platforms now offer convenient distribution of a large variety of pre-trained models. We introduce a novel poisoning technique that uses model-unlearning as an attack tool.
arXiv Detail & Related papers (2024-08-30T15:35:09Z)
Model Inversion Robustness: Can Transfer Learning Help? [27.883074562565877]
Model Inversion (MI) attacks aim to reconstruct private training data by abusing access to machine learning models. We propose Transfer Learning-based Defense against Model Inversion (TL-DMI) to render MI-robust models. Our method achieves state-of-the-art (SOTA) MI robustness without bells and whistles.
arXiv Detail & Related papers (2024-05-09T07:24:28Z)
LMEraser: Large Model Unlearning through Adaptive Prompt Tuning [21.141664917477257]
LMEraser takes a divide-and-conquer strategy with a prompt tuning architecture to isolate data influence. Experiments demonstrate that LMEraser achieves a $100$-fold reduction in unlearning costs without compromising accuracy.
arXiv Detail & Related papers (2024-04-17T04:08:38Z)
Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models [4.081098869497239]
We develop state-of-the-art privacy attacks against Large Language Models (LLMs). New membership inference attacks (MIAs) against pretrained LLMs perform hundreds of times better than baseline attacks. In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance.
arXiv Detail & Related papers (2024-02-26T20:41:50Z)
Don't Memorize; Mimic The Past: Federated Class Incremental Learning Without Episodic Memory [36.4406505365313]
This paper presents a framework for federated class incremental learning that utilizes a generative model to synthesize samples from past distributions instead of storing part of past data. The generative model is trained on the server using data-free methods at the end of each task without requesting data from clients.
arXiv Detail & Related papers (2023-07-02T07:06:45Z)
Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data. Instead, we are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning [77.27443885999404]
Federated Learning (FL) is a setting for training machine learning models in distributed environments. We propose a novel method, CANIFE, that uses carefully crafted samples by a strong adversary to evaluate the empirical privacy of a training round.
arXiv Detail & Related papers (2022-10-06T13:30:16Z)
Datamodels: Predicting Predictions from Training Data [86.66720175866415]
We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data. We show that even simple linear datamodels can successfully predict model outputs.
arXiv Detail & Related papers (2022-02-01T18:15:24Z)
bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model. bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing models of almost half their size.
arXiv Detail & Related papers (2021-10-14T04:05:25Z)
LogME: Practical Assessment of Pre-trained Models for Transfer Learning [80.24059713295165]
The Logarithm of Maximum Evidence (LogME) can be used to assess pre-trained models for transfer learning. Compared to brute-force fine-tuning, LogME brings over $3000\times$ speedup in wall-clock time.
arXiv Detail & Related papers (2021-02-22T13:58:11Z)
Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters. We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data. Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.