How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning
- URL: http://arxiv.org/abs/2501.17501v2
- Date: Wed, 05 Feb 2025 07:35:55 GMT
- Title: How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning
- Authors: Fabio Salerno, Ali Al-Kaswan, Maliheh Izadi
- Abstract summary: We attack both pre-trained and fine-tuned code language models to investigate the extent of data extractability.
Fine-tuning requires fewer resources and is increasingly used by both small and large entities for its effectiveness on specialized data.
Data carriers and licensing information are the data categories most likely to be memorized by both pre-trained and fine-tuned models, while the latter (licensing information) is also the category most likely to be forgotten after fine-tuning.
- Score: 2.3759432635713895
- License:
- Abstract: Code language models, while widely popular, are often trained on unsanitized source code gathered from across the Internet. Previous work revealed that pre-trained models can remember the content of their training data and regurgitate it through data extraction attacks. Due to the large size of current models, only a few entities have the resources for pre-training such models. However, fine-tuning requires fewer resources and is increasingly used by both small and large entities for its effectiveness on specialized data. Such small, curated fine-tuning data might contain sensitive information or proprietary assets. In this study, we attack both pre-trained and fine-tuned code language models to investigate the extent of data extractability. We first develop a custom benchmark to assess the vulnerability of both pre-training and fine-tuning samples to extraction attacks. Our findings reveal that 54.9% of extractable pre-training data could be retrieved from StarCoder2-15B, whereas this number decreased to 23.5% after fine-tuning. This indicates that fine-tuning reduces the extractability of pre-training data. However, compared to larger models, fine-tuning smaller models increases their vulnerability to data extraction attacks on fine-tuning data. Given the potential sensitivity of fine-tuning data, this can lead to more severe consequences. Lastly, we manually analyzed 2,000 extractable samples before and after fine-tuning and found that data carriers and licensing information are the data categories most likely to be memorized by pre-trained and fine-tuned models, while the latter (licensing information) is also the category most likely to be forgotten after fine-tuning.
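The extraction setting described in the abstract (prompting a model with the prefix of a known training sample and checking whether the generated continuation reproduces the true suffix) can be sketched as follows. This is a minimal illustration rather than the authors' benchmark; the model name, the 50-token prefix/suffix split, and greedy decoding are assumptions.
```python
# Minimal sketch of a prefix-based data extraction attack, not the paper's
# exact benchmark. Model choice, prefix/suffix lengths, and greedy decoding
# are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "bigcode/starcoder2-3b"  # assumed; the paper reports results on StarCoder2-15B
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16, device_map="auto")

def is_extractable(sample: str, prefix_tokens: int = 50, suffix_tokens: int = 50) -> bool:
    """Prompt with the first `prefix_tokens` of a training sample and check
    whether greedy decoding reproduces the next `suffix_tokens` verbatim."""
    ids = tok(sample, return_tensors="pt").input_ids[0]
    if ids.numel() < prefix_tokens + suffix_tokens:
        return False
    prefix = ids[:prefix_tokens].unsqueeze(0).to(model.device)
    target = ids[prefix_tokens:prefix_tokens + suffix_tokens]
    with torch.no_grad():
        out = model.generate(prefix, max_new_tokens=suffix_tokens, do_sample=False)
    generated = out[0, prefix_tokens:prefix_tokens + suffix_tokens].cpu()
    return torch.equal(generated, target)

# Extraction rate over a set of known training samples (hypothetical list):
# rate = sum(is_extractable(s) for s in training_samples) / len(training_samples)
```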
Related papers
- Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection [37.65064631532493]
Finetuning a pretrained model to perform unsupervised prediction on data from a target domain presents two challenges: forgetting the pretraining distribution and overfitting the target data.
We measure the efficiency of injecting pretraining data into the finetuning data mixture to avoid forgetting and mitigate overfitting.
A key practical takeaway from our study is that injecting as little as 1% of pretraining data in the finetuning data mixture prevents the model from forgetting the pretraining set.
arXiv Detail & Related papers (2025-02-09T21:44:27Z)
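The 1% injection takeaway above amounts to a simple data-mixture step. Below is a minimal sketch assuming in-memory lists of examples; the names `finetune_examples` and `pretrain_examples` and the ratio handling are illustrative, not the paper's setup.
```python
# Sketch of injecting a small fraction of pretraining data into a finetuning
# mixture to mitigate forgetting. The 1% default mirrors the takeaway above;
# the dataset variables are hypothetical.
import random

def build_mixture(finetune_examples, pretrain_examples, inject_ratio=0.01, seed=0):
    """Return a shuffled finetuning set with roughly `inject_ratio` of its size
    drawn from the pretraining corpus."""
    rng = random.Random(seed)
    n_inject = max(1, int(inject_ratio * len(finetune_examples)))
    injected = rng.sample(pretrain_examples, k=min(n_inject, len(pretrain_examples)))
    mixture = list(finetune_examples) + injected
    rng.shuffle(mixture)
    return mixture
```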
- ARMOR: Shielding Unlearnable Examples against Data Augmentation [25.289775916629505]
We propose a framework, dubbed ARMOR, to protect data privacy from potential breaches of data augmentation.
ARMOR reduces the test accuracy of the model trained on augmented protected samples by as much as 60% more than baselines.
arXiv Detail & Related papers (2025-01-15T15:22:57Z)
- Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage [12.892449128678516]
Fine-tuning language models on private data for downstream applications poses significant privacy risks.
Several popular community platforms now offer convenient distribution of a large variety of pre-trained models.
We introduce a novel poisoning technique that uses model-unlearning as an attack tool.
arXiv Detail & Related papers (2024-08-30T15:35:09Z)
- Releasing Malevolence from Benevolence: The Menace of Benign Data on Machine Unlearning [28.35038726318893]
Machine learning models trained on vast amounts of real or synthetic data often achieve outstanding predictive performance across various domains.
To address privacy concerns, machine unlearning has been proposed to erase specific data samples from models.
We introduce the Unlearning Usability Attack to distill data distribution information into a small set of benign data.
arXiv Detail & Related papers (2024-07-06T15:42:28Z)
- Ask Your Distribution Shift if Pre-Training is Right for You [67.90850628695563]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others.
We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data.
Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
- Scalable Extraction of Training Data from (Production) Language Models [93.7746567808049]
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset.
We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT.
arXiv Detail & Related papers (2023-11-28T18:47:03Z)
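A hedged sketch of the verification step implied by "extractable memorization" in the entry above: sample generations from a model and flag any that reproduce a long verbatim span from a reference corpus. The 50-token window and the n-gram index are illustrative choices, not the paper's exact pipeline.
```python
# Sketch of checking sampled generations for verbatim overlap with a known
# corpus, one way to operationalize extractable memorization. The 50-token
# window is an illustrative threshold, not the paper's exact criterion.

def build_ngram_index(corpus_token_lists, n=50):
    """Index every n-token window that occurs in the corpus (lists of token ids)."""
    index = set()
    for tokens in corpus_token_lists:
        for i in range(len(tokens) - n + 1):
            index.add(tuple(tokens[i:i + n]))
    return index

def is_memorized(generation_tokens, index, n=50):
    """True if any n-token window of the generation appears verbatim in the corpus."""
    return any(
        tuple(generation_tokens[i:i + n]) in index
        for i in range(len(generation_tokens) - n + 1)
    )
```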
- Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation [56.57532238195446]
We propose a method named Ethicist for targeted training data extraction.
To elicit memorization, we tune soft prompt embeddings while keeping the model fixed.
We show that Ethicist significantly improves the extraction performance on a recently proposed public benchmark.
arXiv Detail & Related papers (2023-07-10T08:03:41Z)
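The soft-prompt step described for Ethicist (tuning prompt embeddings while the model stays frozen) can be sketched with a generic prompt-tuning loop. This omits the paper's loss smoothing and calibrated confidence estimation, and the model name, prompt length, and optimizer settings are placeholders.
```python
# Generic soft-prompt tuning sketch: learn a few prompt embeddings while the
# language model itself stays frozen. Not Ethicist's full method; names and
# hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/gpt-neo-125m"  # assumed small model for illustration
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
for p in model.parameters():
    p.requires_grad_(False)  # keep the model fixed

n_prompt = 20
embed = model.get_input_embeddings()
soft_prompt = torch.nn.Parameter(
    embed.weight[:n_prompt].detach().clone()  # initialize from real token embeddings
)
opt = torch.optim.AdamW([soft_prompt], lr=1e-3)

def step(prefix_ids, suffix_ids):
    """One update: raise the likelihood of the known suffix given soft prompt + prefix."""
    inputs = torch.cat([prefix_ids, suffix_ids], dim=1)          # (1, L) token ids
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), embed(inputs)], dim=1)
    # Only the suffix positions contribute to the loss (-100 is ignored).
    labels = torch.cat(
        [torch.full((1, n_prompt + prefix_ids.size(1)), -100), suffix_ids], dim=1
    )
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```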
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information about a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
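The weight-importance idea in the entry above can be illustrated with a Fisher-style score: accumulate squared gradients of the loss on the instances to delete and treat high-scoring parameters as the ones "guilty" of carrying that information. This is one common importance metric, not necessarily the paper's.
```python
# Illustrative weight-importance computation for instance-wise unlearning:
# score each parameter by its accumulated squared gradient on the forget set.
# A Fisher-style proxy, not necessarily the metric used in the paper above.
import torch

def parameter_importance(model, forget_batches, loss_fn):
    """forget_batches: list of (inputs, targets). Returns {name: importance tensor}."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    model.eval()
    for inputs, targets in forget_batches:
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    return {n: s / max(1, len(forget_batches)) for n, s in scores.items()}

# Parameters with the highest scores are candidates to reset or regularize
# when erasing the forget instances.
```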
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
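One simple instantiation of "selecting a pre-training subset related to the target task" is to rank pretraining examples by embedding similarity to the target data. This is an illustrative stand-in, not the selection strategy proposed in the paper above; the feature matrices are assumed to come from any shared encoder.
```python
# Illustrative selection of pretraining examples most similar to the target
# task, using cosine similarity to the target feature centroid. A simple
# stand-in, not the paper's selection strategy.
import torch

def select_pretraining_subset(pretrain_feats, target_feats, k):
    """pretrain_feats: (N, d) and target_feats: (M, d) feature embeddings.
    Returns indices of the k pretraining examples closest to the target centroid."""
    centroid = torch.nn.functional.normalize(target_feats.mean(dim=0), dim=0)
    feats = torch.nn.functional.normalize(pretrain_feats, dim=1)
    sims = feats @ centroid  # cosine similarity of each pretraining example
    return torch.topk(sims, k).indices
```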
- SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z) - Adversarial Vulnerability of Active Transfer Learning [0.0]
Two widely used techniques for training supervised machine learning models on small datasets are Active Learning and Transfer Learning.
We show that the combination of these techniques is particularly susceptible to a new kind of data poisoning attack.
We show that a model trained on such a poisoned dataset has a significantly deteriorated performance, dropping from 86% to 34% test accuracy.
arXiv Detail & Related papers (2021-01-26T14:07:09Z)