Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships

URL: http://arxiv.org/abs/2402.12189v2
Date: Sun, 1 Sep 2024 03:02:36 GMT
Title: Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships
Authors: Myung Gyo Oh, Hong Eun Ahn, Leo Hyun Park, Taekyoung Kwon
Abstract summary: Neural language models (LMs) are vulnerable to training data extraction attacks due to data memorization. This paper introduces a novel attack scenario wherein an attacker fine-tunes pre-trained LMs to amplify the exposure of the original training data. Our empirical findings indicate a remarkable outcome: LMs with over 1B parameters exhibit a four- to eight-fold increase in training data exposure.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Neural language models (LMs) are vulnerable to training data extraction attacks due to data memorization. This paper introduces a novel attack scenario wherein an attacker adversarially fine-tunes pre-trained LMs to amplify the exposure of the original training data. This strategy differs from prior studies by aiming to intensify the LM's retention of its pre-training dataset. To achieve this, the attacker needs to collect generated texts that are closely aligned with the pre-training data. However, without knowledge of the actual dataset, quantifying the amount of pre-training data within generated texts is challenging. To address this, we propose the use of pseudo-labels for these generated texts, leveraging membership approximations indicated by machine-generated probabilities from the target LM. We subsequently fine-tune the LM to favor generations with higher likelihoods of originating from the pre-training data, based on their membership probabilities. Our empirical findings indicate a remarkable outcome: LMs with over 1B parameters exhibit a four- to eight-fold increase in training data exposure. We discuss potential mitigations and suggest future research directions.
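The abstract describes the pseudo-labeling step only at a high level. The sketch below shows one plausible way to turn scored generations into pseudo-labeled preference pairs that a preference-based fine-tuning stage could consume. It is a minimal illustration under stated assumptions, not the authors' implementation: `membership_score` is a hypothetical callable standing in for the paper's membership approximation (higher is assumed to mean "more likely from the pre-training data"), and the all-pairs scheme and the `min_margin` filter are illustrative choices.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class PreferencePair:
    """One pseudo-labeled comparison: `chosen` is judged more member-like."""
    chosen: str
    rejected: str
    margin: float  # score gap; a larger gap suggests a more reliable pseudo-label


def pseudo_label_pairs(
    texts: Sequence[str],
    membership_score: Callable[[str], float],
    min_margin: float = 0.0,
) -> List[PreferencePair]:
    """Turn unlabeled generations into preference pairs.

    `membership_score` is a hypothetical stand-in for a membership
    approximation: higher means "more likely drawn from pre-training data".
    Pairs whose scores differ by no more than `min_margin` are skipped as
    too ambiguous to label.
    """
    scored = [(text, membership_score(text)) for text in texts]
    pairs: List[PreferencePair] = []
    for (a, score_a), (b, score_b) in combinations(scored, 2):
        gap = abs(score_a - score_b)
        if gap <= min_margin:
            continue  # too close to call; do not emit a pseudo-label
        if score_a > score_b:
            pairs.append(PreferencePair(chosen=a, rejected=b, margin=gap))
        else:
            pairs.append(PreferencePair(chosen=b, rejected=a, margin=gap))
    return pairs


if __name__ == "__main__":
    # Toy scores standing in for a real detector's output (hypothetical values).
    toy_scores = {"gen A": 0.9, "gen B": 0.4, "gen C": 0.7}
    for pair in pseudo_label_pairs(list(toy_scores), toy_scores.__getitem__, min_margin=0.1):
        print(pair)
```

Filtering out near-ties trades the number of training pairs for label quality; how (or whether) the paper applies such a filter is not stated in the abstract.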

Related papers

Learning to Detect Language Model Training Data via Active Reconstruction
We introduce Active Data Reconstruction Attack (ADRA). ADRA induces a model to reconstruct a given text through training. Our algorithms consistently outperform existing membership inference attacks (MIAs) in detecting pre-training, post-training, and distillation data.
arXiv (2026-02-22T03:20:06Z)
On the Effectiveness of Membership Inference in Targeted Data Extraction from Large Language Models
Large Language Models (LLMs) are prone to memorizing training data, which poses serious privacy risks. In this study, we integrate multiple membership inference attack (MIA) techniques into the data extraction pipeline to systematically benchmark their effectiveness.
arXiv (2025-12-15T14:05:49Z)
Retracing the Past: LLMs Emit Training Data When They Get Lost
Memorization of training data in large language models poses significant privacy and copyright concerns. This paper introduces Confusion-Inducing Attacks (CIA), a principled framework for extracting memorized data.
arXiv (2025-10-27T03:48:24Z)
Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning
We show that indirect data poisoning can effectively protect a dataset and trace its use. We make a model learn arbitrary secret sequences: secret responses to secret prompts that are absent from the training corpus. We validate our approach on language models pre-trained from scratch and show that less than 0.005% of poisoned tokens are sufficient to covertly make an LM learn a secret.
arXiv (2025-06-17T18:46:45Z)
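To put the Winter Soldier figure of 0.005% of poisoned tokens into absolute terms, the small calculation below converts it into a token count for a given corpus size. The corpus sizes used are hypothetical examples, not values reported in that paper, and `poison_token_threshold` is an illustrative helper name.

```python
from fractions import Fraction

# 0.005% expressed exactly as a fraction: 0.005 / 100 = 5 / 100_000.
POISON_FRACTION = Fraction(5, 100_000)


def poison_token_threshold(corpus_tokens: int) -> int:
    """Number of tokens equal to 0.005% of a corpus, rounded down.

    Exact rational arithmetic avoids floating-point rounding surprises;
    int() on a non-negative Fraction truncates, which equals the floor.
    """
    if corpus_tokens < 0:
        raise ValueError("corpus size must be non-negative")
    return int(corpus_tokens * POISON_FRACTION)


if __name__ == "__main__":
    # Hypothetical corpus sizes, chosen only for illustration.
    for size in (1_000_000_000, 100_000_000_000):
        print(f"{size:,} tokens -> {poison_token_threshold(size):,} poisoned tokens")
```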
Reasoning to Learn from Latent Thoughts
We show that explicitly modeling and inferring the latent thoughts that underlie the text generation process can significantly improve pretraining data efficiency. We show that a 1B LM can bootstrap its performance across at least three iterations and significantly outperform baselines trained on raw data. The gains from inference scaling and EM iterations suggest new opportunities for scaling data-constrained pretraining.
arXiv (2025-03-24T16:41:23Z)
Stealing Training Data from Large Language Models in Decentralized Training through Activation Inversion Attack
Decentralized training has become a resource-efficient framework to democratize the training of large language models (LLMs). This paper identifies a novel and realistic attack surface: the privacy leakage from training data in decentralized training.
arXiv (2025-02-22T05:19:20Z)
Towards a Theoretical Understanding of Memorization in Diffusion Models
Diffusion probabilistic models (DPMs) are being employed as mainstream models for Generative Artificial Intelligence (GenAI). We provide a theoretical understanding of memorization in both conditional and unconditional DPMs under the assumption of model convergence. We propose a novel data extraction method named Surrogate condItional Data Extraction (SIDE) that leverages a time-dependent classifier trained on the generated data as a surrogate condition to extract training data from unconditional DPMs.
arXiv (2024-10-03T13:17:06Z)
Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens
Large language models (LLMs) are extensively used, but there are concerns regarding privacy, security, and copyright due to their opaque training data. Current solutions to this problem leverage techniques explored in machine learning privacy, such as Membership Inference Attacks (MIAs). We propose an adaptive pre-training data detection method which alleviates this reliance and effectively amplifies the identification.
arXiv (2024-07-30T23:43:59Z)
ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods
We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA). ReCaLL examines the relative change in conditional log-likelihoods when prefixing target data points with non-member context. We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset.
arXiv (2024-06-23T00:23:13Z)
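The ReCaLL entry describes a score built from the relative change in log-likelihood when a non-member prefix is prepended. The sketch below computes such a score from per-token log-probabilities, assuming (as we understand the ReCaLL formulation) that the score is the ratio of the conditional to the unconditional average log-likelihood of the same target tokens. Obtaining the log-probabilities from a model, choosing the prefix, and thresholding the score into a member/non-member decision are left out; the function names are illustrative.

```python
from typing import Sequence


def avg_log_likelihood(token_logprobs: Sequence[float]) -> float:
    """Mean per-token log-probability of a text under the target model."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


def recall_score(
    target_logprobs: Sequence[float],
    target_logprobs_given_prefix: Sequence[float],
) -> float:
    """Ratio of conditional to unconditional average log-likelihood.

    Both inputs are log-probabilities of the same target tokens: first
    scored on their own, then scored after a non-member prefix has been
    prepended (the prefix tokens themselves are excluded).
    """
    ll = avg_log_likelihood(target_logprobs)
    ll_cond = avg_log_likelihood(target_logprobs_given_prefix)
    if ll == 0.0:
        raise ValueError("unconditional log-likelihood is zero; ratio undefined")
    return ll_cond / ll


if __name__ == "__main__":
    # Hypothetical log-probabilities: the prefix lowers the target's likelihood.
    print(recall_score([-2.0, -2.0], [-3.0, -3.0]))
```

Working with a ratio rather than a raw difference normalizes for how predictable the target text is on its own, which is the intuition behind a "relative" conditional log-likelihood.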
Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models
Language models (LMs) are potentially vulnerable to extraction attacks, which represent a significant privacy risk. We propose Privacy Protection via Optimal Parameters (POP), a novel unlearning method that effectively forgets the target token sequences from the pretrained LM. POP exhibits remarkable retention performance post-unlearning across 9 classification and 4 dialogue benchmarks, outperforming the state-of-the-art by a large margin.
arXiv (2024-06-20T08:12:49Z)
Extracting Training Data from Unconditional Diffusion Models
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI). We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization. Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a classifier trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv (2024-06-18T16:20:12Z)
Probing Language Models for Pre-training Data Detection
We propose to utilize the probing technique for pre-training data detection by examining the model's internal activations. Our method is simple and effective and leads to more trustworthy pre-training data detection.
arXiv (2024-06-03T13:58:04Z)
Safe Training with Sensitive In-domain Data: Leveraging Data Fragmentation To Mitigate Linkage Attacks
Current text generation models are trained using real data which can potentially contain sensitive information. We propose a safer alternative which sees fragmented data in the form of domain-specific short phrases randomly grouped together.
arXiv (2024-04-30T12:09:55Z)
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that instruction-tuned models can expose pre-training data as much as their base-models, if not more so, and using instructions proposed by other LLMs can open a new avenue of automated attacks.
arXiv (2024-03-05T19:32:01Z)
Reconstructing Training Data from Model Gradient, Provably
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value. As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv (2022-12-07T15:32:22Z)
On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance. We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv (2021-09-08T10:39:57Z)
Omni-supervised Facial Expression Recognition via Distilled Data
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training. We experimentally verify that the new dataset can significantly improve the ability of the learned FER model. To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv (2020-05-18T09:36:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.