Related papers: Measuring memorization in language models via probabilistic extraction

Measuring memorization in language models via probabilistic extraction

URL: http://arxiv.org/abs/2410.19482v3
Date: Thu, 20 Mar 2025 15:35:56 GMT
Title: Measuring memorization in language models via probabilistic extraction
Authors: Jamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov, Milad Nasr, Christopher A. Choquette-Choo, Katherine Lee, A. Feder Cooper,
Abstract summary: Large language models (LLMs) are susceptible to memorizing training data.<n>Discoverable extraction is the most common method for measuring this issue.<n>We introduce probabilistic discoverable extraction, which, without additional cost, relaxes discoverable extraction by considering multiple queries.
Score: 29.438509661725117
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are susceptible to memorizing training data, raising concerns about the potential extraction of sensitive information at generation time. Discoverable extraction is the most common method for measuring this issue: split a training example into a prefix and suffix, then prompt the LLM with the prefix, and deem the example extractable if the LLM generates the matching suffix using greedy sampling. This definition yields a yes-or-no determination of whether extraction was successful with respect to a single query. Though efficient to compute, we show that this definition is unreliable because it does not account for non-determinism present in more realistic (non-greedy) sampling schemes, for which LLMs produce a range of outputs for the same prompt. We introduce probabilistic discoverable extraction, which, without additional cost, relaxes discoverable extraction by considering multiple queries to quantify the probability of extracting a target sequence. We evaluate our probabilistic measure across different models, sampling schemes, and training-data repetitions, and find that this measure provides more nuanced information about extraction risk compared to traditional discoverable extraction.

Related papers

Random-Set Large Language Models [4.308457163593758]
Large Language Models (LLMs) are known to produce very high-quality tests and responses to our queries. But how much can we trust this generated text? We propose a novel Random-Set Large Language Model (RSLLM) approach which predicts finite random sets (belief functions) over the token space.
arXiv Detail & Related papers (2025-04-25T05:25:27Z)
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection. We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z)
Uncertainty-Guided Optimization on Large Language Model Search Trees [42.71167208999792]
Tree search algorithms such as greedy and beam search are the standard when it comes to finding sequences of maximum likelihood in the decoding processes of large language models (LLMs) We define prior beliefs over LLMs' transition probabilities and obtain posterior beliefs over the most promising paths in each iteration. Unlike expensive simulation-based non-myopic methods like the Monte Carlo tree search, our method only requires samples from the beliefs.
arXiv Detail & Related papers (2024-07-04T14:08:50Z)
Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI) We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization. Based on the theoretical analysis, we propose a novel data extraction method called textbfSurrogate condItional Data Extraction (SIDE) that leverages a trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z)
CSS: Contrastive Semantic Similarity for Uncertainty Quantification of LLMs [1.515687944002438]
We propose Contrastive Semantic Similarity, a module to obtain similarity features for measuring uncertainty for text pairs. We conduct extensive experiments with three large language models (LLMs) on several benchmark question-answering datasets. Results show that our proposed method performs better in estimating reliable responses of LLMs than comparable baselines.
arXiv Detail & Related papers (2024-06-05T11:35:44Z)
To Believe or Not to Believe Your LLM [51.2579827761899]
We explore uncertainty quantification in large language models (LLMs) We derive an information-theoretic metric that allows to reliably detect when only epistemic uncertainty is large. We conduct a series of experiments which demonstrate the advantage of our formulation.
arXiv Detail & Related papers (2024-06-04T17:58:18Z)
Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation [73.58618024960968]
An increasing number of studies are employing large language models (LLMs) as agents to emulate the sequential decision-making processes of humans. This arouses curiosity regarding the capacity of LLM agents to comprehend probability distributions. Our analysis indicates that LLM agents can understand probabilities, but they struggle with probability sampling.
arXiv Detail & Related papers (2024-04-13T16:59:28Z)
C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations. Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data. One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is. This paper proposes the it least disagree metric (LDM) as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability. In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling. Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions. We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs) Standard conformal prediction produces prediction sets with rigorous, statistical guarantees. We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
arXiv Detail & Related papers (2023-06-16T21:55:08Z)
TriSig: Assessing the statistical significance of triclusters [2.064612766965483]
This work proposes a statistical frame to assess the probability of patterns in tensor data to deviate from null expectations. A comprehensive discussion on binomial testing for false positive discoveries is entailed. Results gathered from the application of state-of-the-art triclustering algorithms over distinct real-world case studies in biochemical and biotechnological domains.
arXiv Detail & Related papers (2023-06-01T13:08:54Z)
Learning from a Biased Sample [3.546358664345473]
We propose a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions. We empirically validate our proposed method in a case study on prediction of mental health scores from health survey data.
arXiv Detail & Related papers (2022-09-05T04:19:16Z)
Robust Out-of-Distribution Detection on Deep Probabilistic Generative Models [0.06372261626436676]
Out-of-distribution (OOD) detection is an important task in machine learning systems. Deep probabilistic generative models facilitate OOD detection by estimating the likelihood of a data sample. We propose a new detection metric that operates without outlier exposure.
arXiv Detail & Related papers (2021-06-15T06:36:10Z)
SparseIDS: Learning Packet Sampling with Reinforcement Learning [1.978587235008588]
Recurrent Neural Networks (RNNs) have been shown to be valuable for constructing Intrusion Detection Systems (IDSs) for network data. We show that by using a novel Reinforcement Learning (RL)-based approach called SparseIDS, we can reduce the number of consumed packets by more than three fourths.
arXiv Detail & Related papers (2020-02-10T15:38:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.