Memorization: A Close Look at Books
- URL: http://arxiv.org/abs/2504.12549v1
- Date: Thu, 17 Apr 2025 00:20:18 GMT
- Title: Memorization: A Close Look at Books
- Authors: Iris Ma, Ian Domingo, Alberto Krone-Martins, Pierre Baldi, Cristina V. Lopes
- Abstract summary: Using the Llama 3 70B family of models, we were able to auto-regressively reconstruct one entire book from just the first 500 tokens. We show that extraction rates of books correlate with book popularity and thus, likely duplication in the training data.
- Score: 5.423163868410005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To what extent can entire books be extracted from LLMs? Using the Llama 3 70B family of models, and the "prefix-prompting" extraction technique, we were able to auto-regressively reconstruct, with a very high level of similarity, one entire book (Alice's Adventures in Wonderland) from just the first 500 tokens. We were also able to obtain high extraction rates on several other books, piece-wise. However, these successes do not extend uniformly to all books. We show that extraction rates of books correlate with book popularity and thus, likely duplication in the training data. We also confirm the undoing of mitigations in the instruction-tuned Llama 3.1, following recent work (Nasr et al., 2025). We further find that this undoing comes from changes to only a tiny fraction of weights concentrated primarily in the lower transformer blocks. Our results provide evidence of the limits of current regurgitation mitigation strategies and introduce a framework for studying how fine-tuning affects the retrieval of verbatim memorization in aligned LLMs.
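The "prefix-prompting" loop described above can be sketched in a few lines: seed the model with the book's opening tokens, repeatedly append the model's greedy continuation, and score the result against the ground truth. This is a minimal, self-contained illustration, not the paper's actual code; the `toy_generate` stand-in simulates a model that has memorized the text verbatim, and all function names are illustrative.

```python
from difflib import SequenceMatcher

def extract_book(generate, prefix_tokens, target_len, chunk=64):
    """Auto-regressively reconstruct a text: feed the current
    reconstruction back as context and append the model's next
    `chunk` tokens until `target_len` tokens are produced."""
    out = list(prefix_tokens)
    while len(out) < target_len:
        out.extend(generate(out)[:chunk])
    return out[:target_len]

def similarity(a, b):
    """Token-level similarity between reconstruction and ground truth."""
    return SequenceMatcher(None, a, b).ratio()

# Toy stand-in for an LLM that has memorized `book` verbatim:
book = ("alice was beginning to get very tired of sitting by her "
        "sister on the bank " * 20).split()

def toy_generate(context):
    # Continue the book from wherever the context currently ends.
    return book[len(context):len(context) + 64]

recon = extract_book(toy_generate, book[:50], len(book))
print(round(similarity(recon, book), 2))  # → 1.0 for a perfectly memorizing model
```

With a real model, `toy_generate` would be replaced by a call to the model's greedy decoding, and the similarity score would fall below 1.0 in proportion to how imperfectly the book is memorized.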
Related papers
- TokAlign: Efficient Vocabulary Adaptation via Token Alignment [41.59130966729569]
Tokenization serves as a foundational step for Large Language Models (LLMs) to process text. In new domains or languages, an inefficient tokenizer slows down LLM training and generation. We propose an efficient method named TokAlign to replace the vocabulary of an LLM based on token co-occurrences.
arXiv Detail & Related papers (2025-06-04T03:15:57Z)
- Extracting memorized pieces of (copyrighted) books from open-weight language models [64.69834802660128]
Drawing on adversarial ML and copyright law, we show that these polarized positions dramatically oversimplify the relationship between memorization and copyright. We show that it is possible to extract substantial parts of at least some books from different LLMs. We discuss why our results have significant implications for copyright cases, though not ones that unambiguously favor either side.
arXiv Detail & Related papers (2025-05-18T21:06:32Z)
- Mitigating Copy Bias in In-Context Learning through Neuron Pruning [74.91243772654519]
Large language models (LLMs) have demonstrated impressive few-shot in-context learning abilities.
They are sometimes prone to a "copying bias", where they copy answers from provided examples instead of learning the underlying patterns.
We propose a novel and simple method to mitigate such copying bias.
arXiv Detail & Related papers (2024-10-02T07:18:16Z)
- From Theft to Bomb-Making: The Ripple Effect of Unlearning in Defending Against Jailbreak Attacks [85.84979847888157]
Large Language Models (LLMs) are known to be vulnerable to jailbreak attacks. LLMs can implicitly unlearn harmful knowledge that was not explicitly introduced during the unlearning phase. We empirically validate this phenomenon, which enables unlearning-based methods to decrease the Attack Success Rate.
arXiv Detail & Related papers (2024-07-03T07:14:05Z)
- Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning [37.061187080745654]
We show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of benign relearning attacks. With access to only a small and potentially loosely related set of data, we find that we can "jog" the memory of unlearned models to reverse the effects of unlearning.
arXiv Detail & Related papers (2024-06-19T09:03:21Z)
- LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification [13.319594321038926]
We propose a simple and effective transfer learning strategy, namely LLMEmbed, to address this classical but challenging task.
We perform extensive experiments on publicly available datasets, and the results show that LLMEmbed achieves strong performance while enjoying low training overhead.
arXiv Detail & Related papers (2024-06-06T03:46:59Z)
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs). We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction [36.40833517478628]
Large language models require updates to remain up-to-date or adapt to new domains.
One key challenge is memorizing the latest information in a way that makes it extractable with a query prompt.
Despite minimizing document perplexity during fine-tuning, LLMs struggle to extract information through a prompt sentence.
arXiv Detail & Related papers (2024-02-16T06:29:16Z)
- AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for prompting Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
- Towards Robust Text Retrieval with Progressive Learning [31.81063977662941]
We propose the PEG, a progressively learned embedding for robust text retrieval.
We increase the in-batch negative samples during training to 80,000, and for each query we extract five hard negatives.
PEG is trained on more than 100 million data, encompassing a wide range of domains.
arXiv Detail & Related papers (2023-11-20T11:44:01Z)
- Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers [56.12593882838412]
We introduce a novel instruction distillation method to rank documents.
We first rank documents using the effective pairwise approach with complex instructions, and then distill the teacher predictions to the pointwise approach with simpler instructions.
Our approach surpasses the performance of existing supervised methods like monoT5 and is on par with the state-of-the-art zero-shot methods.
arXiv Detail & Related papers (2023-11-02T19:16:21Z)
- Compressing LLMs: The Truth is Rarely Pure and Never Simple [90.05366363633568]
The Knowledge-Intensive Compressed LLM BenchmarK (LLM-KICK) aims to redefine the evaluation protocol for compressed Large Language Models.
LLM-KICK unveils many favorable merits and unfortunate plights of current SoTA compression methods.
LLM-KICK is designed to holistically assess compressed LLMs' abilities in language understanding, reasoning, generation, in-context retrieval, in-context summarization, etc.
arXiv Detail & Related papers (2023-10-02T17:42:37Z)
- Can We Edit Factual Knowledge by In-Context Learning? [38.2498067309258]
In-context knowledge editing (IKE) achieves a competitive success rate compared to gradient-based methods.
We show that IKE achieves less over-editing on similar but unrelated facts and less knowledge forgetting on previously stored knowledge.
arXiv Detail & Related papers (2023-05-22T06:07:58Z)
- Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents.
This task becomes notably challenging in a zero-shot setting where no data is available to train tailored retrieval-reader models.
We propose a Self-Prompting framework to explicitly utilize the massive knowledge encoded in the parameters of Large Language Models.
arXiv Detail & Related papers (2022-12-16T18:23:43Z)
- Partial Is Better Than All: Revisiting Fine-tuning Strategy for Few-shot Learning [76.98364915566292]
A common practice is to train a model on the base set first and then transfer to novel classes through fine-tuning.
We propose to transfer partial knowledge by freezing or fine-tuning particular layer(s) in the base model.
We conduct extensive experiments on CUB and mini-ImageNet to demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-02-08T03:27:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.