Pruning as a Defense: Reducing Memorization in Large Language Models
- URL: http://arxiv.org/abs/2502.15796v1
- Date: Tue, 18 Feb 2025 19:32:10 GMT
- Title: Pruning as a Defense: Reducing Memorization in Large Language Models
- Authors: Mansi Gupta, Nikhar Waghela, Sarthak Gupta, Shourya Goel, Sanjif Shanmugavelu
- Abstract summary: Large language models have been shown to memorize significant portions of their training data. This work investigates the impact of simple pruning techniques on this behavior.
- Score: 4.280531541084464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have been shown to memorize significant portions of their training data, which they can reproduce when appropriately prompted. This work investigates the impact of simple pruning techniques on this behavior. Our findings reveal that pruning effectively reduces the extent of memorization in LLMs, demonstrating its potential as a foundational approach for mitigating membership inference attacks.
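As a rough illustration of the approach described in the abstract, the sketch below applies global magnitude pruning to a small open causal LM and then probes it with a prompt, so memorized continuations can be compared before and after pruning. The model name, the 30% sparsity level, and the probe prompt are assumptions for illustration, not the authors' experimental setup.

```python
# Minimal sketch: magnitude-prune a small causal LM and check whether a
# memorized-looking continuation still appears. Model, sparsity, and prompt
# are illustrative assumptions, not the paper's setup.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM works for this sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Collect the 2-D weight matrices inside the transformer blocks
# (attention / MLP projections) as pruning targets.
targets = [
    (m, "weight")
    for m in model.transformer.h.modules()
    if isinstance(getattr(m, "weight", None), torch.nn.Parameter) and m.weight.dim() == 2
]

# Global unstructured L1 pruning: zero out the 30% smallest-magnitude weights.
prune.global_unstructured(targets, pruning_method=prune.L1Unstructured, amount=0.3)
for m, name in targets:
    prune.remove(m, name)  # make the sparsity permanent

# Probe with a prompt that might elicit a memorized continuation.
prompt = "My social security number is"
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```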
Related papers
- Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models [53.4530106173067]
Large language models (LLMs) with reinforcement learning (RL) have shown promising improvements in complex reasoning tasks.
RL remains challenging for tiny LLMs with 1 billion parameters or fewer because they lack the necessary pretraining strength to explore effectively.
This work introduces a novel intrinsic motivation approach that leverages episodic memory to address this challenge.
arXiv Detail & Related papers (2025-04-03T04:46:17Z) - Mitigating Memorization in LLMs using Activation Steering [3.5782765808288475]
Memorization of training data by Large Language Models (LLMs) poses significant risks, including privacy leaks and the regurgitation of copyrighted content.
Activation steering, a technique that directly intervenes in model activations, has emerged as a promising approach for manipulating LLMs.
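A minimal sketch of the general activation-steering mechanism, implemented as a forward hook that adds a fixed vector to one transformer block's output at inference time; the layer index, scale, and random direction are placeholders rather than the cited paper's learned steering vector.

```python
# Minimal sketch of activation steering via a forward hook. The layer index,
# scale, and random vector below are placeholders, not a learned direction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

layer_idx, scale = 6, 4.0                       # assumptions
steer = torch.randn(model.config.hidden_size)   # placeholder steering direction
steer = steer / steer.norm()

def hook(module, inputs, output):
    # GPT-2 blocks return a tuple; element 0 holds the hidden states.
    steered = output[0] + scale * steer.to(output[0].dtype)
    return (steered,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(hook)
ids = tok("The secret key is", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=20, do_sample=False)[0]))
handle.remove()  # restore normal behavior
```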
arXiv Detail & Related papers (2025-03-08T03:37:07Z) - Detecting Memorization in Large Language Models [0.0]
Large language models (LLMs) have achieved impressive results in natural language processing but are prone to memorizing portions of their training data. Traditional methods for detecting memorization rely on output probabilities or loss functions. We introduce an analytical method that precisely detects memorization by examining neuron activations within the LLM.
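The general idea of activation-based detection can be sketched as a linear probe over hidden-state activations; the mean-pooled features, layer choice, and toy labels below are illustrative assumptions, not the paper's exact analytical method.

```python
# Minimal sketch of activation-based memorization detection: extract mean
# hidden-state activations for prompts labeled memorized / not memorized and
# fit a linear probe. Labels and the logistic-regression probe are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

def mean_activation(text: str, layer: int = 6) -> torch.Tensor:
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids, output_hidden_states=True).hidden_states[layer]
    return hs.mean(dim=1).squeeze(0)  # average over token positions

# Hypothetical labeled examples (1 = known-memorized sequence, 0 = fresh text).
texts = ["Call me Ishmael. Some years ago...", "A made-up sentence about teapots."]
labels = [1, 0]
X = torch.stack([mean_activation(t) for t in texts]).numpy()

probe = LogisticRegression(max_iter=1000).fit(X, labels)
print(probe.predict(X))
```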
arXiv Detail & Related papers (2024-12-02T00:17:43Z) - Mitigating Memorization In Language Models [37.899013074095336]
Language models (LMs) can "memorize" information, encoding training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data. We introduce TinyMem, a suite of small, computationally efficient LMs for the rapid development and evaluation of memorization-mitigation methods. We show, in particular, that our proposed unlearning method, BalancedSubnet, outperforms other mitigation methods at removing memorized information while preserving performance on target tasks.
arXiv Detail & Related papers (2024-10-03T02:53:51Z) - Predicting and analyzing memorization within fine-tuned Large Language Models [0.0]
Large Language Models memorize a significant proportion of their training data, posing a serious threat when that data is disclosed at inference time.
We propose a new approach based on sliced mutual information to detect memorized samples a priori.
We obtain strong empirical results, paving the way for systematic inspection and protection of these vulnerable samples before memorization happens.
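A minimal sketch of sliced mutual information, the quantity the summary relies on: project both high-dimensional variables onto random one-dimensional directions and average a scalar MI estimate over the slices. The k-NN estimator from scikit-learn and the synthetic data are placeholders for the paper's actual estimator and representations.

```python
# Minimal sketch of sliced mutual information (SMI) with placeholder data.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def sliced_mi(X: np.ndarray, Y: np.ndarray, n_slices: int = 50, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_slices):
        u = rng.normal(size=X.shape[1]); u /= np.linalg.norm(u)
        v = rng.normal(size=Y.shape[1]); v /= np.linalg.norm(v)
        x1d, y1d = X @ u, Y @ v                       # random 1-D projections
        total += mutual_info_regression(x1d.reshape(-1, 1), y1d)[0]
    return total / n_slices

# Toy example: Y is a noisy linear function of X, so the SMI estimate is > 0.
X = np.random.randn(500, 16)
Y = X @ np.random.randn(16, 8) + 0.1 * np.random.randn(500, 8)
print(f"sliced MI estimate: {sliced_mi(X, Y):.3f}")
```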
arXiv Detail & Related papers (2024-09-27T15:53:55Z) - Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency. We show that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
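Distributional memorization, as described, amounts to a rank correlation between per-example model probabilities and pretraining-corpus frequencies; the sketch below uses placeholder arrays in place of real corpus statistics and model scores.

```python
# Minimal sketch of distributional memorization: correlate each test item's
# model probability with how often related n-grams occur in the pretraining
# corpus. Counts and probabilities below are hypothetical placeholders.
import numpy as np
from scipy.stats import spearmanr

pretrain_freq = np.array([3, 120, 45, 0, 980, 12, 7, 310])    # hypothetical n-gram counts
model_prob    = np.array([0.02, 0.41, 0.18, 0.01, 0.77, 0.05, 0.04, 0.52])

rho, pval = spearmanr(pretrain_freq, model_prob)
print(f"distributional memorization (Spearman rho) = {rho:.2f}, p = {pval:.3f}")
```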
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs [101.51435599249234]
We propose an axiomatic system to define and quantify the precise memorization and in-context reasoning effects used by a large language model (LLM).
Specifically, the axiomatic system enables us to categorize the memorization effects into foundational memorization effects and chaotic memorization effects.
Experiments show that the clear disentanglement of memorization effects and in-context reasoning effects enables a straightforward examination of detailed inference patterns encoded by LLMs.
arXiv Detail & Related papers (2024-05-20T08:51:03Z) - Understanding Privacy Risks of Embeddings Induced by Large Language Models [75.96257812857554]
Large language models show early signs of artificial general intelligence but struggle with hallucinations.
One promising solution is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation.
Recent studies experimentally showed that the original text can be partially reconstructed from text embeddings by pre-trained language models.
arXiv Detail & Related papers (2024-04-25T13:10:48Z) - Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
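A hedged sketch of the honeypot idea: attach a small auxiliary classifier to a lower transformer layer so that backdoor-correlated shortcut features are absorbed there rather than by the main task head. The layer index, head sizes, and the omitted loss re-weighting are assumptions, not the paper's exact design.

```python
# Minimal sketch of a honeypot-style module: an auxiliary classifier fed only
# by a lower transformer layer, trained alongside the main task head.
import torch
import torch.nn as nn
from transformers import AutoModel

class HoneypotClassifier(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased",
                 num_labels: int = 2, honeypot_layer: int = 3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.honeypot_layer = honeypot_layer
        self.task_head = nn.Linear(hidden, num_labels)      # main head (top layer)
        self.honeypot_head = nn.Linear(hidden, num_labels)  # honeypot head (lower layer)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask,
                           output_hidden_states=True)
        top_cls = out.last_hidden_state[:, 0]                   # [CLS] at final layer
        low_cls = out.hidden_states[self.honeypot_layer][:, 0]  # [CLS] at lower layer
        return self.task_head(top_cls), self.honeypot_head(low_cls)

# During training one would weight the two cross-entropy losses so that samples
# the honeypot classifies confidently (likely poisoned) contribute less to the
# main head; that re-weighting step is omitted from this sketch.
```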
arXiv Detail & Related papers (2023-10-28T08:21:16Z) - Mitigating Approximate Memorization in Language Models via Dissimilarity Learned Policy [0.0]
Large language models (LLMs) are trained on large amounts of data.
LLMs have been shown to memorize parts of their training data and to emit that data verbatim when an adversary prompts them appropriately.
arXiv Detail & Related papers (2023-05-02T15:53:28Z) - Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning [113.58691755215663]
We develop RetroPrompt to help a model strike a balance between generalization and memorization.
In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances.
Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings.
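The open-book knowledge-store idea can be sketched as nearest-neighbor retrieval over embeddings of training instances, with the retrieved neighbors prepended to the query as demonstrations; the mean-pooled BERT embeddings and toy store below are placeholders, and RetroPrompt's actual neural-demonstration mechanism differs.

```python
# Minimal sketch of an open-book knowledge store: embed training instances,
# retrieve the nearest neighbors of a query, and prepend them as demonstrations.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.neighbors import NearestNeighbors

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hs = enc(**batch).last_hidden_state
    mask = batch.attention_mask.unsqueeze(-1)
    return ((hs * mask).sum(1) / mask.sum(1)).numpy()   # mean pooling over tokens

train_texts = ["great movie, loved it", "terrible plot, boring",
               "fantastic acting", "a waste of two hours"]          # toy store
store = NearestNeighbors(n_neighbors=2).fit(embed(train_texts))

query = "really enjoyable film"
_, idx = store.kneighbors(embed([query]))
demos = " ".join(train_texts[i] for i in idx[0])
print(f"Retrieved demonstrations: {demos}\nPrompt: {demos} || {query} -> ?")
```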
arXiv Detail & Related papers (2022-05-29T16:07:30Z) - ReMP: Rectified Metric Propagation for Few-Shot Learning [67.96021109377809]
A rectified metric space is learned to maintain the metric consistency from training to testing.
Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains.
The proposed ReMP is effective and efficient, and outperforms the state of the art on various standard few-shot learning datasets.
arXiv Detail & Related papers (2020-12-02T00:07:53Z) - Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting [100.75479161884935]
We propose a novel training paradigm called Remembering for the Right Reasons (RRR).
RRR stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions.
We demonstrate how RRR can be easily added to any memory or regularization-based approach and results in reduced forgetting.
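A minimal sketch of the RRR mechanism under stated assumptions: store an input-gradient saliency map for each buffered example and penalize drift between the current and stored explanations during replay. The tiny MLP, L1 penalty, and weighting factor are illustrative, not the paper's exact configuration.

```python
# Minimal sketch of the RRR idea: keep a saliency map for each buffered example
# and penalize drift between current and stored explanations during replay.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))

def saliency(x, y):
    # Input-gradient explanation; create_graph lets the penalty be differentiated.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return grad.abs()

# Buffer: a few past examples plus the explanations recorded when they were learned.
buf_x, buf_y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
buf_expl = saliency(buf_x, buf_y).detach()

opt = torch.optim.SGD(model.parameters(), lr=0.01)
new_x, new_y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))

task_loss = F.cross_entropy(model(new_x), new_y)
rrr_loss = (saliency(buf_x, buf_y) - buf_expl).abs().mean()   # explanation drift
loss = task_loss + F.cross_entropy(model(buf_x), buf_y) + 1.0 * rrr_loss
opt.zero_grad(); loss.backward(); opt.step()
print(f"task={task_loss.item():.3f}  rrr={rrr_loss.item():.3f}")
```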
arXiv Detail & Related papers (2020-10-04T10:05:27Z)