Exploring Approaches for Detecting Memorization of Recommender System Data in Large Language Models
- URL: http://arxiv.org/abs/2601.02002v1
- Date: Mon, 05 Jan 2026 11:03:56 GMT
- Title: Exploring Approaches for Detecting Memorization of Recommender System Data in Large Language Models
- Authors: Antonio Colacicco, Vito Guida, Dario Di Palma, Fedelucio Narducci, Tommaso Di Noia,
- Abstract summary: Large Language Models (LLMs) are increasingly applied in recommendation scenarios due to their strong natural language understanding and generation capabilities.<n>Recent work has shown that the MovieLens-1M dataset is memorized by both the LLaMA and OpenAI model families.
- Score: 10.071073998660525
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large Language Models (LLMs) are increasingly applied in recommendation scenarios due to their strong natural language understanding and generation capabilities. However, they are trained on vast corpora whose contents are not publicly disclosed, raising concerns about data leakage. Recent work has shown that the MovieLens-1M dataset is memorized by both the LLaMA and OpenAI model families, but the extraction of such memorized data has so far relied exclusively on manual prompt engineering. In this paper, we pose three main questions: Is it possible to enhance manual prompting? Can LLM memorization be detected through methods beyond manual prompting? And can the detection of data leakage be automated? To address these questions, we evaluate three approaches: (i) jailbreak prompt engineering; (ii) unsupervised latent knowledge discovery, probing internal activations via Contrast-Consistent Search (CCS) and Cluster-Norm; and (iii) Automatic Prompt Engineering (APE), which frames prompt discovery as a meta-learning process that iteratively refines candidate instructions. Experiments on MovieLens-1M using LLaMA models show that jailbreak prompting does not improve the retrieval of memorized items and remains inconsistent; CCS reliably distinguishes genuine from fabricated movie titles but fails on numerical user and rating data; and APE retrieves item-level information with moderate success yet struggles to recover numerical interactions. These findings suggest that automatically optimizing prompts is the most promising strategy for extracting memorized samples.
Related papers
- Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes.<n>Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes.
arXiv Detail & Related papers (2025-03-15T10:19:15Z) - Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering [108.2131720470005]
Large language models (LLMs) have demonstrated remarkable performance across various real-world tasks.
They often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated.
We propose AutoPASTA, a method that automatically identifies key contextual information and explicitly highlights it by steering an LLM's attention scores.
arXiv Detail & Related papers (2024-09-16T23:52:41Z) - Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering [2.6524539020042663]
Large Language Models (LLMs) frequently lack domain-specific knowledge and even fine-tuned models tend to hallucinate.
We present a pipeline, 4StepFocus, and specifically a preprocessing step, that can substantially improve the answers of LLMs.
The method narrows down potentially correct answers by triplets-based searches in a semi-structured knowledge base in a direct, traceable fashion.
arXiv Detail & Related papers (2024-09-01T22:43:27Z) - AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses.<n>Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies.<n>We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z) - Get my drift? Catching LLM Task Drift with Activation Deltas [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.<n>We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.<n>We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction [36.40833517478628]
Large language models require updates to remain up-to-date or adapt to new domains.<n>One key is memorizing the latest information in a way that the memorized information is extractable with a query prompt.<n>Despite minimizing document perplexity during fine-tuning, LLMs struggle to extract information through a prompt sentence.
arXiv Detail & Related papers (2024-02-16T06:29:16Z) - Prompt-Time Symbolic Knowledge Capture with Large Language Models [0.0]
Augmenting large language models (LLMs) with user-specific knowledge is crucial for real-world applications, such as personal AI assistants.
This paper investigates utilizing the existing LLM capabilities to enable prompt-driven knowledge capture.
arXiv Detail & Related papers (2024-02-01T08:15:28Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents.
This task becomes notably challenging in a zero-shot setting where no data is available to train tailored retrieval-reader models.
We propose a Self-Prompting framework to explicitly utilize the massive knowledge encoded in the parameters of Large Language Models.
arXiv Detail & Related papers (2022-12-16T18:23:43Z) - Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.