Related papers: Beyond Memorization: The Challenge of Random Memory Access in Language Models

Beyond Memorization: The Challenge of Random Memory Access in Language Models

URL: http://arxiv.org/abs/2403.07805v3
Date: Mon, 22 Jul 2024 15:29:00 GMT
Title: Beyond Memorization: The Challenge of Random Memory Access in Language Models
Authors: Tongyao Zhu, Qian Liu, Liang Pang, Zhengbao Jiang, Min-Yen Kan, Min Lin,
Abstract summary: We investigate whether a generative Language Model (LM) is able to access its memory sequentially or randomly. We find that techniques including recitation and permutation improve the random memory access capability of LMs.
Score: 56.525691003233554
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent developments in Language Models (LMs) have shown their effectiveness in NLP tasks, particularly in knowledge-intensive tasks. However, the mechanisms underlying knowledge storage and memory access within their parameters remain elusive. In this paper, we investigate whether a generative LM (e.g., GPT-2) is able to access its memory sequentially or randomly. Through carefully-designed synthetic tasks, covering the scenarios of full recitation, selective recitation and grounded question answering, we reveal that LMs manage to sequentially access their memory while encountering challenges in randomly accessing memorized content. We find that techniques including recitation and permutation improve the random memory access capability of LMs. Furthermore, by applying this intervention to realistic scenarios of open-domain question answering, we validate that enhancing random access by recitation leads to notable improvements in question answering. The code to reproduce our experiments can be found at https://github.com/sail-sg/lm-random-memory-access.

Related papers

MemOS: A Memory OS for AI System [116.87568350346537]
Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI)<n>Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods.<n>MemOS is a memory operating system that treats memory as a manageable system resource.
arXiv Detail & Related papers (2025-07-04T17:21:46Z)
Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks [42.22616978679253]
We introduce Sequence Order Recall Tasks (SORT), which we adapt from tasks used to study episodic memory in cognitive psychology. SORT requires LLMs to recall the correct order of text segments, and provides a general framework that is both easily extendable and does not require any additional annotations. Based on a human experiment with 155 participants, we show that humans can recall sequence order based on long-term memory of a book.
arXiv Detail & Related papers (2024-10-10T17:17:38Z)
Unlocking Memorization in Large Language Models with Dynamic Soft Prompting [66.54460367290146]
Large language models (LLMs) have revolutionized natural language processing (NLP) tasks such as summarization, question answering, and translation. LLMs pose significant security risks due to their tendency to memorize training data, leading to potential privacy breaches and copyright infringement. We propose a novel method for estimating LLM memorization using dynamic, prefix-dependent soft prompts.
arXiv Detail & Related papers (2024-09-20T18:56:32Z)
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module. Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling general and in particular. We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
Empowering Working Memory for Large Language Model Agents [9.83467478231344]
This paper explores the potential of applying cognitive psychology's working memory frameworks to large language models (LLMs) An innovative model is proposed incorporating a centralized Working Memory Hub and Episodic Buffer access to retain memories across episodes. This architecture aims to provide greater continuity for nuanced contextual reasoning during intricate tasks and collaborative scenarios.
arXiv Detail & Related papers (2023-12-22T05:59:00Z)
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models [75.98775135321355]
Given a long conversation, large language models (LLMs) fail to recall past information and tend to generate inconsistent responses. We propose to generate summaries/ memory using large language models (LLMs) to enhance long-term memory ability.
arXiv Detail & Related papers (2023-08-29T04:59:53Z)
RET-LLM: Towards a General Read-Write Memory for Large Language Models [53.288356721954514]
RET-LLM is a novel framework that equips large language models with a general write-read memory unit. Inspired by Davidsonian semantics theory, we extract and save knowledge in the form of triplets. Our framework exhibits robust performance in handling temporal-based question answering tasks.
arXiv Detail & Related papers (2023-05-23T17:53:38Z)
Learning to Rehearse in Long Sequence Memorization [107.14601197043308]
Existing reasoning tasks often have an important assumption that the input contents can be always accessed while reasoning. Memory augmented neural networks introduce a human-like write-read memory to compress and memorize the long input sequence in one pass. But they have two serious drawbacks: 1) they continually update the memory from current information and inevitably forget the early contents; 2) they do not distinguish what information is important and treat all contents equally. We propose the Rehearsal Memory to enhance long-sequence memorization by self-supervised rehearsal with a history sampler.
arXiv Detail & Related papers (2021-06-02T11:58:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.