Augmenting Language Models with Long-Term Memory
- URL: http://arxiv.org/abs/2306.07174v1
- Date: Mon, 12 Jun 2023 15:13:39 GMT
- Title: Augmenting Language Models with Long-Term Memory
- Authors: Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng
Gao, Furu Wei
- Abstract summary: Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit.
We propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history.
- Score: 142.04940250657637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing large language models (LLMs) can only afford fixed-size inputs due to
the input length limit, preventing them from utilizing rich long-context
information from past inputs. To address this, we propose a framework, Language
Models Augmented with Long-Term Memory (LongMem), which enables LLMs to
memorize long history. We design a novel decoupled network architecture with
the original backbone LLM frozen as a memory encoder and an adaptive residual
side-network as a memory retriever and reader. Such a decoupled memory design
can easily cache and update long-term past contexts for memory retrieval
without suffering from memory staleness. Enhanced with memory-augmented
adaptation training, LongMem can thus memorize long past context and use
long-term memory for language modeling. The proposed memory retrieval module
can handle unlimited-length context in its memory bank to benefit various
downstream tasks. Typically, LongMem can enlarge the long-form memory to 65k
tokens and thus cache many-shot extra demonstration examples as long-form
memory for in-context learning. Experiments show that our method outperforms
strong long-context models on ChapterBreak, a challenging long-context modeling
benchmark, and achieves remarkable improvements on memory-augmented in-context
learning over LLMs. The results demonstrate that the proposed method is
effective in helping language models to memorize and utilize long-form
contents. Our code is open-sourced at https://aka.ms/LongMem.
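To make the cache-then-retrieve flow concrete, here is a minimal sketch in plain NumPy: past context is written to a memory bank as vectors from a stand-in encoder, and the nearest chunks are retrieved for the current query (for example, to pull many-shot demonstrations back into the prompt). The `encode` stub and the chunk-level granularity are illustrative assumptions; this is not the released LongMem implementation at https://aka.ms/LongMem.

```python
# Minimal sketch of a cached long-term memory with similarity retrieval.
# NumPy only; `encode` is a toy stand-in for the frozen backbone encoder.
import hashlib
import numpy as np

def encode(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic pseudo-embedding (placeholder for the frozen LLM encoder)."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)

class MemoryBank:
    """Write-once cache of past context; retrieval never re-encodes old chunks,
    which is what keeps the cached memory from going stale."""
    def __init__(self) -> None:
        self.keys: list[np.ndarray] = []
        self.chunks: list[str] = []

    def write(self, chunk: str) -> None:
        self.keys.append(encode(chunk))
        self.chunks.append(chunk)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        if not self.keys:
            return []
        sims = np.stack(self.keys) @ encode(query)   # cosine similarity of unit vectors
        return [self.chunks[i] for i in np.argsort(-sims)[:k]]

# Cache a long history chunk-by-chunk, then pull the most relevant chunks
# back in (e.g., many-shot demonstrations for in-context learning).
bank = MemoryBank()
for chunk in ["Q: capital of France? A: Paris",
              "Q: capital of Japan? A: Tokyo",
              "Shipping policy: returns accepted within 30 days"]:
    bank.write(chunk)
print(bank.retrieve("Q: capital of Italy?", k=2))
```

In the paper's design, retrieval and reading are handled by the adaptive residual side-network over representations cached from the frozen backbone, rather than over raw text as in this toy version.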
Related papers
- MemLong: Memory-Augmented Retrieval for Long Text Modeling [37.49036666949963] (arXiv, 2024-08-30)
This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation.
MemLong combines a non-differentiable "ret-mem" module with a partially trainable decoder-only language model.
Comprehensive evaluations on multiple long-context language modeling benchmarks demonstrate that MemLong consistently outperforms other state-of-the-art LLMs.
- Needle in the Haystack for Memory Based Large Language Models [31.885539843977472] (arXiv, 2024-07-01)
Current large language models (LLMs) often perform poorly on simple fact retrieval tasks.
We investigate if coupling a dynamically adaptable external memory to a LLM can alleviate this problem.
We demonstrate that the external memory of Larimar, which allows fast write and read of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training.
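A generic sketch of that write/read pattern, assuming a toy lexical-overlap read and hypothetical `write`/`read` names rather than Larimar's actual interface: episodes are appended at test time with no weight updates and looked up later when a question needs them.

```python
# Illustrative-only sketch of test-time episodic memory: write text episodes,
# read back the most relevant one by token overlap. No model weights change.
from typing import Optional

class EpisodicMemory:
    def __init__(self) -> None:
        self.episodes: list[str] = []

    def write(self, episode: str) -> None:        # fast write: just append
        self.episodes.append(episode)

    def read(self, query: str) -> Optional[str]:  # fast read: best lexical overlap
        if not self.episodes:
            return None
        q = set(query.lower().split())
        return max(self.episodes, key=lambda e: len(q & set(e.lower().split())))

memory = EpisodicMemory()
memory.write("The launch code changed to 7413 on Tuesday.")  # seen long before the question
memory.write("Dinner was pasta.")
print(memory.read("What is the launch code?"))  # -> the relevant episode, however old
```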
- SirLLM: Streaming Infinite Retentive LLM [74.40196814292426] (arXiv, 2024-05-21)
As Large Language Models (LLMs) see wider use, their ability to process inputs of any length and maintain a degree of memory becomes essential.
Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs.
We introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues.
- HMT: Hierarchical Memory Transformer for Long Context Language Processing [35.730941605490194] (arXiv, 2024-05-09)
Hierarchical Memory Transformer (HMT) is a novel framework that enables and improves models' long-context processing ability.
We show that HMT steadily improves the long-context processing ability of context-constrained and long-context models.
- Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models [75.98775135321355] (arXiv, 2023-08-29)
Given a long conversation, large language models (LLMs) fail to recall past information and tend to generate inconsistent responses.
We propose to recursively generate summaries/memory using the LLM itself to enhance its long-term memory ability.
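A minimal sketch of that recursive loop, with a hypothetical `summarize` placeholder standing in for the LLM call: after each chunk of dialogue, the running memory becomes a summary of the old memory plus the new turns, and only the memory is carried forward into the next prompt.

```python
# Sketch of recursive summarization as long-term dialogue memory.
# `summarize(text)` is a hypothetical stand-in for a call to an LLM.
def summarize(text: str, max_words: int = 40) -> str:
    # Placeholder: a real system would prompt an LLM, e.g.
    # "Summarize the dialogue so far, keeping facts needed later: ..."
    return " ".join(text.split()[:max_words])

def update_memory(memory: str, new_turns: list[str]) -> str:
    """Fold the latest turns into the running summary (the 'memory')."""
    return summarize(memory + "\n" + "\n".join(new_turns))

memory = ""
dialogue_chunks = [
    ["User: My sister Ana is visiting on the 14th.", "Bot: Noted!"],
    ["User: Book a table for that visit.", "Bot: For how many people?"],
]
for chunk in dialogue_chunks:
    memory = update_memory(memory, chunk)

# The memory, not the full transcript, is prepended to the next prompt.
print(f"Known so far: {memory}\nUser: Who is visiting and when?")
```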
- RET-LLM: Towards a General Read-Write Memory for Large Language Models [53.288356721954514] (arXiv, 2023-05-23)
RET-LLM is a novel framework that equips large language models with a general write-read memory unit.
Inspired by Davidsonian semantics theory, we extract and save knowledge in the form of triplets.
Our framework exhibits robust performance in handling temporal-based question answering tasks.
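The triplet idea can be sketched as a tiny write/read store over (subject, relation, object) tuples; the field-matching read and the class/method names below are illustrative assumptions, not RET-LLM's actual API.

```python
# Illustrative triplet-based read-write memory: knowledge is written as
# (subject, relation, object) tuples and read back by matching known fields.
from typing import Optional

Triplet = tuple[str, str, str]

class TripletMemory:
    def __init__(self) -> None:
        self.facts: list[Triplet] = []

    def write(self, subject: str, relation: str, obj: str) -> None:
        self.facts.append((subject.lower(), relation.lower(), obj.lower()))

    def read(self, subject: Optional[str] = None,
             relation: Optional[str] = None) -> list[Triplet]:
        """Return every stored triplet consistent with the given fields."""
        return [f for f in self.facts
                if (subject is None or f[0] == subject.lower())
                and (relation is None or f[1] == relation.lower())]

mem = TripletMemory()
mem.write("Alice", "works_for", "Acme")   # written while reading one document ...
mem.write("Alice", "moved_to", "Lisbon")
print(mem.read(subject="Alice"))          # ... read back later when a question needs it
print(mem.read(relation="works_for"))
```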
- Enhancing Large Language Model with Self-Controlled Memory Framework [56.38025154501917] (arXiv, 2023-04-26)
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information.
We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912] (arXiv, 2022-04-15)
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
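The cost claim can be illustrated with generic memory attention: the current segment's queries attend over the concatenation of cached memory states and the segment's own states, so the extra work grows linearly with the memory length. The sketch below is not LaMemo itself; its distinguishing look-ahead step, letting memory states also attend to the newer right-side tokens so they stay refreshed, is only noted in a comment, and causal masking within the segment is omitted.

```python
# Generic sketch of attending over [cached memory ; current segment].
# Not LaMemo itself: LaMemo additionally refreshes the memory states by letting
# them attend to the newer right-side tokens (the "look-ahead" step).
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend_with_memory(segment: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """segment: (n, d) current hidden states; memory: (m, d) cached states.
    The extra attention costs O(n * m), i.e. linear in the memory length m.
    Causal masking within the segment is omitted for brevity."""
    d = segment.shape[-1]
    keys = np.concatenate([memory, segment], axis=0)   # (m + n, d)
    scores = segment @ keys.T / np.sqrt(d)             # (n, m + n)
    return softmax(scores) @ keys                      # values == keys in this toy sketch

rng = np.random.default_rng(0)
memory = rng.standard_normal((8, 16))    # states cached from previous segments
segment = rng.standard_normal((4, 16))   # current segment
print(attend_with_memory(segment, memory).shape)   # (4, 16)
```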
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.