LaMemo: Language Modeling with Look-Ahead Memory
- URL: http://arxiv.org/abs/2204.07341v1
- Date: Fri, 15 Apr 2022 06:11:25 GMT
- Title: LaMemo: Language Modeling with Look-Ahead Memory
- Authors: Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Minlie Huang
- Abstract summary: We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
- Score: 50.6248714811912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although Transformers with fully connected self-attention are
powerful at modeling long-term dependencies, they struggle to scale to long
texts with thousands of words in language modeling. One solution is to equip
the model with a recurrence memory. However, existing approaches directly reuse
hidden states from the previous segment, which encode the context in a
uni-directional way. As a result, the memory cannot dynamically interact with
the current context, which provides up-to-date information for token
prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo), which
enhances the recurrence memory by incrementally attending to the right-side
tokens and interpolating with the old memory states to maintain long-term
information in the history. LaMemo embraces bi-directional attention and
segment recurrence with an additional computation overhead that is only
linearly proportional to the memory length. Experiments on widely used language
modeling benchmarks demonstrate its superiority over baselines equipped with
different types of memory.
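The mechanism described in the abstract can be pictured with a short sketch. The PyTorch-style snippet below is a minimal, single-head illustration of a look-ahead memory refresh, assuming a fixed scalar interpolation weight `alpha`; the paper derives its interpolation coefficients differently, and the function name and signature here are hypothetical rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def look_ahead_memory_update(old_mem, cur_seg, w_q, w_k, w_v, alpha=0.5):
    """Sketch of a LaMemo-style look-ahead memory refresh (single head).

    old_mem: [mem_len, d]  memory states carried over from previous segments
    cur_seg: [seg_len, d]  hidden states of the current (right-side) segment
    w_q, w_k, w_v: [d, d]  projection matrices
    alpha: fixed interpolation weight used here as an illustrative simplification
    """
    q = old_mem @ w_q                        # memory tokens query the new segment
    k = cur_seg @ w_k
    v = cur_seg @ w_v
    scores = q @ k.t() / k.shape[-1] ** 0.5  # [mem_len, seg_len]
    attn = F.softmax(scores, dim=-1)
    look_ahead = attn @ v                    # memory attends only to right-side tokens,
                                             # so the extra cost is O(mem_len * seg_len)
    # Interpolate with the old states to preserve long-term history
    return alpha * look_ahead + (1.0 - alpha) * old_mem
```

Because the refreshed memory attends only to the current segment, the additional cost grows linearly with the memory length for a fixed segment size, matching the overhead claim in the abstract.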
Related papers
- Titans: Learning to Memorize at Test Time [20.12643072017223]
We present a new neural long-term memory module that learns to memorize historical context.
We show that this neural memory has the advantage of fast parallelizable training while maintaining a fast inference.
We introduce a new family of architectures, called Titans, and present three variants to address how one can effectively incorporate memory into this architecture.
arXiv: 2024-12-31
- HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing [33.720656946186885]
Hierarchical Memory Transformer (HMT) is a novel framework that facilitates a model's long-context processing ability.
HMT consistently improves the long-context processing ability of existing models.
arXiv: 2024-05-09
- Augmenting Language Models with Long-Term Memory [142.04940250657637]
Existing large language models (LLMs) can only process fixed-size inputs due to the input length limit.
We propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history.
arXiv: 2023-06-12
- Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory [72.36736686941671]
We propose a novel framework, selfmem, for improving retrieval-augmented generation models.
Selfmem iteratively employs a retrieval-augmented generator to create an unbounded memory pool and uses a memory selector to choose one output as memory for the subsequent generation round (see the sketch after this list).
We evaluate the effectiveness of selfmem on three distinct text generation tasks.
arXiv: 2023-05-03
- Training Language Models with Memory Augmentation [28.4608705738799]
We present a novel approach for training language models with memory augmentation.
Our approach uses a training objective that directly takes in-batch examples as accessible memory.
We demonstrate significant gains over previous memory-augmented approaches.
arXiv: 2022-05-25
- ABC: Attention with Bounded-memory Control [67.40631793251997]
We show that a family of efficient attention methods can be subsumed into one abstraction, attention with bounded-memory control (ABC).
ABC reveals new, unexplored possibilities. First, it connects several efficient attention variants that would otherwise seem apart.
Last, we present a new instance of ABC, which draws inspiration from existing ABC approaches, but replaces their memory-organizing functions with a learned, contextualized one.
arXiv: 2021-10-06
- Memory-Based Semantic Parsing [79.48882899104997]
We present a memory-based model for context-dependent semantic parsing.
We learn a context memory controller that manages the memory by maintaining the cumulative meaning of sequential user utterances.
arXiv: 2021-09-07
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv: 2020-10-14
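The selfmem entry above describes an iterative generate-then-select loop; the Python sketch below shows that loop under assumed interfaces. The `generator` and `selector` callables and the `rounds` cap are hypothetical placeholders, and the real framework's scoring and stopping criteria may differ.

```python
def selfmem_generate(generator, selector, source, rounds=3):
    """Illustrative selfmem-style loop: generate candidates, grow a memory
    pool without bound, and select one output as memory for the next round.

    generator(source, memory) -> list of candidate outputs
    selector(source, candidates) -> one candidate chosen as the next memory
    (Both callables are assumptions made for this sketch.)
    """
    memory = None
    memory_pool = []
    for _ in range(rounds):
        candidates = generator(source, memory)   # retrieval-augmented generation
        memory_pool.extend(candidates)           # unbounded memory pool
        memory = selector(source, memory_pool)   # pick one output as next memory
    return memory
```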