LaMemo: Language Modeling with Look-Ahead Memory
- URL: http://arxiv.org/abs/2204.07341v1
- Date: Fri, 15 Apr 2022 06:11:25 GMT
- Title: LaMemo: Language Modeling with Look-Ahead Memory
- Authors: Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Minlie Huang
- Abstract summary: We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
- Score: 50.6248714811912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although Transformers with fully connected self-attention are
powerful at modeling long-term dependencies, they struggle to scale to long
texts with thousands of words in language modeling. One solution is to equip
the model with a recurrence memory. However, existing approaches directly reuse
hidden states from the previous segment, which encode the context in a
uni-directional way. As a result, the memory cannot dynamically interact with
the current context, which provides up-to-date information for token
prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo), which
enhances the recurrence memory by incrementally attending to the right-side
tokens and interpolating with the old memory states to maintain long-term
information in the history. LaMemo embraces bi-directional attention and
segment recurrence with an additional computation overhead that is only
linearly proportional to the memory length. Experiments on widely used language
modeling benchmarks demonstrate its superiority over baselines equipped with
different types of memory.
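The mechanism described in the abstract can be pictured with a short sketch. The PyTorch-style snippet below is a minimal, single-head illustration of a look-ahead memory refresh, assuming a fixed scalar interpolation weight `alpha`; the paper derives its interpolation coefficients differently, and the function name and signature here are hypothetical rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def look_ahead_memory_update(old_mem, cur_seg, w_q, w_k, w_v, alpha=0.5):
    """Sketch of a LaMemo-style look-ahead memory refresh (single head).

    old_mem: [mem_len, d]  memory states carried over from previous segments
    cur_seg: [seg_len, d]  hidden states of the current (right-side) segment
    w_q, w_k, w_v: [d, d]  projection matrices
    alpha: fixed interpolation weight used here as an illustrative simplification
    """
    q = old_mem @ w_q                        # memory tokens query the new segment
    k = cur_seg @ w_k
    v = cur_seg @ w_v
    scores = q @ k.t() / k.shape[-1] ** 0.5  # [mem_len, seg_len]
    attn = F.softmax(scores, dim=-1)
    look_ahead = attn @ v                    # memory attends only to right-side tokens,
                                             # so the extra cost is O(mem_len * seg_len)
    # Interpolate with the old states to preserve long-term history
    return alpha * look_ahead + (1.0 - alpha) * old_mem
```

Because the refreshed memory attends only to the current segment, the additional cost grows linearly with the memory length for a fixed segment size, matching the overhead claim in the abstract.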
Related papers
- Titans: Learning to Memorize at Test Time [20.12643072017223]
We present a new neural long-term memory module that learns to memorize historical context.
We show that this neural memory has the advantage of fast parallelizable training while maintaining a fast inference.
We introduce a new family of architectures, called Titans, and present three variants to address how one can effectively incorporate memory into this architecture.
arXiv: 2024-12-31
- HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing [33.720656946186885]
Hierarchical Memory Transformer (HMT) is a novel framework that facilitates a model's long-context processing ability.
HMT consistently improves the long-context processing ability of existing models.
arXiv: 2024-05-09
- Augmenting Language Models with Long-Term Memory [142.04940250657637]
Existing large language models (LLMs) can only process fixed-size inputs due to the input length limit.
We propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history.
arXiv: 2023-06-12
- Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory [72.36736686941671]
We propose a novel framework, selfmem, for improving retrieval-augmented generation models.
Selfmem iteratively employs a retrieval-augmented generator to create an unbounded memory pool and uses a memory selector to choose one output as memory for the subsequent generation round (see the sketch after this list).
We evaluate the effectiveness of selfmem on three distinct text generation tasks.
arXiv: 2023-05-03
- Training Language Models with Memory Augmentation [28.4608705738799]
We present a novel approach for training language models with memory augmentation.
Our approach uses a training objective that directly takes in-batch examples as accessible memory.
We demonstrate significant gains over previous memory-augmented approaches.
arXiv: 2022-05-25
- ABC: Attention with Bounded-memory Control [67.40631793251997]
We show that a family of efficient attention methods can be subsumed into one abstraction, attention with bounded-memory control (ABC).
ABC reveals new, unexplored possibilities. First, it connects several efficient attention variants that would otherwise seem apart.
Last, we present a new instance of ABC, which draws inspiration from existing ABC approaches, but replaces their memory-organizing functions with a learned, contextualized one.
arXiv: 2021-10-06
- Memory-Based Semantic Parsing [79.48882899104997]
We present a memory-based model for context-dependent semantic parsing.
We learn a context memory controller that manages the memory by maintaining the cumulative meaning of sequential user utterances.
arXiv: 2021-09-07
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv: 2020-10-14
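The selfmem entry above describes an iterative generate-then-select loop; the Python sketch below shows that loop under assumed interfaces. The `generator` and `selector` callables and the `rounds` cap are hypothetical placeholders, and the real framework's scoring and stopping criteria may differ.

```python
def selfmem_generate(generator, selector, source, rounds=3):
    """Illustrative selfmem-style loop: generate candidates, grow a memory
    pool without bound, and select one output as memory for the next round.

    generator(source, memory) -> list of candidate outputs
    selector(source, candidates) -> one candidate chosen as the next memory
    (Both callables are assumptions made for this sketch.)
    """
    memory = None
    memory_pool = []
    for _ in range(rounds):
        candidates = generator(source, memory)   # retrieval-augmented generation
        memory_pool.extend(candidates)           # unbounded memory pool
        memory = selector(source, memory_pool)   # pick one output as next memory
    return memory
```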