Training Language Models with Memory Augmentation
- URL: http://arxiv.org/abs/2205.12674v1
- Date: Wed, 25 May 2022 11:37:29 GMT
- Title: Training Language Models with Memory Augmentation
- Authors: Zexuan Zhong, Tao Lei, Danqi Chen
- Abstract summary: We present a novel training approach for language models with memory augmentation.
Our approach uses a training objective that directly takes in-batch examples as accessible memory.
We demonstrate significant gains over previous memory-augmented approaches.
- Score: 28.4608705738799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has improved language models remarkably by equipping them with a
non-parametric memory component. However, most existing approaches only
introduce memories at testing time, or represent them using a separately
trained encoder -- resulting in sub-optimal training of the language model. In
this work, we present TRIME, a novel yet simple training approach designed for
training language models with memory augmentation. Our approach uses a training
objective that directly takes in-batch examples as accessible memory. We also
present new methods for memory construction and data batching, which are used
for adapting to different sets of memories -- local, long-term, and external
memory -- at testing time. We evaluate our approach on multiple language
modeling and machine translation benchmarks. We find that simply replacing the
vanilla language modeling objective by ours greatly reduces the perplexity,
without modifying the model architecture or incorporating extra context (e.g.,
18.70 $\to$ 17.76 on WikiText-103). We further augment language models with
long-range contexts and external knowledge and demonstrate significant gains
over previous memory-augmented approaches.
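As a concrete illustration of the in-batch memory objective described in the abstract, below is a minimal PyTorch-style sketch (not the authors' released code; the function name, tensor shapes, and the temperature parameter are illustrative assumptions). It scores the gold next token by combining the usual output-embedding logit with similarities to other in-batch positions that predict the same token, and normalizes over both the vocabulary and the in-batch memory.

```python
import torch


def in_batch_memory_loss(hidden, targets, output_emb, temperature=1.0):
    """Sketch of a language-modeling loss whose normalizer includes in-batch memory.

    hidden:     (N, d) contextual representations for N token positions in the batch
    targets:    (N,)   gold next-token ids for those positions
    output_emb: (V, d) output (softmax) embedding matrix
    """
    # Standard vocabulary logits: similarity between each position and every word embedding.
    vocab_logits = hidden @ output_emb.t()                          # (N, V)

    # In-batch memory logits: similarity between every pair of positions in the batch.
    mem_logits = (hidden @ hidden.t()) / temperature                # (N, N)
    mem_logits.fill_diagonal_(float("-inf"))                        # a position never uses itself as memory

    # A memory slot j supports position i only if both predict the same next token.
    positive_mask = targets.unsqueeze(0).eq(targets.unsqueeze(1))   # (N, N)

    # Numerator: gold-token logit plus logits of all matching in-batch memory slots.
    gold_logits = vocab_logits.gather(1, targets.unsqueeze(1))      # (N, 1)
    pos_mem = mem_logits.masked_fill(~positive_mask, float("-inf"))
    log_numer = torch.logsumexp(torch.cat([gold_logits, pos_mem], dim=1), dim=1)

    # Denominator: all vocabulary logits plus all in-batch memory logits.
    log_denom = torch.logsumexp(torch.cat([vocab_logits, mem_logits], dim=1), dim=1)

    return (log_denom - log_numer).mean()
```

At test time, the same scoring rule can in principle be applied with local, long-term, or external memories substituted for the in-batch hidden states, which is how the abstract describes adapting to different memory sets.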
Related papers
- HMT: Hierarchical Memory Transformer for Long Context Language Processing [35.730941605490194]
Hierarchical Memory Transformer (HMT) is a novel framework that enables and improves models' long-context processing ability.
We show that HMT steadily improves the long-context processing ability of context-constrained and long-context models.
arXiv Detail & Related papers (2024-05-09T19:32:49Z)
- CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory [38.429707659685974]
Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs.
We introduce an associative memory module which can be coupled to any pre-trained (frozen) attention-based LLM without re-training.
This architecture, which we call CAMELoT, demonstrates superior performance even with a tiny context window of 128 tokens.
arXiv Detail & Related papers (2024-02-21T01:00:17Z)
- MEMORYLLM: Towards Self-Updatable Large Language Models [101.3777486749529]
Existing Large Language Models (LLMs) usually remain static after deployment.
We introduce MEMORYLLM, a model that comprises a transformer and a fixed-size memory pool.
MEMORYLLM can self-update with text knowledge and memorize the knowledge injected earlier.
arXiv Detail & Related papers (2024-02-07T07:14:11Z)
- Augmenting Language Models with Long-Term Memory [142.04940250657637]
Existing large language models (LLMs) can only process fixed-size inputs due to the input length limit.
We propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history.
arXiv Detail & Related papers (2023-06-12T15:13:39Z)
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model under a limited memory budget that adapts to new classes without forgetting old ones.
We show that when the model size is counted into the total budget and methods are compared at an aligned memory size, saving models does not consistently help.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
- Memorizing Transformers [20.99393776139059]
We envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately.
We demonstrate that an approximate kNN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks (see the sketch after this list).
arXiv Detail & Related papers (2022-03-16T19:54:35Z)
- Lightweight Cross-Lingual Sentence Representation Learning [57.9365829513914]
We introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations.
We propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task.
arXiv Detail & Related papers (2021-05-28T14:10:48Z)
- Paraphrastic Representations at Scale [134.41025103489224]
We release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese.
We train these models on large amounts of data, achieving significantly improved performance over the original papers.
arXiv Detail & Related papers (2021-04-30T16:55:28Z)
- Adaptive Semiparametric Language Models [17.53604394786977]
We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component.
Experiments on word-based and character-based language modeling datasets demonstrate the efficacy of our proposed method.
arXiv Detail & Related papers (2021-02-04T11:47:03Z)
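For the kNN-augmented memory described in the Memorizing Transformers entry above, here is a minimal sketch of a top-k read over a cache of (key, value) pairs (the function name and the exact-search retrieval are illustrative assumptions; the paper itself uses approximate kNN over a much larger non-differentiable memory).

```python
import torch
import torch.nn.functional as F


@torch.no_grad()  # the cached memory is treated as non-differentiable
def knn_memory_read(query, mem_keys, mem_values, k=32):
    """Retrieve the k most similar cached (key, value) pairs and blend their values.

    query:      (d,)   attention query for the current position
    mem_keys:   (M, d) keys cached from earlier segments
    mem_values: (M, d) values aligned with mem_keys
    """
    scores = mem_keys @ query                                 # (M,) dot-product similarities
    top_scores, top_idx = torch.topk(scores, min(k, scores.numel()))
    weights = F.softmax(top_scores, dim=0)                    # attention weights over retrieved entries
    return weights @ mem_values[top_idx]                      # (d,) memory readout
```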