Training Language Models with Memory Augmentation
- URL: http://arxiv.org/abs/2205.12674v1
- Date: Wed, 25 May 2022 11:37:29 GMT
- Title: Training Language Models with Memory Augmentation
- Authors: Zexuan Zhong, Tao Lei, Danqi Chen
- Abstract summary: We present a novel training approach for language models with memory augmentation.
Our approach uses a training objective that directly takes in-batch examples as accessible memory.
We demonstrate significant gains over previous memory-augmented approaches.
- Score: 28.4608705738799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has improved language models remarkably by equipping them with a
non-parametric memory component. However, most existing approaches only
introduce memories at testing time, or represent them using a separately
trained encoder -- resulting in sub-optimal training of the language model. In
this work, we present TRIME, a novel yet simple training approach designed for
training language models with memory augmentation. Our approach uses a training
objective that directly takes in-batch examples as accessible memory. We also
present new methods for memory construction and data batching, which are used
for adapting to different sets of memories -- local, long-term, and external
memory -- at testing time. We evaluate our approach on multiple language
modeling and machine translation benchmarks. We find that simply replacing the
vanilla language modeling objective by ours greatly reduces the perplexity,
without modifying the model architecture or incorporating extra context (e.g.,
18.70 $\to$ 17.76 on WikiText-103). We further augment language models with
long-range contexts and external knowledge and demonstrate significant gains
over previous memory-augmented approaches.
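As a concrete illustration of the in-batch memory objective described in the abstract, below is a minimal PyTorch-style sketch (not the authors' released code; the function name, tensor shapes, and the temperature parameter are illustrative assumptions). It scores the gold next token by combining the usual output-embedding logit with similarities to other in-batch positions that predict the same token, and normalizes over both the vocabulary and the in-batch memory.

```python
import torch


def in_batch_memory_loss(hidden, targets, output_emb, temperature=1.0):
    """Sketch of a language-modeling loss whose normalizer includes in-batch memory.

    hidden:     (N, d) contextual representations for N token positions in the batch
    targets:    (N,)   gold next-token ids for those positions
    output_emb: (V, d) output (softmax) embedding matrix
    """
    # Standard vocabulary logits: similarity between each position and every word embedding.
    vocab_logits = hidden @ output_emb.t()                          # (N, V)

    # In-batch memory logits: similarity between every pair of positions in the batch.
    mem_logits = (hidden @ hidden.t()) / temperature                # (N, N)
    mem_logits.fill_diagonal_(float("-inf"))                        # a position never uses itself as memory

    # A memory slot j supports position i only if both predict the same next token.
    positive_mask = targets.unsqueeze(0).eq(targets.unsqueeze(1))   # (N, N)

    # Numerator: gold-token logit plus logits of all matching in-batch memory slots.
    gold_logits = vocab_logits.gather(1, targets.unsqueeze(1))      # (N, 1)
    pos_mem = mem_logits.masked_fill(~positive_mask, float("-inf"))
    log_numer = torch.logsumexp(torch.cat([gold_logits, pos_mem], dim=1), dim=1)

    # Denominator: all vocabulary logits plus all in-batch memory logits.
    log_denom = torch.logsumexp(torch.cat([vocab_logits, mem_logits], dim=1), dim=1)

    return (log_denom - log_numer).mean()
```

At test time, the same scoring rule can in principle be applied with local, long-term, or external memories substituted for the in-batch hidden states, which is how the abstract describes adapting to different memory sets.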
Related papers
- HMT: Hierarchical Memory Transformer for Long Context Language Processing [35.730941605490194]
Hierarchical Memory Transformer (HMT) is a novel framework that enables and improves models' long-context processing ability.
We show that HMT steadily improves the long-context processing ability of context-constrained and long-context models.
arXiv Detail & Related papers (2024-05-09T19:32:49Z)
- CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory [38.429707659685974]
Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs.
We introduce an associative memory module which can be coupled to any pre-trained (frozen) attention-based LLM without re-training.
This architecture, which we call CAMELoT, demonstrates superior performance even with a tiny context window of 128 tokens.
arXiv Detail & Related papers (2024-02-21T01:00:17Z)
- MEMORYLLM: Towards Self-Updatable Large Language Models [101.3777486749529]
Existing Large Language Models (LLMs) usually remain static after deployment.
We introduce MEMORYLLM, a model that comprises a transformer and a fixed-size memory pool.
MEMORYLLM can self-update with text knowledge and memorize the knowledge injected earlier.
arXiv Detail & Related papers (2024-02-07T07:14:11Z)
- Augmenting Language Models with Long-Term Memory [142.04940250657637]
Existing large language models (LLMs) can only process fixed-size inputs due to the input length limit.
We propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history.
arXiv Detail & Related papers (2023-06-12T15:13:39Z)
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model under a limited memory budget that adapts to new classes without forgetting old ones.
We show that when the model size is counted into the total budget and methods are compared at an aligned memory size, saving models does not consistently help.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
- Memorizing Transformers [20.99393776139059]
We envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately.
We demonstrate that an approximate kNN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks (see the sketch after this list).
arXiv Detail & Related papers (2022-03-16T19:54:35Z)
- Lightweight Cross-Lingual Sentence Representation Learning [57.9365829513914]
We introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations.
We propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task.
arXiv Detail & Related papers (2021-05-28T14:10:48Z)
- Paraphrastic Representations at Scale [134.41025103489224]
We release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese.
We train these models on large amounts of data, achieving significantly improved performance over the original papers.
arXiv Detail & Related papers (2021-04-30T16:55:28Z)
- Adaptive Semiparametric Language Models [17.53604394786977]
We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component.
Experiments on word-based and character-based language modeling datasets demonstrate the efficacy of our proposed method.
arXiv Detail & Related papers (2021-02-04T11:47:03Z)
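For the kNN-augmented memory described in the Memorizing Transformers entry above, here is a minimal sketch of a top-k read over a cache of (key, value) pairs (the function name and the exact-search retrieval are illustrative assumptions; the paper itself uses approximate kNN over a much larger non-differentiable memory).

```python
import torch
import torch.nn.functional as F


@torch.no_grad()  # the cached memory is treated as non-differentiable
def knn_memory_read(query, mem_keys, mem_values, k=32):
    """Retrieve the k most similar cached (key, value) pairs and blend their values.

    query:      (d,)   attention query for the current position
    mem_keys:   (M, d) keys cached from earlier segments
    mem_values: (M, d) values aligned with mem_keys
    """
    scores = mem_keys @ query                                 # (M,) dot-product similarities
    top_scores, top_idx = torch.topk(scores, min(k, scores.numel()))
    weights = F.softmax(top_scores, dim=0)                    # attention weights over retrieved entries
    return weights @ mem_values[top_idx]                      # (d,) memory readout
```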