Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
- URL: http://arxiv.org/abs/2508.09874v2
- Date: Thu, 23 Oct 2025 13:14:04 GMT
- Title: Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
- Authors: Jiaqi Cao, Jiarui Wang, Rubin Wei, Qipeng Guo, Kai Chen, Bowen Zhou, Zhouhan Lin,
- Abstract summary: This paper introduces Memory Decoder, a plug-and-play pretrained memory that enables efficient domain adaptation without changing the original model's parameters. Experimental results demonstrate that Memory Decoder enables effective adaptation of various Qwen and Llama models to three distinct specialized domains: biomedicine, finance, and law, reducing perplexity by an average of 6.17 points.
- Score: 46.59786740489401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown strong abilities in general language tasks, yet adapting them to specific domains remains a challenge. Current methods such as Domain Adaptive Pretraining (DAPT) require costly full-parameter training and suffer from catastrophic forgetting. Meanwhile, Retrieval-Augmented Generation (RAG) introduces substantial inference latency due to expensive nearest-neighbor searches and longer context. This paper introduces Memory Decoder, a plug-and-play pretrained memory that enables efficient domain adaptation without changing the original model's parameters. Memory Decoder employs a small transformer decoder that learns to imitate the behavior of an external non-parametric retriever. Once trained, Memory Decoder can be seamlessly integrated with any pretrained language model that shares the same tokenizer, requiring no model-specific modifications. Experimental results demonstrate that Memory Decoder enables effective adaptation of various Qwen and Llama models to three distinct specialized domains: biomedicine, finance, and law, reducing perplexity by an average of 6.17 points. Overall, Memory Decoder introduces a novel paradigm centered on a specially pretrained memory component designed for domain-specific adaptation. This memory architecture can be integrated in a plug-and-play manner, consistently enhancing performance across multiple models within the target domain.
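The abstract's plug-and-play claim suggests a kNN-LM-style combination of two next-token distributions at inference time. Below is a minimal sketch, assuming Hugging Face-style model outputs; the helper name, the weight `lam`, and the exact combination rule are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of plug-and-play inference with a pretrained memory decoder.
# ASSUMPTIONS: HF-style `.logits` outputs and a simple linear interpolation
# with weight `lam`; the paper's exact combination rule may differ.
import torch
import torch.nn.functional as F

@torch.no_grad()
def combined_next_token_logprobs(base_lm, memory_decoder, input_ids, lam=0.3):
    """Interpolate the base LM with the memory decoder (shared tokenizer)."""
    p_base = F.softmax(base_lm(input_ids).logits[:, -1, :], dim=-1)
    p_mem = F.softmax(memory_decoder(input_ids).logits[:, -1, :], dim=-1)
    # Mixture of the two distributions, kNN-LM style.
    return torch.log((1 - lam) * p_base + lam * p_mem)
```

Because both models share a tokenizer, the two distributions range over the same vocabulary, which is what makes the mixture well defined without any model-specific modifications.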
Related papers
- Memory Caching: RNNs with Growing Memory [56.25483647131372]
We introduce Memory Caching (MC), a technique that enhances recurrent models by caching checkpoints of memory states (a.k.a. hidden states). We propose four variants of MC, including gated aggregation and sparse selective mechanisms, and discuss their implications for both linear and deep memory modules. The results indicate that while Transformers achieve the best accuracy, our MC variants show competitive performance, close the gap with Transformers, and perform better than state-of-the-art recurrent models.
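A minimal sketch of the cached-checkpoint idea with one gated-aggregation variant, assuming PyTorch and a GRU backbone; the class name, stride, and aggregation rule are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of Memory Caching with gated aggregation: an RNN that
# checkpoints its hidden state every `stride` steps and gates a summary of
# the cache back into the current state. Details are assumptions.
import torch
import torch.nn as nn

class CachedGRU(nn.Module):
    def __init__(self, d_in: int, d_h: int, stride: int = 8):
        super().__init__()
        self.cell = nn.GRUCell(d_in, d_h)
        self.gate = nn.Linear(2 * d_h, d_h)
        self.stride = stride
        self.d_h = d_h

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (T, d_in)
        h = x.new_zeros(self.d_h)
        cache = []                                 # checkpointed hidden states
        for t, x_t in enumerate(x):
            h = self.cell(x_t.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
            if cache:                              # gated read from the cache
                summary = torch.stack(cache).mean(dim=0)
                g = torch.sigmoid(self.gate(torch.cat([h, summary])))
                h = g * h + (1 - g) * summary
            if t % self.stride == 0:
                cache.append(h.detach())
        return h
```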
arXiv Detail & Related papers (2026-02-27T18:53:41Z)
- MemAdapter: Fast Alignment across Agent Memory Paradigms via Generative Subgraph Retrieval [25.68006224976726]
The memory mechanism is a core component of LLM-based agents, enabling reasoning and knowledge discovery over long-horizon contexts. Existing agent memory systems are typically designed within isolated paradigms with tightly coupled retrieval methods. MemAdapter is a memory retrieval framework that enables fast alignment across agent memory paradigms.
arXiv Detail & Related papers (2026-02-09T08:09:25Z)
- Pretraining with hierarchical memories: separating long-tail and common knowledge [32.22296691842835]
We introduce small language models that access large hierarchical parametric memory banks encoding world knowledge. During pretraining and inference, we fetch a small, context-dependent memory block and add it to the model. Our pretraining learns to store long-tail world knowledge in the memory parameters, while the small language model acts as an anchor capturing general reasoning abilities.
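A rough sketch of fetching one context-dependent memory block and adding it to the model's hidden states; the routing scheme, bank layout, and names are assumptions for illustration only.

```python
# Illustrative sketch (names are assumptions): fetch one context-dependent
# block from a large parametric memory bank and add it to token hidden states.
import torch
import torch.nn as nn

class MemoryBank(nn.Module):
    def __init__(self, n_blocks: int, d_model: int):
        super().__init__()
        self.blocks = nn.Embedding(n_blocks, d_model)   # memory parameters
        self.router = nn.Linear(d_model, n_blocks)      # context -> block score

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:  # (B, T, d)
        ctx = hidden.mean(dim=1)                        # crude context summary
        # argmax keeps the sketch short; training the router end to end
        # would need a soft (differentiable) selection instead.
        idx = self.router(ctx).argmax(dim=-1)           # one block per sequence
        return hidden + self.blocks(idx).unsqueeze(1)   # add the fetched block
```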
arXiv Detail & Related papers (2025-09-29T17:59:50Z)
- MLP Memory: Language Modeling with Retriever-pretrained External Memory [26.033369983243624]
We propose to decouple memory from the decoder using a pretrained, differentiable external memory. Our architecture achieves strong perplexity and downstream-task performance. We demonstrate superior performance on three hallucination benchmarks and nine memory-intensive tasks.
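One plausible reading of "retriever-pretrained" is distilling a non-parametric retriever's next-token distribution into the memory with a KL objective; the sketch below assumes that setup, and `retriever_probs` is a hypothetical stand-in for the retriever's output.

```python
# Hedged sketch of retriever-pretraining for an external memory head: the
# memory is trained with a KL loss to match a non-parametric retriever's
# next-token distribution. `retriever_probs` is an assumed input.
import torch
import torch.nn.functional as F

def retriever_distillation_loss(memory_logits: torch.Tensor,
                                retriever_probs: torch.Tensor) -> torch.Tensor:
    """KL(retriever || memory) over the vocabulary, averaged over the batch."""
    log_q = F.log_softmax(memory_logits, dim=-1)        # memory log-probs
    return F.kl_div(log_q, retriever_probs, reduction="batchmean")
```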
arXiv Detail & Related papers (2025-08-03T16:40:53Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with and improve the performance of popular alternatives such as retrieval-augmented generation (RAG).
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- Recurrent Action Transformer with Memory [39.58317527488534]
This paper proposes a novel model architecture that incorporates a recurrent memory mechanism designed to regulate information retention.
We conduct experiments on memory-intensive environments (ViZDoom-Two-Colors, T-Maze, Memory Maze, Minigrid-Memory), classic Atari games, and MuJoCo control environments.
The results show that using memory can significantly improve performance in memory-intensive environments, while maintaining or improving results in classic environments.
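One common way to realize a recurrent memory in transformers is to carry a few memory tokens across segments; this hypothetical sketch assumes that design rather than the paper's exact architecture, and all names are illustrative.

```python
# Hypothetical recurrent memory for transformers: learned memory tokens are
# prepended to each segment and their updated states are carried forward.
# Specifics are assumptions about the paper's mechanism.
import torch
import torch.nn as nn

class RecurrentMemoryEncoder(nn.Module):
    def __init__(self, d: int, n_mem: int = 4):
        super().__init__()
        self.mem0 = nn.Parameter(torch.randn(n_mem, d))
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.n_mem = n_mem

    def forward(self, segments):                # list of (B, L, d) tensors
        mem = self.mem0.unsqueeze(0).expand(segments[0].size(0), -1, -1)
        for seg in segments:
            out = self.encoder(torch.cat([mem, seg], dim=1))
            mem = out[:, : self.n_mem]          # carry updated memory forward
        return mem
```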
arXiv Detail & Related papers (2023-06-15T19:29:08Z)
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model that adapts to new classes without forgetting old ones under a limited memory size.
We show that when the model size is counted into the total budget and methods are compared at aligned memory sizes, saving models does not consistently pay off.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
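The budget-alignment argument can be made concrete with a byte-level conversion between parameters and exemplars; the numbers below are illustrative, not the paper's.

```python
# Worked example (illustrative numbers, not the paper's): align the total
# memory budget of two CIL methods by converting model parameters into an
# equivalent number of exemplar images before comparing accuracy.
BYTES_PER_PARAM = 4                  # float32 weight
BYTES_PER_IMAGE = 3 * 32 * 32        # CIFAR-sized uint8 exemplar

def exemplar_equivalent(n_params: int) -> int:
    """How many exemplars the same number of bytes would buy."""
    return n_params * BYTES_PER_PARAM // BYTES_PER_IMAGE

# A method that stores one extra 11M-parameter backbone spends the same
# budget as roughly this many additional exemplars:
print(exemplar_equivalent(11_000_000))   # -> 14322 exemplars
```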
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- Training Language Models with Memory Augmentation [28.4608705738799]
We present a novel training approach for language models with memory augmentation.
Our approach uses a training objective that directly takes in-batch examples as accessible memory.
We demonstrate significant gains over previous memory-augmented approaches.
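A hedged sketch of such an objective: in-batch hidden states act as memory keys, and any memory whose stored next token matches the gold target counts toward the numerator of a jointly normalized softmax. Shapes and names are assumptions.

```python
# Hedged sketch of a training objective with in-batch examples as memory.
# Details (shapes, masking scheme) are assumptions, not the paper's code.
import torch

def in_batch_memory_loss(q, mem_keys, mem_targets, emb, targets):
    """q: (N, d) query states; mem_keys: (M, d) in-batch memories with next
    tokens mem_targets: (M,); emb: (V, d) output embeddings; targets: (N,)."""
    vocab_scores = q @ emb.t()                          # (N, V)
    mem_scores = q @ mem_keys.t()                       # (N, M)
    logits = torch.cat([vocab_scores, mem_scores], -1)  # joint normalization
    log_z = torch.logsumexp(logits, dim=-1)
    gold = vocab_scores.gather(1, targets[:, None]).squeeze(1)
    # Memories whose stored next token equals the gold target also count.
    match = mem_targets[None, :] == targets[:, None]    # (N, M)
    gold_mem = torch.where(match, mem_scores,
                           torch.full_like(mem_scores, -1e30))
    log_num = torch.logsumexp(torch.cat([gold[:, None], gold_mem], -1), dim=-1)
    return (log_z - log_num).mean()
```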
arXiv Detail & Related papers (2022-05-25T11:37:29Z)
- Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes rarely appearing content in temporal sentence grounding (TSG) tasks.
MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module.
arXiv Detail & Related papers (2022-01-03T02:32:06Z)
- $DA^3$: Deep Additive Attention Adaption for Memory-Efficient On-Device Multi-Domain Learning [30.53018068935323]
The large memory required for activation storage is the main bottleneck limiting training time and cost on edge devices.
We propose Deep Additive Attention Adaption, a novel memory-efficient on-device multi-domain learning method.
We validate $DA^3$ on multiple datasets against state-of-the-art methods, showing improvements in both accuracy and training time.
arXiv Detail & Related papers (2020-12-02T18:03:18Z)
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
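Linear time and constant memory follow from a fixed set of memory slots that each segment reads from and then writes back to; this illustrative loop assumes that slot design, and all names are assumptions.

```python
# Illustrative Memformer-style loop (names assumed): a fixed number of memory
# slots is read by each segment and rewritten afterwards, so compute is linear
# in sequence length and memory stays constant.
import torch
import torch.nn as nn

class SlotMemory(nn.Module):
    def __init__(self, n_slots: int, d: int):
        super().__init__()
        self.init = nn.Parameter(torch.randn(n_slots, d))
        self.read = nn.MultiheadAttention(d, 4, batch_first=True)
        self.write = nn.MultiheadAttention(d, 4, batch_first=True)

    def forward(self, segments):                  # list of (B, L, d) tensors
        mem = self.init.unsqueeze(0).expand(segments[0].size(0), -1, -1)
        outs = []
        for seg in segments:
            rd, _ = self.read(seg, mem, mem)      # tokens read from memory
            outs.append(seg + rd)
            mem, _ = self.write(mem, seg, seg)    # memory rewritten from segment
        return torch.cat(outs, dim=1), mem
```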
arXiv Detail & Related papers (2020-10-14T09:03:36Z)