CAMELoT: Towards Large Language Models with Training-Free Consolidated
Associative Memory
- URL: http://arxiv.org/abs/2402.13449v1
- Date: Wed, 21 Feb 2024 01:00:17 GMT
- Title: CAMELoT: Towards Large Language Models with Training-Free Consolidated
Associative Memory
- Authors: Zexue He, Leonid Karlinsky, Donghyun Kim, Julian McAuley, Dmitry
Krotov, Rogerio Feris
- Abstract summary: Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs.
We introduce an associative memory module which can be coupled to any pre-trained (frozen) attention-based LLM without re-training.
This architecture, which we call CAMELoT, demonstrates superior performance even with a tiny context window of 128 tokens.
- Score: 38.429707659685974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) struggle to handle long input sequences due to
high memory and runtime costs. Memory-augmented models have emerged as a
promising solution to this problem, but current methods are hindered by limited
memory capacity and require costly re-training to integrate with a new LLM. In
this work, we introduce an associative memory module which can be coupled to
any pre-trained (frozen) attention-based LLM without re-training, enabling it
to handle arbitrarily long input sequences. Unlike previous methods, our
associative memory module consolidates representations of individual tokens
into a non-parametric distribution model, dynamically managed by properly
balancing the novelty and recency of the incoming data. By retrieving
information from this consolidated associative memory, the base LLM can achieve
significant (up to 29.7% on Arxiv) perplexity reduction in long-context
modeling compared to other baselines evaluated on standard benchmarks. This
architecture, which we call CAMELoT (Consolidated Associative Memory Enhanced
Long Transformer), demonstrates superior performance even with a tiny context
window of 128 tokens, and also enables improved in-context learning with a much
larger set of demonstrations.
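The abstract describes the consolidation mechanism only at a high level; as a minimal, hypothetical sketch of the general idea, the snippet below merges incoming token key/value vectors into a bounded set of memory slots, using a novelty test to decide between opening a new slot and averaging into the nearest one, and a recency score to pick eviction victims. The cosine-distance threshold, running-average merge, decay factor, and eviction rule are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a consolidated associative memory in the spirit of the
# abstract: thresholds, decay, and eviction policy are assumptions, not CAMELoT's.
import numpy as np

class ConsolidatedMemory:
    def __init__(self, dim, max_slots=1024, novelty_threshold=0.3, recency_decay=0.99):
        self.dim = dim
        self.max_slots = max_slots
        self.novelty_threshold = novelty_threshold  # cosine-distance cutoff (assumed)
        self.recency_decay = recency_decay          # decay applied to old slots (assumed)
        self.keys = np.zeros((0, dim))              # consolidated key centroids
        self.values = np.zeros((0, dim))            # consolidated value centroids
        self.counts = np.zeros(0)                   # how many tokens each slot absorbed
        self.recency = np.zeros(0)                  # recency score per slot

    def _cosine_dist(self, x, Y):
        Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-8)
        xn = x / (np.linalg.norm(x) + 1e-8)
        return 1.0 - Yn @ xn

    def write(self, key, value):
        """Consolidate one token: merge it into the nearest slot if similar enough,
        otherwise allocate a new slot, evicting the stalest slot when full."""
        self.recency *= self.recency_decay
        if len(self.keys) > 0:
            dists = self._cosine_dist(key, self.keys)
            j = int(np.argmin(dists))
            if dists[j] < self.novelty_threshold:
                n = self.counts[j]                  # running-average consolidation
                self.keys[j] = (n * self.keys[j] + key) / (n + 1)
                self.values[j] = (n * self.values[j] + value) / (n + 1)
                self.counts[j] += 1
                self.recency[j] = 1.0
                return
        if len(self.keys) >= self.max_slots:        # novel token but memory is full
            stale = int(np.argmin(self.recency))    # drop the least recently reinforced slot
            for name in ("keys", "values", "counts", "recency"):
                setattr(self, name, np.delete(getattr(self, name), stale, axis=0))
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])
        self.counts = np.append(self.counts, 1.0)
        self.recency = np.append(self.recency, 1.0)

    def read(self, query, top_k=4):
        """Return the values of the top_k consolidated slots closest to the query."""
        if len(self.keys) == 0:
            return np.zeros((0, self.dim))
        idx = np.argsort(self._cosine_dist(query, self.keys))[:top_k]
        return self.values[idx]
```

A module like this would be attached to a frozen LLM's attention so that retrieved slot values extend the short local context window; the sketch covers only the memory bookkeeping, not that coupling.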
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, only a minimal number of late pre-trained layers is used, reducing the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z)
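As a rough, hypothetical sketch of the general pattern of combining eidetic (exactly kept) and fading (exponentially decayed) memory in one module; the surprise test, decay constant, and buffer size below are assumptions, not B'MOJO's actual state-space realization.

```python
# Hypothetical hybrid memory: an "eidetic" buffer of exactly stored vectors plus
# a "fading" decayed summary. Illustrative only; not B'MOJO's actual design.
import numpy as np

class HybridMemory:
    def __init__(self, dim, eidetic_size=64, fade=0.95, surprise_threshold=0.5):
        self.eidetic = []                  # exact copies of "surprising" inputs
        self.eidetic_size = eidetic_size
        self.fading = np.zeros(dim)        # decayed running summary of everything seen
        self.fade = fade
        self.surprise_threshold = surprise_threshold

    def update(self, x):
        # Fading memory: every input is folded in, but older content decays away.
        self.fading = self.fade * self.fading + (1.0 - self.fade) * x
        # Eidetic memory: keep an exact copy only when x looks poorly summarized
        # by the fading state (a crude cosine "surprise" test; an assumption).
        denom = np.linalg.norm(x) * np.linalg.norm(self.fading) + 1e-8
        if float(x @ self.fading) / denom < self.surprise_threshold:
            self.eidetic.append(x.copy())
            self.eidetic = self.eidetic[-self.eidetic_size:]   # keep the buffer bounded

    def state(self):
        # A downstream layer could attend over the exact entries plus the summary.
        return self.eidetic, self.fading
```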
- Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation [12.653336728447654]
We propose a class-shared memory (CSM) module consisting of a set of learnable memory vectors.
These memory vectors learn elemental object patterns from base classes during training whilst re-encoding query features during both training and inference.
We integrate CSM and UFA into representative FSS works and report experimental results on the widely-used PASCAL-5$^i$ and COCO-20$^i$ datasets.
arXiv Detail & Related papers (2024-06-01T19:53:25Z)
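As a loose, hypothetical PyTorch sketch of the pattern described above: a set of learnable memory vectors that re-encode query features through attention. The single-head dot-product attention, the residual connection, and the dimensions are assumptions rather than the paper's exact CSM design.

```python
# Hypothetical class-shared memory: learnable memory vectors re-encode features
# via attention. Dimensions and single-head attention are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassSharedMemory(nn.Module):
    def __init__(self, num_slots=32, dim=256):
        super().__init__()
        # Memory vectors shared across classes, learned on base classes during training.
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)

    def forward(self, feats):
        """feats: (B, N, dim) query features; returns memory-re-encoded features."""
        attn = torch.einsum("bnd,md->bnm", feats, self.memory)     # similarity to each slot
        attn = F.softmax(attn / feats.shape[-1] ** 0.5, dim=-1)
        recalled = torch.einsum("bnm,md->bnd", attn, self.memory)  # read out memory content
        return feats + recalled                                    # residual re-encoding

# Example: re-encode a batch of 2 feature maps with 100 spatial positions each.
x = torch.randn(2, 100, 256)
y = ClassSharedMemory()(x)
assert y.shape == x.shape
```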
- MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling in general and in knowledge-intensive tasks in particular.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
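As a toy, hypothetical illustration of what an explicit read-write memory interface can look like: a plain relation-triple store that a model could be taught to call. The triple format and the method names are assumptions, not MemLLM's actual finetuned interface.

```python
# Hypothetical toy read/write memory an LLM could be trained to call explicitly.
# The triple schema and API names are illustrative assumptions.
from collections import defaultdict

class TripleMemory:
    def __init__(self):
        self._by_subject = defaultdict(list)

    def write(self, subject, relation, obj):
        """Store an explicit fact as a (relation, object) pair under its subject."""
        self._by_subject[subject].append((relation, obj))

    def read(self, subject, relation=None):
        """Return stored facts about a subject, optionally filtered by relation."""
        facts = self._by_subject.get(subject, [])
        if relation is not None:
            facts = [f for f in facts if f[0] == relation]
        return facts

mem = TripleMemory()
mem.write("CAMELoT", "introduced_by", "He et al.")
print(mem.read("CAMELoT", "introduced_by"))   # [('introduced_by', 'He et al.')]
```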
- Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with, and improve the performance of, popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- Enhancing Large Language Model with Self-Controlled Memory Framework [56.38025154501917]
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information.
We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
arXiv Detail & Related papers (2023-04-26T07:25:31Z)
- Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems [27.419109620575313]
A key challenge for deep learning models is to work with millions of categorical classes or tokens.
We propose a novel formulation of memory shared embedding, where memory is shared in proportion to the overlap in semantic information.
We demonstrate a significant reduction in the memory footprint while maintaining performance.
arXiv Detail & Related papers (2021-02-24T19:55:49Z)
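As a generic, hypothetical sketch of a shared-memory embedding: tokens assigned to the same semantic cluster share one block of parameters, so the table shrinks while related tokens overlap in memory. The cluster assignment and the shared/private split are illustrative assumptions, not the paper's SCMA formulation.

```python
# Hypothetical shared embedding: tokens in the same semantic cluster share a row,
# plus a small private remainder per token. Not the paper's actual SCMA scheme.
import torch
import torch.nn as nn

class SharedEmbedding(nn.Module):
    def __init__(self, num_tokens, num_clusters, shared_dim, private_dim, cluster_of):
        super().__init__()
        self.register_buffer("cluster_of", cluster_of)        # (num_tokens,) token -> cluster id
        self.shared = nn.Embedding(num_clusters, shared_dim)  # one big row per semantic cluster
        self.private = nn.Embedding(num_tokens, private_dim)  # small per-token remainder

    def forward(self, token_ids):
        shared_part = self.shared(self.cluster_of[token_ids])
        private_part = self.private(token_ids)
        return torch.cat([shared_part, private_part], dim=-1)

# 1M tokens, 10k clusters: most parameters live in the 10k shared rows and a small
# private table instead of a full 1M x 128 embedding matrix.
cluster_of = torch.randint(0, 10_000, (1_000_000,))
emb = SharedEmbedding(1_000_000, 10_000, shared_dim=120, private_dim=8, cluster_of=cluster_of)
vecs = emb(torch.tensor([3, 42, 999_999]))   # shape (3, 128)
```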
- Learning Associative Inference Using Fast Weight Memory [12.239487954915646]
We augment the LSTM model with an associative memory, dubbed Fast Weight Memory (FWM).
Our model is trained end-to-end by gradient descent and yields excellent performance on compositional language reasoning problems.
arXiv Detail & Related papers (2020-11-16T10:01:23Z)
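Fast weight memories in general store associations as outer-product updates to a weight matrix and retrieve them with a matrix-vector product; the minimal sketch below shows only that generic mechanism (the decay constant is assumed, and the LSTM controller that FWM couples to is omitted).

```python
# Generic outer-product fast weight memory: write associations into a matrix,
# read them back with a query. Decay and the missing LSTM controller are simplifications.
import numpy as np

class FastWeightMemory:
    def __init__(self, dim, decay=0.95):
        self.M = np.zeros((dim, dim))   # the fast weight matrix
        self.decay = decay

    def write(self, key, value):
        # Hebbian-style update; older associations gradually fade.
        self.M = self.decay * self.M + np.outer(value, key)

    def read(self, query):
        # Retrieve the value associated with keys similar to the query.
        return self.M @ query

dim = 8
mem = FastWeightMemory(dim)
key = np.eye(dim)[0]                      # a one-hot key
value = np.arange(dim, dtype=float)
mem.write(key, value)
print(np.allclose(mem.read(key), value))  # True: the stored value is recovered
```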
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.