CAMELoT: Towards Large Language Models with Training-Free Consolidated
Associative Memory
- URL: http://arxiv.org/abs/2402.13449v1
- Date: Wed, 21 Feb 2024 01:00:17 GMT
- Title: CAMELoT: Towards Large Language Models with Training-Free Consolidated
Associative Memory
- Authors: Zexue He, Leonid Karlinsky, Donghyun Kim, Julian McAuley, Dmitry
Krotov, Rogerio Feris
- Abstract summary: Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs.
We introduce an associative memory module which can be coupled to any pre-trained (frozen) attention-based LLM without re-training.
This architecture, which we call CAMELoT, demonstrates superior performance even with a tiny context window of 128 tokens.
- Score: 38.429707659685974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) struggle to handle long input sequences due to
high memory and runtime costs. Memory-augmented models have emerged as a
promising solution to this problem, but current methods are hindered by limited
memory capacity and require costly re-training to integrate with a new LLM. In
this work, we introduce an associative memory module which can be coupled to
any pre-trained (frozen) attention-based LLM without re-training, enabling it
to handle arbitrarily long input sequences. Unlike previous methods, our
associative memory module consolidates representations of individual tokens
into a non-parametric distribution model, dynamically managed by properly
balancing the novelty and recency of the incoming data. By retrieving
information from this consolidated associative memory, the base LLM can achieve
significant (up to 29.7% on Arxiv) perplexity reduction in long-context
modeling compared to other baselines evaluated on standard benchmarks. This
architecture, which we call CAMELoT (Consolidated Associative Memory Enhanced
Long Transformer), demonstrates superior performance even with a tiny context
window of 128 tokens, and also enables improved in-context learning with a much
larger set of demonstrations.
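The abstract describes the consolidation mechanism only at a high level; as a minimal, hypothetical sketch of the general idea, the snippet below merges incoming token key/value vectors into a bounded set of memory slots, using a novelty test to decide between opening a new slot and averaging into the nearest one, and a recency score to pick eviction victims. The cosine-distance threshold, running-average merge, decay factor, and eviction rule are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a consolidated associative memory in the spirit of the
# abstract: thresholds, decay, and eviction policy are assumptions, not CAMELoT's.
import numpy as np

class ConsolidatedMemory:
    def __init__(self, dim, max_slots=1024, novelty_threshold=0.3, recency_decay=0.99):
        self.dim = dim
        self.max_slots = max_slots
        self.novelty_threshold = novelty_threshold  # cosine-distance cutoff (assumed)
        self.recency_decay = recency_decay          # decay applied to old slots (assumed)
        self.keys = np.zeros((0, dim))              # consolidated key centroids
        self.values = np.zeros((0, dim))            # consolidated value centroids
        self.counts = np.zeros(0)                   # how many tokens each slot absorbed
        self.recency = np.zeros(0)                  # recency score per slot

    def _cosine_dist(self, x, Y):
        Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-8)
        xn = x / (np.linalg.norm(x) + 1e-8)
        return 1.0 - Yn @ xn

    def write(self, key, value):
        """Consolidate one token: merge it into the nearest slot if similar enough,
        otherwise allocate a new slot, evicting the stalest slot when full."""
        self.recency *= self.recency_decay
        if len(self.keys) > 0:
            dists = self._cosine_dist(key, self.keys)
            j = int(np.argmin(dists))
            if dists[j] < self.novelty_threshold:
                n = self.counts[j]                  # running-average consolidation
                self.keys[j] = (n * self.keys[j] + key) / (n + 1)
                self.values[j] = (n * self.values[j] + value) / (n + 1)
                self.counts[j] += 1
                self.recency[j] = 1.0
                return
        if len(self.keys) >= self.max_slots:        # novel token but memory is full
            stale = int(np.argmin(self.recency))    # drop the least recently reinforced slot
            for name in ("keys", "values", "counts", "recency"):
                setattr(self, name, np.delete(getattr(self, name), stale, axis=0))
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])
        self.counts = np.append(self.counts, 1.0)
        self.recency = np.append(self.recency, 1.0)

    def read(self, query, top_k=4):
        """Return the values of the top_k consolidated slots closest to the query."""
        if len(self.keys) == 0:
            return np.zeros((0, self.dim))
        idx = np.argsort(self._cosine_dist(query, self.keys))[:top_k]
        return self.values[idx]
```

A module like this would be attached to a frozen LLM's attention so that retrieved slot values extend the short local context window; the sketch covers only the memory bookkeeping, not that coupling.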
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, only a minimal number of late pre-trained layers is used, reducing the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z)
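As a rough, hypothetical sketch of the general pattern of combining eidetic (exactly kept) and fading (exponentially decayed) memory in one module; the surprise test, decay constant, and buffer size below are assumptions, not B'MOJO's actual state-space realization.

```python
# Hypothetical hybrid memory: an "eidetic" buffer of exactly stored vectors plus
# a "fading" decayed summary. Illustrative only; not B'MOJO's actual design.
import numpy as np

class HybridMemory:
    def __init__(self, dim, eidetic_size=64, fade=0.95, surprise_threshold=0.5):
        self.eidetic = []                  # exact copies of "surprising" inputs
        self.eidetic_size = eidetic_size
        self.fading = np.zeros(dim)        # decayed running summary of everything seen
        self.fade = fade
        self.surprise_threshold = surprise_threshold

    def update(self, x):
        # Fading memory: every input is folded in, but older content decays away.
        self.fading = self.fade * self.fading + (1.0 - self.fade) * x
        # Eidetic memory: keep an exact copy only when x looks poorly summarized
        # by the fading state (a crude cosine "surprise" test; an assumption).
        denom = np.linalg.norm(x) * np.linalg.norm(self.fading) + 1e-8
        if float(x @ self.fading) / denom < self.surprise_threshold:
            self.eidetic.append(x.copy())
            self.eidetic = self.eidetic[-self.eidetic_size:]   # keep the buffer bounded

    def state(self):
        # A downstream layer could attend over the exact entries plus the summary.
        return self.eidetic, self.fading
```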
- Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation [12.653336728447654]
We propose a class-shared memory (CSM) module consisting of a set of learnable memory vectors.
These memory vectors learn elemental object patterns from base classes during training whilst re-encoding query features during both training and inference.
We integrate CSM and UFA into representative FSS works and report experimental results on the widely-used PASCAL-5$^i$ and COCO-20$^i$ datasets.
arXiv Detail & Related papers (2024-06-01T19:53:25Z)
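As a loose, hypothetical PyTorch sketch of the pattern described above: a set of learnable memory vectors that re-encode query features through attention. The single-head dot-product attention, the residual connection, and the dimensions are assumptions rather than the paper's exact CSM design.

```python
# Hypothetical class-shared memory: learnable memory vectors re-encode features
# via attention. Dimensions and single-head attention are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassSharedMemory(nn.Module):
    def __init__(self, num_slots=32, dim=256):
        super().__init__()
        # Memory vectors shared across classes, learned on base classes during training.
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)

    def forward(self, feats):
        """feats: (B, N, dim) query features; returns memory-re-encoded features."""
        attn = torch.einsum("bnd,md->bnm", feats, self.memory)     # similarity to each slot
        attn = F.softmax(attn / feats.shape[-1] ** 0.5, dim=-1)
        recalled = torch.einsum("bnm,md->bnd", attn, self.memory)  # read out memory content
        return feats + recalled                                    # residual re-encoding

# Example: re-encode a batch of 2 feature maps with 100 spatial positions each.
x = torch.randn(2, 100, 256)
y = ClassSharedMemory()(x)
assert y.shape == x.shape
```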
- MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling in general and in knowledge-intensive tasks in particular.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
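As a toy, hypothetical illustration of what an explicit read-write memory interface can look like: a plain relation-triple store that a model could be taught to call. The triple format and the method names are assumptions, not MemLLM's actual finetuned interface.

```python
# Hypothetical toy read/write memory an LLM could be trained to call explicitly.
# The triple schema and API names are illustrative assumptions.
from collections import defaultdict

class TripleMemory:
    def __init__(self):
        self._by_subject = defaultdict(list)

    def write(self, subject, relation, obj):
        """Store an explicit fact as a (relation, object) pair under its subject."""
        self._by_subject[subject].append((relation, obj))

    def read(self, subject, relation=None):
        """Return stored facts about a subject, optionally filtered by relation."""
        facts = self._by_subject.get(subject, [])
        if relation is not None:
            facts = [f for f in facts if f[0] == relation]
        return facts

mem = TripleMemory()
mem.write("CAMELoT", "introduced_by", "He et al.")
print(mem.read("CAMELoT", "introduced_by"))   # [('introduced_by', 'He et al.')]
```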
- Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with, and improve the performance of, popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- Enhancing Large Language Model with Self-Controlled Memory Framework [56.38025154501917]
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information.
We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
arXiv Detail & Related papers (2023-04-26T07:25:31Z)
- Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems [27.419109620575313]
A key challenge for deep learning models is to work with millions of categorical classes or tokens.
We propose a novel formulation of memory shared embedding, where memory is shared in proportion to the overlap in semantic information.
We demonstrate a significant reduction in the memory footprint while maintaining performance.
arXiv Detail & Related papers (2021-02-24T19:55:49Z)
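As a generic, hypothetical sketch of a shared-memory embedding: tokens assigned to the same semantic cluster share one block of parameters, so the table shrinks while related tokens overlap in memory. The cluster assignment and the shared/private split are illustrative assumptions, not the paper's SCMA formulation.

```python
# Hypothetical shared embedding: tokens in the same semantic cluster share a row,
# plus a small private remainder per token. Not the paper's actual SCMA scheme.
import torch
import torch.nn as nn

class SharedEmbedding(nn.Module):
    def __init__(self, num_tokens, num_clusters, shared_dim, private_dim, cluster_of):
        super().__init__()
        self.register_buffer("cluster_of", cluster_of)        # (num_tokens,) token -> cluster id
        self.shared = nn.Embedding(num_clusters, shared_dim)  # one big row per semantic cluster
        self.private = nn.Embedding(num_tokens, private_dim)  # small per-token remainder

    def forward(self, token_ids):
        shared_part = self.shared(self.cluster_of[token_ids])
        private_part = self.private(token_ids)
        return torch.cat([shared_part, private_part], dim=-1)

# 1M tokens, 10k clusters: most parameters live in the 10k shared rows and a small
# private table instead of a full 1M x 128 embedding matrix.
cluster_of = torch.randint(0, 10_000, (1_000_000,))
emb = SharedEmbedding(1_000_000, 10_000, shared_dim=120, private_dim=8, cluster_of=cluster_of)
vecs = emb(torch.tensor([3, 42, 999_999]))   # shape (3, 128)
```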
- Learning Associative Inference Using Fast Weight Memory [12.239487954915646]
We augment the LSTM model with an associative memory, dubbed Fast Weight Memory (FWM).
Our model is trained end-to-end by gradient descent and yields excellent performance on compositional language reasoning problems.
arXiv Detail & Related papers (2020-11-16T10:01:23Z)
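Fast weight memories in general store associations as outer-product updates to a weight matrix and retrieve them with a matrix-vector product; the minimal sketch below shows only that generic mechanism (the decay constant is assumed, and the LSTM controller that FWM couples to is omitted).

```python
# Generic outer-product fast weight memory: write associations into a matrix,
# read them back with a query. Decay and the missing LSTM controller are simplifications.
import numpy as np

class FastWeightMemory:
    def __init__(self, dim, decay=0.95):
        self.M = np.zeros((dim, dim))   # the fast weight matrix
        self.decay = decay

    def write(self, key, value):
        # Hebbian-style update; older associations gradually fade.
        self.M = self.decay * self.M + np.outer(value, key)

    def read(self, query):
        # Retrieve the value associated with keys similar to the query.
        return self.M @ query

dim = 8
mem = FastWeightMemory(dim)
key = np.eye(dim)[0]                      # a one-hot key
value = np.arange(dim, dtype=float)
mem.write(key, value)
print(np.allclose(mem.read(key), value))  # True: the stored value is recovered
```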
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.