Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
- URL: http://arxiv.org/abs/2406.18400v2
- Date: Wed, 27 Nov 2024 22:41:03 GMT
- Title: Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
- Authors: Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam
- Abstract summary: Large Language Models (LLMs) have the capacity to store and recall facts.
LLMs might behave like an associative memory model where certain tokens in the contexts serve as clues to retrieving facts.
- Score: 40.964584197528175
- Abstract: Large Language Models (LLMs) have the capacity to store and recall facts. Through experimentation with open-source models, we observe that this ability to retrieve facts can be easily manipulated by changing contexts, even without altering their factual meanings. These findings highlight that LLMs might behave like an associative memory model where certain tokens in the contexts serve as clues to retrieving facts. We mathematically explore this property by studying how transformers, the building blocks of LLMs, can complete such memory tasks. We study a simple latent concept association problem with a one-layer transformer and we show theoretically and empirically that the transformer gathers information using self-attention and uses the value matrix for associative memory.
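To make the mechanism described above concrete, here is a minimal sketch (not the paper's exact construction) of a one-head, one-layer attention block in which self-attention pools "clue" tokens from the context and a value matrix built from outer products acts as the associative memory. The vocabulary size, embedding dimension, toy facts, and softmax temperature are illustrative assumptions.

```python
# Minimal sketch: self-attention pools "clue" tokens from the context, and the value
# matrix W_V, built as a sum of outer products, acts as an associative memory that
# maps a clue embedding to the output embedding of the associated fact token.
# Vocabulary size, dimensions, and the softmax temperature beta are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 64, 16
E = rng.standard_normal((vocab, d)); E /= np.linalg.norm(E, axis=1, keepdims=True)  # input embeddings
U = rng.standard_normal((vocab, d)); U /= np.linalg.norm(U, axis=1, keepdims=True)  # output embeddings

facts = {2: 9, 5: 11, 7: 3}   # toy facts: clue token -> fact token
W_V = sum(np.outer(U[tgt], E[clue]) for clue, tgt in facts.items())

def predict_fact(context, query_token, beta=10.0):
    """Attend from the query over the context, then read out through W_V."""
    K = E[context]                        # keys/values = raw context embeddings (W_K = W_Q = I)
    attn = np.exp(beta * (K @ E[query_token]))
    attn /= attn.sum()
    pooled = attn @ K                     # attention-weighted sum of context embeddings
    return int(np.argmax(U @ (W_V @ pooled)))

# The clue token 5 appears among distractors; retrieval should return its fact token, 11.
print(predict_fact(context=[1, 5, 14, 8], query_token=5))
```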
Related papers
- Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture [6.144680854063938]
We introduce an associative memory model capable of performing in-context learning (ICL) in large language models (LLMs).
We use this as inspiration for a novel residual stream architecture which allows information to directly flow between attention heads.
We test this architecture during training within a two-layer Transformer and show its ICL abilities manifest more quickly than without this modification.
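The summary above leaves the architecture itself open, so the following is only a heavily hedged toy reading of "information flowing directly between attention heads": layer-1 head outputs are exposed as extra positions that a layer-2 head can attend to, rather than being reachable only through the summed residual stream. The projections, sizes, and the concatenation trick are assumptions, not the paper's design.

```python
# Toy two-layer setup: layer-2 attention reads both the residual stream and the raw
# layer-1 head outputs, so per-head information need not pass through the summed stream.
import numpy as np

rng = np.random.default_rng(4)
d, seq = 16, 6
x = rng.standard_normal((seq, d))

def head(x_q, x_kv, Wq, Wk, Wv):
    """Plain single-head scaled dot-product attention."""
    q, k, v = x_q @ Wq, x_kv @ Wk, x_kv @ Wv
    s = q @ k.T / np.sqrt(d)
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def rand_proj():
    return rng.standard_normal((d, d)) / np.sqrt(d)

# Two layer-1 heads with their own (random, untrained) projections.
h1a = head(x, x, rand_proj(), rand_proj(), rand_proj())
h1b = head(x, x, rand_proj(), rand_proj(), rand_proj())
residual = x + h1a + h1b                      # standard residual-stream mixing

# Assumed twist: layer 2 attends over the residual stream *and* the raw head outputs.
kv = np.vstack([residual, h1a, h1b])
h2 = head(residual, kv, rand_proj(), rand_proj(), rand_proj())
print(h2.shape)                                # (6, 16)
```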
arXiv Detail & Related papers (2024-12-19T17:55:42Z) - Understanding Factual Recall in Transformers via Associative Memories [55.93756571457904]
We show that shallow transformers can use a combination of associative memories to obtain near-optimal storage capacity.
We show that a transformer with a single layer of self-attention followed by an MLP can obtain 100% accuracy on a factual recall task.
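As a point of reference for the associative-memory building block discussed here, below is a minimal outer-product (linear) associative memory with a toy capacity sweep. The dimensions, pattern counts, and nearest-stored-value readout are illustrative assumptions; the numbers produced are not the paper's capacity bound.

```python
# Minimal outer-product associative memory: store (key, value) pairs as W = sum_i v_i k_i^T
# and retrieve by finding the stored value nearest to W k. The capacity sweep is
# illustrative only; the paper's storage-capacity analysis is for transformer components.
import numpy as np

rng = np.random.default_rng(1)
d = 64

def recall_accuracy(num_pairs):
    K = rng.standard_normal((num_pairs, d)); K /= np.linalg.norm(K, axis=1, keepdims=True)
    V = rng.standard_normal((num_pairs, d)); V /= np.linalg.norm(V, axis=1, keepdims=True)
    W = V.T @ K                                   # sum of outer products v_i k_i^T
    retrieved = K @ W.T                           # row i is W k_i
    nearest = np.argmax(retrieved @ V.T, axis=1)  # nearest stored value for each key
    return float(np.mean(nearest == np.arange(num_pairs)))

for n in (20, 80, 320, 1280):
    print(n, recall_accuracy(n))   # recall degrades once n grows large relative to d
```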
arXiv Detail & Related papers (2024-12-09T14:48:14Z) - Scaling Laws for Fact Memorization of Large Language Models [67.94080978627363]
We analyze the scaling laws for Large Language Models' fact knowledge and their behaviors of memorizing different types of facts.
We find that LLMs' fact knowledge capacity has a linear and a negative exponential law relationship with model size and training epochs, respectively.
Our findings reveal the capacity and characteristics of LLMs' fact knowledge learning, which provide directions for LLMs' fact knowledge augmentation.
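Read literally, the two laws suggest a combined form like the one below; this particular multiplicative combination and the constants are an assumption made for illustration, not the paper's fitted equation.

```latex
% One hedged reading of the two laws combined (an assumption, not the paper's fit):
% N = model size (parameters), E = training epochs, a, b > 0 constants.
\mathrm{Capacity}(N, E) \;\approx\; a \, N \, \bigl(1 - e^{-b E}\bigr)
```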
arXiv Detail & Related papers (2024-06-22T03:32:09Z) - MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing large language models (LLMs) by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular.
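The interface below is a toy stand-in to make "structured and explicit read-and-write memory" concrete; the actual MemLLM memory format, API calls, and finetuning procedure are defined in the paper and will differ from this sketch.

```python
# Toy explicit read-write memory: the model would emit structured write/read calls
# against this store instead of relying only on its parametric memory. The triple
# format and method names here are hypothetical, not MemLLM's actual API.
from collections import defaultdict

class TripleMemory:
    """Stores (subject, relation, object) triples the model can write and query."""
    def __init__(self):
        self._store = defaultdict(set)

    def write(self, subject: str, relation: str, obj: str) -> None:
        self._store[(subject, relation)].add(obj)

    def read(self, subject: str, relation: str) -> list[str]:
        return sorted(self._store[(subject, relation)])

# Hypothetical usage during generation:
memory = TripleMemory()
memory.write("Marie Curie", "field", "physics")
memory.write("Marie Curie", "field", "chemistry")
print(memory.read("Marie Curie", "field"))   # ['chemistry', 'physics']
```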
arXiv Detail & Related papers (2024-04-17T18:13:16Z) - DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models [79.01926242857613]
Large language models (LLMs) are prone to hallucinations, generating content that deviates from facts seen during pretraining.
We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs.
We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts.
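Below is a minimal sketch of the layer-contrasting idea with hypothetical logits and a simplified plausibility filter; layer selection, repetition handling, and other details of DoLa are described in the paper and are not reproduced here.

```python
# Sketch of layer-contrastive decoding: next-token scores from a mature (final) layer
# are contrasted with a premature (early) layer, restricted to tokens the mature
# layer already finds plausible. Logits and the alpha threshold are made up.
import numpy as np

def contrast_layers(final_logits, early_logits, alpha=0.1):
    final_logp = final_logits - np.logaddexp.reduce(final_logits)   # log-softmax
    early_logp = early_logits - np.logaddexp.reduce(early_logits)
    # Plausibility constraint: keep only tokens close to the mature layer's best token.
    keep = final_logp >= np.log(alpha) + final_logp.max()
    scores = np.where(keep, final_logp - early_logp, -np.inf)
    return int(np.argmax(scores))

# Hypothetical logits over a 5-token vocabulary from two layers of the same model.
final_logits = np.array([2.0, 1.9, 0.1, -1.0, -2.0])
early_logits = np.array([2.2, 0.3, 0.1, -1.0, -2.0])
print(contrast_layers(final_logits, early_logits))   # token 1: it gained the most across layers
```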
arXiv Detail & Related papers (2023-09-07T17:45:31Z) - Linearity of Relation Decoding in Transformer Language Models [82.47019600662874]
Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations.
We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation.
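To spell out the claim, the sketch below fits an affine map o ≈ W_r s + b_r on toy vectors standing in for subject and object hidden states; the paper instead estimates such maps from the language model itself, so everything here (dimensions, noise level, synthetic data) is illustrative.

```python
# Linear relation decoding sketch: the object representation is approximated as an
# affine function of the subject representation, o ≈ W_r s + b_r, fit by least squares
# on synthetic stand-ins for hidden states.
import numpy as np

rng = np.random.default_rng(2)
d, n = 32, 200
W_true = rng.standard_normal((d, d)) / np.sqrt(d)      # hypothetical ground-truth map
b_true = rng.standard_normal(d) * 0.1

S = rng.standard_normal((n, d))                         # stand-ins for subject states
O = S @ W_true.T + b_true + 0.01 * rng.standard_normal((n, d))

# Fit the affine map [W_r | b_r] jointly via least squares.
S1 = np.hstack([S, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(S1, O, rcond=None)
W_r, b_r = coef[:d].T, coef[d]

s_new = rng.standard_normal(d)
o_pred = W_r @ s_new + b_r
o_true = W_true @ s_new + b_true
print(np.allclose(o_pred, o_true, atol=0.1))            # True: the affine fit recovers the map
```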
arXiv Detail & Related papers (2023-08-17T17:59:19Z) - ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory [29.822360561150475]
Large language models (LLMs) with memory are computationally universal.
We seek inspiration from modern computer architectures to augment LLMs with symbolic memory for complex multi-hop reasoning.
We validate the effectiveness of the proposed memory framework on a synthetic dataset requiring complex reasoning.
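A toy sketch of a database serving as symbolic memory follows; here the SQL is hand-written, whereas in ChatDB the LLM generates and executes such memory operations during multi-hop reasoning. The table schema and data are made up.

```python
# Database-as-symbolic-memory sketch: intermediate facts are written to and read from
# an SQL store rather than kept only in the context window.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (item TEXT, quantity INTEGER, price REAL)")

# Write step: record intermediate facts produced earlier in the reasoning chain.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("apple", 3, 0.5), ("pear", 2, 0.8), ("apple", 1, 0.5)],
)

# Read step: a later reasoning hop queries the symbolic memory instead of re-deriving.
total_apples = conn.execute(
    "SELECT SUM(quantity) FROM orders WHERE item = 'apple'"
).fetchone()[0]
print(total_apples)   # 4
```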
arXiv Detail & Related papers (2023-06-06T17:58:24Z) - Mention Memory: incorporating textual knowledge into Transformers through entity mention attention [21.361822569279003]
We propose to integrate a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge.
The proposed model, TOME, is a Transformer that accesses this information through internal memory layers in which each entity mention in the input passage attends to the mention memory.
In experiments using a memory of 150 million Wikipedia mentions, TOME achieves strong performance on several open-domain knowledge-intensive tasks.
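To illustrate the mention-memory mechanism, the sketch below attends from a single in-passage mention vector over a random stand-in memory of precomputed mention encodings. The memory size, the top-k sparsification, and the single key/value table are assumptions for illustration; the real memory holds roughly 150 million Wikipedia mention encodings.

```python
# Attention over an external mention memory: a mention representation from the input
# attends over precomputed (key, value) encodings of corpus mentions; a top-k cut keeps
# the toy computation cheap, standing in for the large-scale retrieval in the paper.
import numpy as np

rng = np.random.default_rng(3)
d, num_memory_mentions = 64, 10_000
mem_keys = rng.standard_normal((num_memory_mentions, d)) / np.sqrt(d)
mem_values = rng.standard_normal((num_memory_mentions, d)) / np.sqrt(d)

def attend_to_memory(mention_query, top_k=32):
    """Sparse top-k attention over the mention memory."""
    scores = mem_keys @ mention_query
    top = np.argpartition(scores, -top_k)[-top_k:]
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()
    return weights @ mem_values[top]        # memory summary fed back into the Transformer

mention_query = rng.standard_normal(d)      # stand-in for an in-passage mention encoding
print(attend_to_memory(mention_query).shape)   # (64,)
```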
arXiv Detail & Related papers (2021-10-12T17:19:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.