Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
- URL: http://arxiv.org/abs/2406.18400v2
- Date: Wed, 27 Nov 2024 22:41:03 GMT
- Title: Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
- Authors: Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam
- Abstract summary: Large Language Models (LLMs) have the capacity to store and recall facts.
LLMs might behave like an associative memory model where certain tokens in the contexts serve as clues to retrieving facts.
- Score: 40.964584197528175
- Abstract: Large Language Models (LLMs) have the capacity to store and recall facts. Through experimentation with open-source models, we observe that this ability to retrieve facts can be easily manipulated by changing contexts, even without altering their factual meanings. These findings highlight that LLMs might behave like an associative memory model where certain tokens in the contexts serve as clues to retrieving facts. We mathematically explore this property by studying how transformers, the building blocks of LLMs, can complete such memory tasks. We study a simple latent concept association problem with a one-layer transformer and we show theoretically and empirically that the transformer gathers information using self-attention and uses the value matrix for associative memory.
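To make the mechanism described above concrete, here is a minimal sketch (not the paper's exact construction) of a one-head, one-layer attention block in which self-attention pools "clue" tokens from the context and a value matrix built from outer products acts as the associative memory. The vocabulary size, embedding dimension, toy facts, and softmax temperature are illustrative assumptions.

```python
# Minimal sketch: self-attention pools "clue" tokens from the context, and the value
# matrix W_V, built as a sum of outer products, acts as an associative memory that
# maps a clue embedding to the output embedding of the associated fact token.
# Vocabulary size, dimensions, and the softmax temperature beta are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 64, 16
E = rng.standard_normal((vocab, d)); E /= np.linalg.norm(E, axis=1, keepdims=True)  # input embeddings
U = rng.standard_normal((vocab, d)); U /= np.linalg.norm(U, axis=1, keepdims=True)  # output embeddings

facts = {2: 9, 5: 11, 7: 3}   # toy facts: clue token -> fact token
W_V = sum(np.outer(U[tgt], E[clue]) for clue, tgt in facts.items())

def predict_fact(context, query_token, beta=10.0):
    """Attend from the query over the context, then read out through W_V."""
    K = E[context]                        # keys/values = raw context embeddings (W_K = W_Q = I)
    attn = np.exp(beta * (K @ E[query_token]))
    attn /= attn.sum()
    pooled = attn @ K                     # attention-weighted sum of context embeddings
    return int(np.argmax(U @ (W_V @ pooled)))

# The clue token 5 appears among distractors; retrieval should return its fact token, 11.
print(predict_fact(context=[1, 5, 14, 8], query_token=5))
```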
Related papers
- Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture [6.144680854063938]
We introduce an associative memory model capable of performing in-context learning (ICL) in large language models (LLMs).
We use this as inspiration for a novel residual stream architecture which allows information to directly flow between attention heads.
We test this architecture during training within a two-layer Transformer and show its ICL abilities manifest more quickly than without this modification.
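The summary above leaves the architecture itself open, so the following is only a heavily hedged toy reading of "information flowing directly between attention heads": layer-1 head outputs are exposed as extra positions that a layer-2 head can attend to, rather than being reachable only through the summed residual stream. The projections, sizes, and the concatenation trick are assumptions, not the paper's design.

```python
# Toy two-layer setup: layer-2 attention reads both the residual stream and the raw
# layer-1 head outputs, so per-head information need not pass through the summed stream.
import numpy as np

rng = np.random.default_rng(4)
d, seq = 16, 6
x = rng.standard_normal((seq, d))

def head(x_q, x_kv, Wq, Wk, Wv):
    """Plain single-head scaled dot-product attention."""
    q, k, v = x_q @ Wq, x_kv @ Wk, x_kv @ Wv
    s = q @ k.T / np.sqrt(d)
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def rand_proj():
    return rng.standard_normal((d, d)) / np.sqrt(d)

# Two layer-1 heads with their own (random, untrained) projections.
h1a = head(x, x, rand_proj(), rand_proj(), rand_proj())
h1b = head(x, x, rand_proj(), rand_proj(), rand_proj())
residual = x + h1a + h1b                      # standard residual-stream mixing

# Assumed twist: layer 2 attends over the residual stream *and* the raw head outputs.
kv = np.vstack([residual, h1a, h1b])
h2 = head(residual, kv, rand_proj(), rand_proj(), rand_proj())
print(h2.shape)                                # (6, 16)
```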
arXiv Detail & Related papers (2024-12-19T17:55:42Z) - Understanding Factual Recall in Transformers via Associative Memories [55.93756571457904]
We show that shallow transformers can use a combination of associative memories to obtain near-optimal storage capacity.
We show that a transformer with a single layer of self-attention followed by an MLP can obtain 100% accuracy on a factual recall task.
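As a point of reference for the associative-memory building block discussed here, below is a minimal outer-product (linear) associative memory with a toy capacity sweep. The dimensions, pattern counts, and nearest-stored-value readout are illustrative assumptions; the numbers produced are not the paper's capacity bound.

```python
# Minimal outer-product associative memory: store (key, value) pairs as W = sum_i v_i k_i^T
# and retrieve by finding the stored value nearest to W k. The capacity sweep is
# illustrative only; the paper's storage-capacity analysis is for transformer components.
import numpy as np

rng = np.random.default_rng(1)
d = 64

def recall_accuracy(num_pairs):
    K = rng.standard_normal((num_pairs, d)); K /= np.linalg.norm(K, axis=1, keepdims=True)
    V = rng.standard_normal((num_pairs, d)); V /= np.linalg.norm(V, axis=1, keepdims=True)
    W = V.T @ K                                   # sum of outer products v_i k_i^T
    retrieved = K @ W.T                           # row i is W k_i
    nearest = np.argmax(retrieved @ V.T, axis=1)  # nearest stored value for each key
    return float(np.mean(nearest == np.arange(num_pairs)))

for n in (20, 80, 320, 1280):
    print(n, recall_accuracy(n))   # recall degrades once n grows large relative to d
```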
arXiv Detail & Related papers (2024-12-09T14:48:14Z) - Scaling Laws for Fact Memorization of Large Language Models [67.94080978627363]
We analyze the scaling laws for Large Language Models' fact knowledge and their behaviors of memorizing different types of facts.
We find that LLMs' fact knowledge capacity has a linear and a negative exponential law relationship with model size and training epochs, respectively.
Our findings reveal the capacity and characteristics of LLMs' fact knowledge learning, which provide directions for LLMs' fact knowledge augmentation.
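Read literally, the two laws suggest a combined form like the one below; this particular multiplicative combination and the constants are an assumption made for illustration, not the paper's fitted equation.

```latex
% One hedged reading of the two laws combined (an assumption, not the paper's fit):
% N = model size (parameters), E = training epochs, a, b > 0 constants.
\mathrm{Capacity}(N, E) \;\approx\; a \, N \, \bigl(1 - e^{-b E}\bigr)
```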
arXiv Detail & Related papers (2024-06-22T03:32:09Z) - MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing large language models (LLMs) by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular.
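The interface below is a toy stand-in to make "structured and explicit read-and-write memory" concrete; the actual MemLLM memory format, API calls, and finetuning procedure are defined in the paper and will differ from this sketch.

```python
# Toy explicit read-write memory: the model would emit structured write/read calls
# against this store instead of relying only on its parametric memory. The triple
# format and method names here are hypothetical, not MemLLM's actual API.
from collections import defaultdict

class TripleMemory:
    """Stores (subject, relation, object) triples the model can write and query."""
    def __init__(self):
        self._store = defaultdict(set)

    def write(self, subject: str, relation: str, obj: str) -> None:
        self._store[(subject, relation)].add(obj)

    def read(self, subject: str, relation: str) -> list[str]:
        return sorted(self._store[(subject, relation)])

# Hypothetical usage during generation:
memory = TripleMemory()
memory.write("Marie Curie", "field", "physics")
memory.write("Marie Curie", "field", "chemistry")
print(memory.read("Marie Curie", "field"))   # ['chemistry', 'physics']
```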
arXiv Detail & Related papers (2024-04-17T18:13:16Z) - DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models [79.01926242857613]
Large language models (LLMs) are prone to hallucinations, generating content that deviates from facts seen during pretraining.
We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs.
We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts.
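Below is a minimal sketch of the layer-contrasting idea with hypothetical logits and a simplified plausibility filter; layer selection, repetition handling, and other details of DoLa are described in the paper and are not reproduced here.

```python
# Sketch of layer-contrastive decoding: next-token scores from a mature (final) layer
# are contrasted with a premature (early) layer, restricted to tokens the mature
# layer already finds plausible. Logits and the alpha threshold are made up.
import numpy as np

def contrast_layers(final_logits, early_logits, alpha=0.1):
    final_logp = final_logits - np.logaddexp.reduce(final_logits)   # log-softmax
    early_logp = early_logits - np.logaddexp.reduce(early_logits)
    # Plausibility constraint: keep only tokens close to the mature layer's best token.
    keep = final_logp >= np.log(alpha) + final_logp.max()
    scores = np.where(keep, final_logp - early_logp, -np.inf)
    return int(np.argmax(scores))

# Hypothetical logits over a 5-token vocabulary from two layers of the same model.
final_logits = np.array([2.0, 1.9, 0.1, -1.0, -2.0])
early_logits = np.array([2.2, 0.3, 0.1, -1.0, -2.0])
print(contrast_layers(final_logits, early_logits))   # token 1: it gained the most across layers
```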
arXiv Detail & Related papers (2023-09-07T17:45:31Z) - Linearity of Relation Decoding in Transformer Language Models [82.47019600662874]
Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations.
We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation.
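To spell out the claim, the sketch below fits an affine map o ≈ W_r s + b_r on toy vectors standing in for subject and object hidden states; the paper instead estimates such maps from the language model itself, so everything here (dimensions, noise level, synthetic data) is illustrative.

```python
# Linear relation decoding sketch: the object representation is approximated as an
# affine function of the subject representation, o ≈ W_r s + b_r, fit by least squares
# on synthetic stand-ins for hidden states.
import numpy as np

rng = np.random.default_rng(2)
d, n = 32, 200
W_true = rng.standard_normal((d, d)) / np.sqrt(d)      # hypothetical ground-truth map
b_true = rng.standard_normal(d) * 0.1

S = rng.standard_normal((n, d))                         # stand-ins for subject states
O = S @ W_true.T + b_true + 0.01 * rng.standard_normal((n, d))

# Fit the affine map [W_r | b_r] jointly via least squares.
S1 = np.hstack([S, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(S1, O, rcond=None)
W_r, b_r = coef[:d].T, coef[d]

s_new = rng.standard_normal(d)
o_pred = W_r @ s_new + b_r
o_true = W_true @ s_new + b_true
print(np.allclose(o_pred, o_true, atol=0.1))            # True: the affine fit recovers the map
```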
arXiv Detail & Related papers (2023-08-17T17:59:19Z) - ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory [29.822360561150475]
Large language models (LLMs) with memory are computationally universal.
We seek inspiration from modern computer architectures to augment LLMs with symbolic memory for complex multi-hop reasoning.
We validate the effectiveness of the proposed memory framework on a synthetic dataset requiring complex reasoning.
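A toy sketch of a database serving as symbolic memory follows; here the SQL is hand-written, whereas in ChatDB the LLM generates and executes such memory operations during multi-hop reasoning. The table schema and data are made up.

```python
# Database-as-symbolic-memory sketch: intermediate facts are written to and read from
# an SQL store rather than kept only in the context window.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (item TEXT, quantity INTEGER, price REAL)")

# Write step: record intermediate facts produced earlier in the reasoning chain.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("apple", 3, 0.5), ("pear", 2, 0.8), ("apple", 1, 0.5)],
)

# Read step: a later reasoning hop queries the symbolic memory instead of re-deriving.
total_apples = conn.execute(
    "SELECT SUM(quantity) FROM orders WHERE item = 'apple'"
).fetchone()[0]
print(total_apples)   # 4
```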
arXiv Detail & Related papers (2023-06-06T17:58:24Z) - Mention Memory: incorporating textual knowledge into Transformers through entity mention attention [21.361822569279003]
We propose to integrate a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge.
The proposed model, TOME, is a Transformer that accesses this information through internal memory layers in which each entity mention in the input passage attends to the mention memory.
In experiments using a memory of 150 million Wikipedia mentions, TOME achieves strong performance on several open-domain knowledge-intensive tasks.
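To illustrate the mention-memory mechanism, the sketch below attends from a single in-passage mention vector over a random stand-in memory of precomputed mention encodings. The memory size, the top-k sparsification, and the single key/value table are assumptions for illustration; the real memory holds roughly 150 million Wikipedia mention encodings.

```python
# Attention over an external mention memory: a mention representation from the input
# attends over precomputed (key, value) encodings of corpus mentions; a top-k cut keeps
# the toy computation cheap, standing in for the large-scale retrieval in the paper.
import numpy as np

rng = np.random.default_rng(3)
d, num_memory_mentions = 64, 10_000
mem_keys = rng.standard_normal((num_memory_mentions, d)) / np.sqrt(d)
mem_values = rng.standard_normal((num_memory_mentions, d)) / np.sqrt(d)

def attend_to_memory(mention_query, top_k=32):
    """Sparse top-k attention over the mention memory."""
    scores = mem_keys @ mention_query
    top = np.argpartition(scores, -top_k)[-top_k:]
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()
    return weights @ mem_values[top]        # memory summary fed back into the Transformer

mention_query = rng.standard_normal(d)      # stand-in for an in-passage mention encoding
print(attend_to_memory(mention_query).shape)   # (64,)
```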
arXiv Detail & Related papers (2021-10-12T17:19:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.