Related papers: Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture

Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture

URL: http://arxiv.org/abs/2412.15113v1
Date: Thu, 19 Dec 2024 17:55:42 GMT
Title: Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture
Authors: Thomas F Burns, Tomoki Fukai, Christopher J Earls,
Abstract summary: We introduce an associative memory model capable of performing in-context learning (ICL) in large language models (LLMs)<n>We use this as inspiration for a novel residual stream architecture which allows information to directly flow between attention heads.<n>We test this architecture during training within a two-layer Transformer and show its ICL abilities manifest more quickly than without this modification.
Score: 6.144680854063938
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) demonstrate an impressive ability to utilise information within the context of their input sequences to appropriately respond to data unseen by the LLM during its training procedure. This ability is known as in-context learning (ICL). Humans and non-human animals demonstrate similar abilities, however their neural architectures differ substantially from LLMs. Despite this, a critical component within LLMs, the attention mechanism, resembles modern associative memory models, widely used in and influenced by the computational neuroscience community to model biological memory systems. Using this connection, we introduce an associative memory model capable of performing ICL. We use this as inspiration for a novel residual stream architecture which allows information to directly flow between attention heads. We test this architecture during training within a two-layer Transformer and show its ICL abilities manifest more quickly than without this modification. We then apply our architecture in small language models with 8 million parameters, focusing on attention head values, with results also indicating improved ICL performance at this larger and more naturalistic scale.

Related papers

Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimate the impact of individual training samples on model parameters without retraining.<n>This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning)<n>We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
arXiv Detail & Related papers (2025-07-31T05:34:27Z)
Hebbian Memory-Augmented Recurrent Networks: Engram Neurons in Deep Learning [0.0]
We introduce the Engram Neural Network (ENN), a novel recurrent architecture incorporating an explicit, differentiable memory matrix with Hebbian plasticity and sparse, attention-driven retrieval mechanisms.<n>The ENN explicitly models memory formation and recall through dynamic Hebbian traces, improving transparency and interpretability compared to conventional RNN variants.
arXiv Detail & Related papers (2025-07-29T03:34:32Z)
Retrospective Memory for Camouflaged Object Detection [18.604039107883317]
We propose a recall-augmented COD architecture, namely RetroMem, which dynamically modulates camouflage pattern perception and inference.<n>In the recall stage, we propose a dynamic memory mechanism and an inference pattern reconstruction.<n>Our RetroMem significantly outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-06-18T08:22:19Z)
Long Term Memory: The Foundation of AI Self-Evolution [48.52678410533424]
Large language models (LLMs) like GPTs, trained on vast datasets, have demonstrated impressive capabilities in language understanding, reasoning, and planning. Most studies focus on enhancing these models by training on ever-larger datasets to build more powerful foundation models. Unlike large-scale training, enabling models to evolve during inference is equally crucial, a process we refer to as AI self-evolution.
arXiv Detail & Related papers (2024-10-21T06:09:30Z)
Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network [16.317199232071232]
Large Language Models (LLMs) have been shown to be effective models of the human language system. In this work, we investigate the key architectural components driving the surprising alignment of untrained models.
arXiv Detail & Related papers (2024-06-21T12:54:03Z)
CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory [38.429707659685974]
Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs. We introduce an associative memory module which can be coupled to any pre-trained (frozen) attention-based LLM without re-training. This architecture, which we call CAMELoT, demonstrates superior performance even with a tiny context window of 128 tokens.
arXiv Detail & Related papers (2024-02-21T01:00:17Z)
Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain [12.92793034617015]
We show that as large language models (LLMs) achieve higher performance on benchmark tasks, they become more brain-like. We also show the importance of contextual information in improving model performance and brain similarity.
arXiv Detail & Related papers (2024-01-31T08:48:35Z)
In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in context language learning (ICLL) We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
In-context Autoencoder for Context Compression in a Large Language Model [70.7621953091318]
We propose the In-context Autoencoder (ICAE) to compress a long context into short compact memory slots. ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data.
arXiv Detail & Related papers (2023-07-13T17:59:21Z)
CogNGen: Constructing the Kernel of a Hyperdimensional Predictive Processing Cognitive Architecture [79.07468367923619]
We present a new cognitive architecture that combines two neurobiologically plausible, computational models. We aim to develop a cognitive architecture that has the power of modern machine learning techniques.
arXiv Detail & Related papers (2022-03-31T04:44:28Z)
GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture. We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions. We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context. We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
Learning Associative Inference Using Fast Weight Memory [12.239487954915646]
We augment the LSTM model with an associative memory, dubbed Fast Weight Memory (FWM) Our model is trained end-to-end by gradient descent and yields excellent performance on compositional language reasoning problems.
arXiv Detail & Related papers (2020-11-16T10:01:23Z)
Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning. We show how to extend the architecture of a simple RNN by separating its hidden state into different modules. We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.