Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning
- URL: http://arxiv.org/abs/2507.07810v1
- Date: Thu, 10 Jul 2025 14:40:31 GMT
- Title: Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning
- Authors: Nhi Hoai Doan, Tatsuya Hiraoka, Kentaro Inui
- Abstract summary: This paper investigates the relationship between large language models' (LLMs) ability to recognize repetitive input patterns and their performance on in-context learning (ICL). Our experiments reveal that the impact of repetition neurons on ICL performance varies depending on the depth of the layer in which they reside.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the relationship between large language models' (LLMs) ability to recognize repetitive input patterns and their performance on in-context learning (ICL). In contrast to prior work that has primarily focused on attention heads, we examine this relationship from the perspective of skill neurons, specifically repetition neurons. Our experiments reveal that the impact of these neurons on ICL performance varies depending on the depth of the layer in which they reside. By comparing the effects of repetition neurons and induction heads, we further identify strategies for reducing repetitive outputs while maintaining strong ICL capabilities.
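To make the intervention concrete, here is a minimal sketch of suppressing candidate repetition neurons in GPT-2 via forward hooks. This is not the authors' code: the (layer, neuron) indices are placeholders, and the paper's procedure for identifying repetition neurons is not reproduced here.

```python
# Minimal sketch: zero out candidate "repetition neurons" (placeholder
# indices) in GPT-2's MLP activations during generation.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Hypothetical candidates: {layer index: [neuron indices in that layer's MLP]}
repetition_neurons = {10: [421, 1337], 11: [88]}

def make_hook(neuron_ids):
    def hook(module, inputs, output):
        output[..., neuron_ids] = 0.0  # suppress the post-GELU activations
        return output
    return hook

handles = [
    model.transformer.h[layer].mlp.act.register_forward_hook(make_hook(ids))
    for layer, ids in repetition_neurons.items()
]

prompt = "apple banana apple banana apple"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(out[0]))

for h in handles:
    h.remove()  # restore the unmodified model
```

Ablating induction heads works analogously by zeroing a head's slice of the attention output; the paper compares the two interventions across layer depths.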
Related papers
- Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning
Deep reinforcement learning (RL) agents frequently suffer from neuronal activity loss. GraMa (Gradient Magnitude Neural Activity Metric) is a metric for quantifying neuron-level learning capacity. We show that GraMa effectively reveals persistent neuron inactivity across diverse architectures.
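As a hedged illustration of the idea (the paper's exact GraMa definition and normalization may differ), one can score each neuron by the magnitude of the gradients flowing into its incoming weights:

```python
# Sketch of a GraMa-style statistic: per-neuron gradient magnitude, taken
# here as the mean |grad| over each neuron's incoming weights.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
x, y = torch.randn(16, 8), torch.randn(16, 2)
loss = nn.functional.mse_loss(net(x), y)
loss.backward()

layer = net[0]                                    # inspect the first layer
grad_mag = layer.weight.grad.abs().mean(dim=1)    # one value per neuron
inactive = grad_mag < 1e-6                        # heuristic threshold
print(f"{inactive.sum().item()} / {grad_mag.numel()} neurons look inactive")
```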
arXiv Detail & Related papers (2025-05-29T23:07:58Z)
- Understanding Gated Neurons in Transformers from Their Input-Output Functionality
We look at the cosine similarity between input and output weights of a neuron. We find that enrichment neurons dominate in early-middle layers whereas later layers tend more towards depletion.
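The measurement is easy to reproduce in spirit. The sketch below computes, for every GPT-2 MLP neuron, the cosine similarity between its input weight vector (a column of c_fc) and its output weight vector (a row of c_proj); the per-layer averaging is our own summarization choice, not the paper's.

```python
# Sketch: input/output weight cosine similarity for every GPT-2 MLP neuron
# (positive ~ "enrichment", negative ~ "depletion" in the paper's framing).
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
for i, block in enumerate(model.h):
    # GPT-2's Conv1D stores weights as (in_features, out_features)
    w_in = block.mlp.c_fc.weight    # (d_model, d_mlp): column j feeds neuron j
    w_out = block.mlp.c_proj.weight  # (d_mlp, d_model): row j is neuron j's output
    cos = torch.nn.functional.cosine_similarity(w_in.T, w_out, dim=1)
    print(f"layer {i}: mean cos = {cos.mean().item():+.3f}")
```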
arXiv Detail & Related papers (2025-05-23T14:14:17Z)
- Discovering Chunks in Neural Embeddings for Interpretability
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
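A toy illustration of the analysis mechanics, under our own assumptions rather than the paper's pipeline: drive an RNN with a sequence that has an imposed regularity, then cluster its hidden states and check whether recurring cluster labels track the pattern.

```python
# Toy sketch: look for recurring "chunk" states in RNN hidden states.
# (The RNN here is untrained, used only to show the analysis mechanics.)
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=16, batch_first=True)
# A sequence with an imposed regularity: ABAB... as one-hot vectors
pattern = torch.eye(4)[[0, 1, 0, 1] * 25].unsqueeze(0)  # (1, 100, 4)
states, _ = rnn(pattern)                                 # (1, 100, 16)

kmeans = KMeans(n_clusters=2, n_init=10).fit(states[0].detach().numpy())
print(kmeans.labels_[:12])  # recurring labels suggest chunk-like states
```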
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
- Neuron-Level Differentiation of Memorization and Generalization in Large Language Models
We investigate how Large Language Models distinguish between memorization and generalization at the neuron level. Experiments on both a GPT-2 model trained from scratch and a pretrained LLaMA-3.2 model fine-tuned with LoRA show consistent neuron-level specialization.
arXiv Detail & Related papers (2024-12-24T15:28:56Z)
- Brain-like Functional Organization within Large Language Models
The human brain has long inspired the pursuit of artificial intelligence (AI). Recent neuroimaging studies provide compelling evidence of alignment between the computational representations of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli. In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs). This framework links the artificial neuron (AN) sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
arXiv Detail & Related papers (2024-10-25T13:15:17Z)
- Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
We detect task-sensitive neurons in large language models (LLMs) via gradient attribution on task-specific data. We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks. We propose a neuron-level continuous fine-tuning method that only fine-tunes the current task-specific neurons during continual learning.
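A rough sketch of the spirit of gradient attribution (the exact attribution formula is our assumption): score each neuron by |activation × gradient| on a task's data, then measure how much the top-scoring sets of two tasks overlap.

```python
# Sketch: gradient-attribution neuron scoring and cross-task overlap.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))

def top_neurons(x, y, k=8):
    acts = {}
    def save(module, inp, out):
        out.retain_grad()        # keep grads on this non-leaf activation
        acts["h"] = out
    handle = net[1].register_forward_hook(save)
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    handle.remove()
    score = (acts["h"] * acts["h"].grad).abs().mean(dim=0)  # per-neuron
    return set(score.topk(k).indices.tolist())

task_a = top_neurons(torch.randn(32, 8), torch.randn(32, 2))
net.zero_grad()
task_b = top_neurons(torch.randn(32, 8), torch.randn(32, 2))
print("overlap:", len(task_a & task_b), "of 8")
```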
arXiv Detail & Related papers (2024-07-09T01:27:35Z)
- Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs
We aim to fill the research gap by examining how neuron activation is shared across tasks and languages.
We classify neurons into four distinct categories based on their responses to a specific input across different languages.
Our analysis reveals the following insights: (i) the patterns of neuron sharing are significantly affected by the characteristics of tasks and examples; (ii) neuron sharing does not fully correspond with language similarity; (iii) shared neurons play a vital role in generating responses, especially those shared across all languages.
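The four-way taxonomy can be sketched mechanically; the category names and the activation threshold below are our assumptions based on the abstract.

```python
# Sketch: classify neurons by which languages they activate for.
import numpy as np

rng = np.random.default_rng(0)
act = rng.random((10, 4))   # activation rates: 10 neurons x 4 languages (stand-in data)
active = act > 0.5          # "activated" per language; threshold assumed

def category(row):
    n = row.sum()
    if n == row.size:
        return "all-shared"      # active in every language
    if n > 1:
        return "partial-shared"  # active in several languages
    if n == 1:
        return "specific"        # active in exactly one language
    return "non-activated"

for i, row in enumerate(active):
    print(i, category(row))
```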
arXiv Detail & Related papers (2024-06-13T16:04:11Z)
- Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
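For readers unfamiliar with surrogate-gradient training, the sketch below shows the standard trick such architectures rely on: a hard threshold in the forward pass paired with a smooth pseudo-derivative in the backward pass. The boxcar surrogate is one common choice; the paper's exact surrogate may differ.

```python
# Minimal surrogate-gradient spike function: Heaviside forward,
# boxcar pseudo-derivative backward.
import torch

class SurrogateSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()                 # binary spike

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        surrogate = (v.abs() < 0.5).float()    # nonzero only near threshold
        return grad_out * surrogate

v = torch.randn(5, requires_grad=True)
spikes = SurrogateSpike.apply(v)
spikes.sum().backward()
print(v.grad)  # gradients flow despite the non-differentiable threshold
```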
arXiv Detail & Related papers (2024-04-22T09:40:07Z)
- The Dormant Neuron Phenomenon in Deep Reinforcement Learning
We identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons.
We propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training.
Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.
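A simplified sketch of a ReDo-style recycle step, assuming the commonly described formulation: re-initialize a dormant neuron's incoming weights and zero its outgoing weights. The dormancy score here (normalized mean absolute activation) approximates, but may not exactly match, the paper's definition.

```python
# Sketch: recycle neurons whose normalized activation falls below tau.
import torch
import torch.nn as nn

def redo_step(fc_in: nn.Linear, fc_out: nn.Linear, acts: torch.Tensor, tau=0.025):
    # acts: (batch, hidden) post-activation values of fc_in's outputs
    score = acts.abs().mean(dim=0)
    score = score / (score.mean() + 1e-9)      # normalize across the layer
    dormant = score <= tau
    with torch.no_grad():
        new = torch.empty_like(fc_in.weight)
        nn.init.kaiming_uniform_(new)
        fc_in.weight[dormant] = new[dormant]   # re-init incoming weights
        fc_in.bias[dormant] = 0.0
        fc_out.weight[:, dormant] = 0.0        # zero outgoing weights
    return dormant.sum().item()

fc1, fc2 = nn.Linear(8, 32), nn.Linear(32, 4)
h = torch.relu(fc1(torch.randn(64, 8)))
print("recycled:", redo_step(fc1, fc2, h))
```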
arXiv Detail & Related papers (2023-02-24T21:20:18Z)
- Modeling Associative Plasticity between Synapses to Enhance Learning of Spiking Neural Networks
Spiking Neural Networks (SNNs) are the third generation of artificial neural networks that enable energy-efficient implementation on neuromorphic hardware.
We propose a robust and effective learning mechanism by modeling the associative plasticity between synapses.
Our approaches achieve superior performance on both static and neuromorphic datasets.
arXiv Detail & Related papers (2022-07-24T06:12:23Z)
- Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z)