Related papers: Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

Related papers

Investigating The Functional Roles of Attention Heads in Vision Language Models: Evidence for Reasoning Modules [76.21320451720764]
We introduce CogVision, a dataset that decomposes complex multimodal questions into step-by-step subquestions.<n>Using a probing-based methodology, we identify attention heads that specialize in these functions and characterize them as functional heads.<n>Our analysis reveals that these functional heads are universally sparse, vary in number and distribution across functions, and mediate interactions and hierarchical organization.
arXiv Detail & Related papers (2025-12-11T05:42:53Z)
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning [54.12174882424842]
Large language models (LLMs) have achieved state-of-the-art performance in a variety of tasks, but remain largely opaque in terms of their internal mechanisms.<n>We propose a novel interpretability framework to systematically analyze the roles and behaviors of attention heads.
arXiv Detail & Related papers (2025-12-03T10:24:34Z)
Auxiliary Metrics Help Decoding Skill Neurons in the Wild [52.148049490080496]
We introduce a simple, lightweight, and broadly applicable method for isolating neurons that encode specific skills.<n>We correlate neuron activations with auxiliary metrics, such as external labels and the model's own confidence score.<n>We empirically validate our method on tasks spanning open-ended text generation and natural language inference.
arXiv Detail & Related papers (2025-11-26T17:31:53Z)
Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning [22.627302782393865]
This paper investigates the relationship between large language models' (LLMs) ability to recognize repetitive input patterns and their performance on in-context learning (ICL)<n>Our experiments reveal that the impact of repetition neurons on ICL performance varies depending on the depth of the layer in which they reside.
arXiv Detail & Related papers (2025-07-10T14:40:31Z)
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations [1.0485739694839669]
Large language models (LLMs) can sometimes report the strategies they actually use to solve tasks, but they can also fail to do so.<n>This suggests some degree of metacognition -- the capacity to monitor one's own cognitive processes for subsequent reporting and self-control.<n>We introduce a neuroscience-inspired neurofeedback paradigm designed to quantify the ability of LLMs to explicitly report and control their activation patterns.
arXiv Detail & Related papers (2025-05-19T22:32:25Z)
Meta-Representational Predictive Coding: Biomimetic Self-Supervised Learning [51.22185316175418]
We present a new form of predictive coding that we call meta-representational predictive coding (MPC) MPC sidesteps the need for learning a generative model of sensory input by learning to predict representations of sensory input across parallel streams.
arXiv Detail & Related papers (2025-03-22T22:13:14Z)
Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models [53.91412558475662]
We use methods similar to those in the field of functional neuroimaging analysis to locate and identify functional networks in large language models (LLMs) Experimental results show that, similar to the human brain, LLMs contain functional networks that frequently recur during operation. Masking key functional networks significantly impairs the model's performance, while retaining just a subset is adequate to maintain effective operation.
arXiv Detail & Related papers (2025-02-13T04:42:39Z)
Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs [8.861378619584093]
Large language models (LLMs) have shown strong arithmetic reasoning capabilities when prompted with Chain-of-Thought prompts. We investigate neuron activation'' as a lens to provide a unified explanation to observations made by prior work.
arXiv Detail & Related papers (2024-06-18T05:49:24Z)
Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs [70.3132264719438]
We aim to fill the research gap by examining how neuron activation is shared across tasks and languages. We classify neurons into four distinct categories based on their responses to a specific input across different languages. Our analysis reveals the following insights: (i) the patterns of neuron sharing are significantly affected by the characteristics of tasks and examples; (ii) neuron sharing does not fully correspond with language similarity; (iii) shared neurons play a vital role in generating responses, especially those shared across all languages.
arXiv Detail & Related papers (2024-06-13T16:04:11Z)
Linking In-context Learning in Transformers to Human Episodic Memory [1.124958340749622]
We focus on induction heads, which contribute to in-context learning in Transformer-based large language models. We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval model of human episodic memory.
arXiv Detail & Related papers (2024-05-23T18:51:47Z)
Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
We investigate whether attention heads encode two types of relationships between tokens present in natural languages. We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
arXiv Detail & Related papers (2024-02-20T14:43:39Z)
Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain [12.92793034617015]
We show that as large language models (LLMs) achieve higher performance on benchmark tasks, they become more brain-like. We also show the importance of contextual information in improving model performance and brain similarity.
arXiv Detail & Related papers (2024-01-31T08:48:35Z)
Reliability Analysis of Psychological Concept Extraction and Classification in User-penned Text [9.26840677406494]
We use the LoST dataset to capture nuanced textual cues that suggest the presence of low self-esteem in the posts of Reddit users. Our findings suggest the need of shifting the focus of PLMs from Trigger and Consequences to a more comprehensive explanation.
arXiv Detail & Related papers (2024-01-12T17:19:14Z)
Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. The lack of interpretability due to their black-box'' nature poses challenges for responsible implementation. We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories. This trait vector then multiplicatively modulates the prediction mechanism via a fast weights' scheme in the prediction neural network. We empirically show that the fast weights provide a good inductive bias to model the character traits of agents and hence improves mindreading ability.
arXiv Detail & Related papers (2022-04-17T11:21:18Z)
Overcoming the Domain Gap in Contrastive Learning of Neural Action Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior. We generated a new multimodal dataset consisting of the spontaneous behaviors generated by fruit flies. This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
arXiv Detail & Related papers (2021-11-29T15:27:51Z)
CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals [60.921888445317705]
We propose a CogAlign approach to integrate cognitive language processing signals into natural language processing models. We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
arXiv Detail & Related papers (2021-06-10T07:10:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.