Auxiliary Metrics Help Decoding Skill Neurons in the Wild
- URL: http://arxiv.org/abs/2511.21610v1
- Date: Wed, 26 Nov 2025 17:31:53 GMT
- Title: Auxiliary Metrics Help Decoding Skill Neurons in the Wild
- Authors: Yixiu Zhao, Xiaozhi Wang, Zijun Yao, Lei Hou, Juanzi Li
- Abstract summary: We introduce a simple, lightweight, and broadly applicable method for isolating neurons that encode specific skills. We correlate neuron activations with auxiliary metrics, such as external labels and the model's own confidence score. We empirically validate our method on tasks spanning open-ended text generation and natural language inference.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) exhibit remarkable capabilities across a wide range of tasks, yet their internal mechanisms remain largely opaque. In this paper, we introduce a simple, lightweight, and broadly applicable method for isolating neurons that encode specific skills. Building upon prior work that identified "skill neurons" via soft prompt training on classification tasks, our approach extends the analysis to complex scenarios involving multiple skills. We correlate neuron activations with auxiliary metrics, such as external labels and the model's own confidence score, thereby uncovering interpretable and task-specific behaviors without the need for manual token aggregation. We empirically validate our method on tasks spanning open-ended text generation and natural language inference, demonstrating its ability to detect neurons that not only drive known skills but also reveal previously unidentified shortcuts in arithmetic reasoning on BigBench.
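To make the correlation step concrete, here is a minimal sketch on synthetic data; the function and array names are illustrative assumptions, not the authors' released code:

```python
import numpy as np

def skill_neuron_scores(activations: np.ndarray, metric: np.ndarray) -> np.ndarray:
    """Pearson correlation between each neuron's activation and an auxiliary metric.

    activations: (n_examples, n_neurons), e.g. mean MLP activation per example
    metric:      (n_examples,) auxiliary signal, e.g. a 0/1 label or the
                 model's confidence on each example
    """
    a = activations - activations.mean(axis=0)
    m = metric - metric.mean()
    cov = a.T @ m / len(m)
    denom = a.std(axis=0) * m.std() + 1e-12
    return cov / denom

# Toy demo: plant a "skill neuron" and recover it.
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 1024))
labels = rng.integers(0, 2, size=256).astype(float)
acts[:, 42] += 2.0 * labels                # neuron 42 tracks the label
scores = skill_neuron_scores(acts, labels)
print(int(np.argmax(np.abs(scores))))      # -> 42
```

In practice the activations would be collected from the model's layers while it processes task prompts, and the metric could equally be the model's own confidence score rather than an external label.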
Related papers
- Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
We detect task-sensitive neurons in large language models (LLMs) via gradient attribution on task-specific data. We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks. We propose a neuron-level continuous fine-tuning method that only fine-tunes the current task-specific neurons during continuous learning.
arXiv Detail & Related papers (2024-07-09T01:27:35Z)
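A schematic of the gradient-attribution idea above, using activation-times-gradient on a toy network as a stand-in for an LLM layer (the paper's exact attribution recipe may differ):

```python
import torch
import torch.nn as nn

# Score each hidden neuron by |activation * gradient|, a common
# first-order attribution for task sensitivity.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
x = torch.randn(32, 16)                    # a batch of task-specific inputs
y = torch.randint(0, 2, (32,))             # task labels

acts = {}
def save_activation(module, inputs, output):
    output.retain_grad()                   # keep the grad of a non-leaf tensor
    acts["h"] = output

model[1].register_forward_hook(save_activation)
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

importance = (acts["h"] * acts["h"].grad).abs().mean(dim=0)  # one score per neuron
print(importance.topk(5).indices)          # candidate task-sensitive neurons
```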
- tagE: Enabling an Embodied Agent to Understand Human Instructions
We introduce a novel system known as task and argument grounding for Embodied agents (tagE).
At its core, our system employs an inventive neural network model designed to extract a series of tasks from complex task instructions expressed in natural language.
Our proposed model adopts an encoder-decoder framework enriched with nested decoding to effectively extract tasks and their corresponding arguments from these intricate instructions.
arXiv Detail & Related papers (2023-10-24T08:17:48Z)
- Automated Natural Language Explanation of Deep Visual Neurons with Large Models
This paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models.
Our framework is designed to be compatible with various model architectures and datasets, enabling automated and scalable neuron interpretation.
arXiv Detail & Related papers (2023-10-16T17:04:51Z)
- DISCOVER: Making Vision Networks Interpretable via Competition and Dissection
This work contributes to post-hoc interpretability, and specifically Network Dissection.
Our goal is to present a framework that makes it easier to discover the individual functionality of each neuron in a network trained on a vision task.
arXiv Detail & Related papers (2023-10-07T21:57:23Z)
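For reference, Network Dissection scores a neuron by the overlap between its thresholded activation map and a binary concept mask; a minimal sketch, with an illustrative threshold choice:

```python
import numpy as np

def neuron_concept_iou(act_map: np.ndarray, concept_mask: np.ndarray,
                       quantile: float = 0.995) -> float:
    """IoU between a neuron's top-activating spatial positions and a concept
    mask of the same shape; the quantile threshold is an assumed default."""
    thresh = np.quantile(act_map, quantile)
    neuron_mask = act_map >= thresh
    inter = np.logical_and(neuron_mask, concept_mask).sum()
    union = np.logical_or(neuron_mask, concept_mask).sum()
    return float(inter / max(union, 1))
```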
- Redundancy and Concept Analysis for Code-trained Language Models
Code-trained language models have proven to be highly effective for various code intelligence tasks.
However, they can be challenging to train and deploy for many software engineering applications due to computational bottlenecks and memory constraints.
We perform the first neuron-level analysis for source code models to identify important neurons within latent representations.
arXiv Detail & Related papers (2023-05-01T15:22:41Z)
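One simple way to operationalize such a redundancy analysis is to flag highly correlated neuron pairs; a sketch under that assumption, with an illustrative threshold:

```python
import numpy as np

def redundant_neuron_pairs(acts: np.ndarray, threshold: float = 0.95):
    """Return neuron pairs whose activations are nearly collinear.

    acts: (n_examples, n_neurons) activations from one layer of a code model.
    """
    corr = np.corrcoef(acts.T)               # (n_neurons, n_neurons)
    i, j = np.triu_indices_from(corr, k=1)   # consider each pair once
    keep = np.abs(corr[i, j]) >= threshold
    return list(zip(i[keep].tolist(), j[keep].tolist()))
```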
- Learning Options via Compression
We propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills.
Our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood.
arXiv Detail & Related papers (2022-12-08T22:34:59Z)
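The combined objective plausibly takes the form below; the trade-off weight and the code-length estimate are assumed inputs, not the paper's exact formulation:

```python
import torch

def mdl_skill_objective(nll: torch.Tensor, skill_code_lengths: torch.Tensor,
                        beta: float = 0.1) -> torch.Tensor:
    """Loss to minimize: negative log-likelihood of the demonstrations under
    the current skill decomposition, plus a description-length penalty.

    skill_code_lengths: estimated bits to encode each skill
    beta:               trade-off weight (an assumed hyperparameter)
    """
    return nll + beta * skill_code_lengths.sum()
```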
- Measures of Information Reflect Memorization Patterns
We show that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.
Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples.
arXiv Detail & Related papers (2022-10-17T20:15:24Z)
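As one illustrative information measure (the paper studies several), the per-neuron entropy of binarized activation patterns can quantify the diversity mentioned above:

```python
import numpy as np

def activation_pattern_entropy(acts: np.ndarray) -> float:
    """Average per-neuron entropy of binarized activations: low entropy means
    neurons fire in rigid patterns, one possible signature of memorization.

    acts: (n_examples, n_neurons); may be computed on unlabelled examples.
    """
    on = (acts > acts.mean(axis=0)).mean(axis=0)   # firing rate per neuron
    p = np.clip(on, 1e-9, 1 - 1e-9)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return float(entropy.mean())
```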
- Multi-Task Neural Processes
We develop multi-task neural processes, a new variant of neural processes for multi-task learning.
In particular, we propose to explore transferable knowledge from related tasks in the function space to provide inductive bias for improving each individual task.
Results demonstrate the effectiveness of multi-task neural processes in transferring useful knowledge among tasks for multi-task learning.
arXiv Detail & Related papers (2021-11-10T17:27:46Z)
- Self-training with Few-shot Rationalization: Teacher Explanations Aid Student in Few-shot NLU
We develop a framework based on self-training language models with limited task-specific labels and rationales.
We show that the neural model performance can be significantly improved by making it aware of its rationalized predictions.
arXiv Detail & Related papers (2021-09-17T00:36:46Z)
- Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)
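For contrast with the multi-spike rules proposed above, the classic pair-based STDP update referenced in the entry looks roughly like this (constants are illustrative):

```python
import numpy as np

def stdp_update(w: float, t_pre: float, t_post: float,
                a_plus: float = 0.01, a_minus: float = 0.012,
                tau: float = 20.0) -> float:
    """Pair-based STDP: potentiate if the presynaptic spike precedes the
    postsynaptic one, depress otherwise, decaying with the time gap (ms)."""
    dt = t_post - t_pre
    if dt > 0:
        w += a_plus * np.exp(-dt / tau)
    else:
        w -= a_minus * np.exp(dt / tau)
    return float(np.clip(w, 0.0, 1.0))
```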
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.