Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities
- URL: http://arxiv.org/abs/2505.21191v1
- Date: Tue, 27 May 2025 13:40:28 GMT
- Title: Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities
- Authors: Junyan Zhang, Yubo Gao, Yibo Yan, Jungang Li, Zhaorui Hou, Sicheng Tao, Shuliang Liu, Song Dai, Yonghua Hei, Junzhuo Li, Xuming Hu
- Abstract summary: The fine-tuning of Large Language Models (LLMs) has significantly advanced their instruction-following capabilities. This study examines how fine-tuning reconfigures LLM computations by isolating and analyzing instruction-specific sparse components.
- Score: 12.065600268467556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fine-tuning of Large Language Models (LLMs) has significantly advanced their instruction-following capabilities, yet the underlying computational mechanisms driving these improvements remain poorly understood. This study systematically examines how fine-tuning reconfigures LLM computations by isolating and analyzing instruction-specific sparse components, i.e., neurons in dense models and both neurons and experts in Mixture-of-Experts (MoE) architectures. In particular, we introduce HexaInst, a carefully curated and balanced instructional dataset spanning six distinct categories, and propose SPARCOM, a novel analytical framework comprising three key contributions: (1) a method for identifying these sparse components, (2) an evaluation of their functional generality and uniqueness, and (3) a systematic comparison of their alterations. Through experiments, we demonstrate the functional generality and uniqueness of these components and their critical role in instruction execution. By elucidating the relationship between fine-tuning-induced adaptations and sparse computational substrates, this work offers the trustworthy-LLM community deeper insight into how LLMs internalize instruction-following behavior.
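The abstract does not spell out how SPARCOM actually isolates these sparse components, so the sketch below is only an illustration of the general idea, not the authors' method. It assumes a HuggingFace Llama-style decoder (the `model.model.layers[i].mlp.act_fn` module path is an assumption) and ranks MLP neurons by how much more strongly they activate on instruction prompts than on generic text.

```python
import torch

def instruction_specific_neurons(model, tokenizer, instruction_prompts,
                                 general_prompts, layer_idx, top_k=100):
    """Rank MLP neurons in one layer by how much more strongly they activate
    on instruction prompts than on general text (a heuristic, not SPARCOM)."""

    def mean_activation(prompts):
        acts = []

        def hook(_module, _inp, out):
            # out: (batch, seq_len, intermediate_size); average magnitude
            # over batch and token positions -> one score per neuron
            acts.append(out.abs().mean(dim=(0, 1)))

        # assumed Llama-style module path; adjust for other architectures
        act_module = model.model.layers[layer_idx].mlp.act_fn
        handle = act_module.register_forward_hook(hook)
        with torch.no_grad():
            for prompt in prompts:
                inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
                model(**inputs)
        handle.remove()
        return torch.stack(acts).mean(dim=0)

    diff = mean_activation(instruction_prompts) - mean_activation(general_prompts)
    return diff.topk(top_k).indices  # candidate instruction-specific neurons
```

For an MoE model, the analogous step would presumably tally router selections per expert rather than per-neuron activations, which is how one might surface "instruction-specific experts" under the same contrastive idea.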
Related papers
- Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning [12.054910727620154]
Eye-tracking data reveals valuable insights into users' cognitive states but is difficult to analyze due to its structured, non-linguistic nature. This paper presents a multimodal human-AI collaborative framework designed to enhance cognitive pattern extraction from eye-tracking signals.
arXiv Detail & Related papers (2025-07-24T09:49:53Z) - Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders [0.0]
We present a novel SAE-based architecture tailored for text classification. We benchmark this architecture against established methods such as ConceptShap, Independent Component Analysis, and other SAE-based concept extraction techniques. Our empirical results show that our architecture improves both the causality and interpretability of the extracted features.
arXiv Detail & Related papers (2025-06-30T15:18:50Z) - How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks. We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z) - Comparative Analysis of Pooling Mechanisms in LLMs: A Sentiment Analysis Perspective [0.0]
Transformer-based models like BERT and GPT rely on pooling layers to aggregate token-level embeddings into sentence-level representations. Common pooling mechanisms such as Mean, Max, and Weighted Sum play a pivotal role in this aggregation process. This paper investigates the effects of these pooling mechanisms on two prominent LLM families, BERT and GPT, in the context of sentence-level sentiment analysis. (A minimal sketch of these pooling operations appears after this list.)
arXiv Detail & Related papers (2024-11-22T00:59:25Z) - Understanding Ranking LLMs: A Mechanistic Analysis for Information Retrieval [20.353393773305672]
We employ a probing-based analysis to examine neuron activations in ranking LLMs. Our study spans a broad range of feature categories, including lexical signals, document structure, query-document interactions, and complex semantic representations. Our findings offer crucial insights for developing more transparent and reliable retrieval systems.
arXiv Detail & Related papers (2024-10-24T08:20:10Z) - Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs [64.9693406713216]
Internal mechanisms that contribute to the effectiveness of RAG systems remain underexplored.
Our experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors.
We propose several strategies to enhance RAG's efficiency and effectiveness through expert activation.
arXiv Detail & Related papers (2024-10-20T16:08:54Z) - X2-DFD: A framework for eXplainable and eXtendable Deepfake Detection [55.77552681618732]
X2-DFD is an eXplainable and eXtendable framework based on multimodal large-language models (MLLMs) for deepfake detection. The first stage, Model Feature Assessment, systematically evaluates the detectability of forgery-related features for the MLLM. The second stage, Explainable Dataset Construction, consists of two key modules: Strong Feature Strengthening and Weak Feature Supplementing. The third stage, Fine-tuning and Inference, involves fine-tuning the MLLM on the constructed dataset and deploying it for final detection and explanation.
arXiv Detail & Related papers (2024-10-08T15:28:33Z) - Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient [0.49478969093606673]
We introduce refined variants of the Local Learning Coefficient (LLC), a measure of model complexity grounded in singular learning theory.
We study the development of internal structure in transformer language models during training.
arXiv Detail & Related papers (2024-10-03T20:51:02Z) - Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z) - CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z) - A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z)
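As a concrete reference for the pooling-mechanisms entry above, here is a minimal sketch of the three named operations over token embeddings. The function name and the norm-based weighting in the `weighted_sum` branch are illustrative assumptions standing in for whatever learned weighting that paper actually evaluates.

```python
import torch

def pool_token_embeddings(token_embs: torch.Tensor, mask: torch.Tensor,
                          mode: str = "mean") -> torch.Tensor:
    """Aggregate token-level embeddings (batch, seq, dim) into one
    sentence-level vector per example, ignoring padded positions."""
    mask = mask.unsqueeze(-1).float()                      # (batch, seq, 1)
    if mode == "mean":
        # average only over real (non-padding) tokens
        return (token_embs * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    if mode == "max":
        # element-wise max over real tokens
        return token_embs.masked_fill(mask == 0, float("-inf")).max(dim=1).values
    if mode == "weighted_sum":
        # illustrative stand-in: weight tokens by softmaxed embedding norm
        scores = token_embs.norm(dim=-1).masked_fill(mask.squeeze(-1) == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1)             # (batch, seq)
        return (token_embs * weights.unsqueeze(-1)).sum(1)
    raise ValueError(f"unknown pooling mode: {mode}")

# Example usage (shapes only):
#   embs: (2, 5, 768) token embeddings, attn_mask: (2, 5) with 1 for real tokens
#   sent_vec = pool_token_embeddings(embs, attn_mask, mode="max")
```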
This list is automatically generated from the titles and abstracts of the papers in this site.