Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
- URL: http://arxiv.org/abs/2409.14144v1
- Date: Sat, 21 Sep 2024 13:46:54 GMT
- Title: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
- Authors: Zeping Yu, Sophia Ananiadou
- Abstract summary: We find that arithmetic ability resides within a limited number of attention heads, with each head specializing in distinct operations.
We introduce the Comparative Neuron Analysis (CNA) method, which identifies an internal logic chain consisting of four distinct stages from input to prediction.
- Score: 19.472889262384818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We find that arithmetic ability resides within a limited number of attention heads, with each head specializing in distinct operations. To investigate why, we introduce the Comparative Neuron Analysis (CNA) method, which identifies an internal logic chain consisting of four distinct stages from input to prediction: feature enhancing with shallow FFN neurons, feature transferring by shallow attention layers, feature predicting by arithmetic heads, and prediction enhancing among deep FFN neurons. Moreover, we identify human-interpretable FFN neurons within both the feature-enhancing and feature-predicting stages. These findings lead us to investigate the mechanism of LoRA, revealing that it enhances prediction probabilities by amplifying the coefficient scores of FFN neurons related to predictions. Finally, we apply our method to model pruning for arithmetic tasks and to model editing for reducing gender bias. Code is available at https://github.com/zepingyu0512/arithmetic-mechanism.
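The abstract's neuron-level account lends itself to a small illustration. The sketch below is hypothetical and not the authors' released implementation: it treats an FFN layer's output as a coefficient-weighted sum of per-neuron value vectors, scores each neuron's contribution to a target token's logit, and compares those contributions before and after finetuning. All names, shapes, and the toy LoRA perturbation are assumptions.

```python
# Hypothetical sketch of the comparative-neuron idea (not the authors' code).
import torch

torch.manual_seed(0)
d_model, d_ffn, vocab = 64, 256, 1000

W_in = torch.randn(d_ffn, d_model)  # maps the hidden state to neuron coefficients
V = torch.randn(d_ffn, d_model)     # per-neuron value vectors (rows of the FFN output matrix)
W_U = torch.randn(vocab, d_model)   # unembedding matrix

def neuron_scores(h: torch.Tensor, token_id: int) -> torch.Tensor:
    """Per-neuron contribution to the logit of `token_id` at hidden state `h`."""
    coeff = torch.relu(W_in @ h)        # coefficient score of each FFN neuron
    return coeff * (V @ W_U[token_id])  # value vector projected onto the token direction

h_base = torch.randn(d_model)                 # hidden state in the base model
h_lora = h_base + 0.1 * torch.randn(d_model)  # toy stand-in for the LoRA-finetuned state

target_token = 42
delta = neuron_scores(h_lora, target_token) - neuron_scores(h_base, target_token)
print("neurons amplified most:", torch.topk(delta, k=5).indices.tolist())
```

Under this decomposition, the paper's LoRA finding reads as the coefficient scores of prediction-relevant neurons growing after finetuning, which in turn raises the target token's logit.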
Related papers
- POCO: Scalable Neural Forecasting through Population Conditioning [4.781680085499199]
POCO is a unified neural forecasting model that captures both neuron-specific and brain-wide dynamics. Trained across five calcium imaging datasets spanning zebrafish, mice, and C. elegans, POCO achieves state-of-the-art accuracy at cellular resolution in spontaneous behaviors.
arXiv Detail & Related papers (2025-06-17T20:15:04Z) - NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models [68.89389652724378]
NOBLE is a neural operator framework that learns a mapping from a continuous frequency-modulated embedding of interpretable neuron features to the somatic voltage response induced by current injection. It predicts distributions of neural dynamics accounting for the intrinsic experimental variability. NOBLE is the first scaled-up deep learning framework validated on real experimental data.
arXiv Detail & Related papers (2025-06-05T01:01:18Z) - Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution [16.460751105639623]
We introduce NeuronLens, a novel range-based interpretation and manipulation framework.
It provides a finer view of neuron activation distributions to localize concept attribution within a neuron.
arXiv Detail & Related papers (2025-02-04T03:33:55Z) - Understanding Artificial Neural Network's Behavior from Neuron Activation Perspective [8.251799609350725]
This paper explores the intricate behavior of deep neural networks (DNNs) through the lens of neuron activation dynamics.
We propose a probabilistic framework that can analyze models' neuron activation patterns as a process.
arXiv Detail & Related papers (2024-12-24T01:01:06Z) - QuantFormer: Learning to Quantize for Neural Activity Forecasting in Mouse Visual Cortex [26.499583552980248]
QuantFormer is a transformer-based model specifically designed for forecasting neural activity from two-photon calcium imaging data.
QuantFormer sets a new benchmark in forecasting mouse visual cortex activity.
It demonstrates robust performance and generalization across various stimuli and individuals.
arXiv Detail & Related papers (2024-12-10T07:44:35Z) - Exploring Behavior-Relevant and Disentangled Neural Dynamics with Generative Diffusion Models [2.600709013150986]
Understanding the neural basis of behavior is a fundamental goal in neuroscience.
Our approach, named "BeNeDiff", first identifies a fine-grained and disentangled neural subspace.
It then employs state-of-the-art generative diffusion models to synthesize behavior videos that interpret the neural dynamics of each latent factor.
arXiv Detail & Related papers (2024-10-12T18:28:56Z) - Growing Deep Neural Network Considering with Similarity between Neurons [4.32776344138537]
We explore a novel approach that progressively increases the number of neurons in compact models during training.
We propose a method that reduces feature extraction biases and neuronal redundancy by introducing constraints based on neuron similarity distributions.
Results on the CIFAR-10 and CIFAR-100 datasets demonstrate improved accuracy.
arXiv Detail & Related papers (2024-08-23T11:16:37Z) - Automated Natural Language Explanation of Deep Visual Neurons with Large Models [43.178568768100305]
This paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models.
Our framework is designed to be compatible with various model architectures and datasets, enabling automated and scalable neuron interpretation.
arXiv Detail & Related papers (2023-10-16T17:04:51Z) - NeuroExplainer: Fine-Grained Attention Decoding to Uncover Cortical Development Patterns of Preterm Infants [73.85768093666582]
We propose an explainable geometric deep network dubbed NeuroExplainer.
NeuroExplainer is used to uncover altered infant cortical development patterns associated with preterm birth.
arXiv Detail & Related papers (2023-01-01T12:48:12Z) - A Novel Supervised Contrastive Regression Framework for Prediction of Neurocognitive Measures Using Multi-Site Harmonized Diffusion MRI Tractography [13.80649748804573]
Supervised Contrastive Regression (SCR) is a simple yet effective method that allows full supervision for contrastive learning in regression tasks.
SCR performs supervised contrastive representation learning by using the absolute difference between continuous regression labels.
SCR improves the accuracy of neurocognitive score prediction compared to other state-of-the-art methods.
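As a rough illustration of the loss described above, a supervised contrastive objective for regression can weight each pair of samples by how close their continuous labels are. The exponential weighting and all names below are assumptions for this sketch, not the paper's exact formulation.

```python
# Hypothetical sketch of a supervised contrastive regression loss.
import torch
import torch.nn.functional as F

def scr_loss(embeddings: torch.Tensor, labels: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Pairs with closer regression labels act as stronger positives."""
    z = F.normalize(embeddings, dim=1)  # unit-norm features, shape (N, D)
    sim = z @ z.T / temperature         # pairwise similarities, shape (N, N)
    eye = torch.eye(len(labels), dtype=torch.bool)
    # Weight each pair by the absolute difference of its continuous labels.
    weights = torch.exp(-(labels[:, None] - labels[None, :]).abs()).masked_fill(eye, 0.0)
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")), dim=1, keepdim=True)
    return -(weights * log_prob).sum() / weights.sum()

print(scr_loss(torch.randn(8, 32), torch.rand(8)))
```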
arXiv Detail & Related papers (2022-10-13T23:24:12Z) - Neuronal Correlation: a Central Concept in Neural Network [22.764342635264452]
We show that neuronal correlation can be efficiently estimated via the weight matrix.
We show that neuronal correlation significantly impacts the accuracy of entropy estimation in high-dimensional hidden spaces.
arXiv Detail & Related papers (2022-01-22T15:01:50Z) - Overcoming the Domain Gap in Contrastive Learning of Neural Action Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior.
We generate a new multimodal dataset consisting of spontaneous behaviors produced by fruit flies.
This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
arXiv Detail & Related papers (2021-11-29T15:27:51Z) - The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called a structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions match reality.
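To make that mechanism concrete, here is a toy predictive-coding-style update, a hypothetical sketch rather than the paper's framework: top-down weights predict lower-level activity, and the prediction error drives the parameter adjustment.

```python
# Toy predictive-coding-style update (hypothetical sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)             # "reality": activity of the lower-level neurons
W = 0.1 * rng.normal(size=(16, 8))  # top-down weights that generate predictions
top = rng.normal(size=8)            # activity of the higher-level neurons

lr = 0.05
for _ in range(200):
    pred = W @ top                  # what the higher-level neurons predict below
    err = x - pred                  # how well the prediction matches reality
    W += lr * np.outer(err, top)    # adjust parameters to shrink the error
print("final prediction error:", float(np.linalg.norm(x - W @ top)))
```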
arXiv Detail & Related papers (2020-12-07T01:20:38Z) - Neuro-symbolic Neurodegenerative Disease Modeling as Probabilistic Programmed Deep Kernels [93.58854458951431]
We present a probabilistic programmed deep kernel learning approach to personalized, predictive modeling of neurodegenerative diseases.
Our analysis considers a spectrum of neural and symbolic machine learning approaches.
We run evaluations on the problem of Alzheimer's disease prediction, yielding results that surpass deep learning baselines.
arXiv Detail & Related papers (2020-09-16T15:16:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.