Related papers: Neuron-Level Knowledge Attribution in Large Language Models

Neuron-Level Knowledge Attribution in Large Language Models

URL: http://arxiv.org/abs/2312.12141v3
Date: Sun, 9 Jun 2024 20:03:02 GMT
Title: Neuron-Level Knowledge Attribution in Large Language Models
Authors: Zeping Yu, Sophia Ananiadou,
Abstract summary: We propose a method for pinpointing significant neurons for different outputs. Our approach demonstrates superior performance across three metrics. We will release our data and code on storage and set the stage for future research in knowledge editing.
Score: 19.472889262384818
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Identifying important neurons for final predictions is essential for understanding the mechanisms of large language models. Due to computational constraints, current attribution techniques struggle to operate at neuron level. In this paper, we propose a static method for pinpointing significant neurons for different outputs. Compared to seven other methods, our approach demonstrates superior performance across three metrics. Additionally, since most static methods typically only identify "value neurons" directly contributing to the final prediction, we introduce a static method for identifying "query neurons" which activate these "value neurons". Finally, we apply our methods to analyze the localization of six distinct types of knowledge across both attention and feed-forward network (FFN) layers. Our method and analysis are helpful for understanding the mechanisms of knowledge storage and set the stage for future research in knowledge editing. We will release our data and code on github.

Related papers

Skeletonization of neuronal processes using Discrete Morse techniques from computational topology [3.9341278092649925]
We propose a new approach to map mesoscale neural circuitry in vertebrate brains.<n>It is better connected to the underlying neurons by skeletonizing labeled axon fragments and then estimating a volumetric length density.<n>This technique takes into account nonlocal connectivity information and therefore provides noise-robustness.
arXiv Detail & Related papers (2025-05-12T16:59:36Z)
Neuron Empirical Gradient: Discovering and Quantifying Neurons Global Linear Controllability [14.693407823048478]
Our study first investigates the numerical relationship between neuron activations and model output. We introduce NeurGrad, an accurate and efficient method for computing neuron empirical gradient (NEG)
arXiv Detail & Related papers (2024-12-24T00:01:24Z)
Growing Deep Neural Network Considering with Similarity between Neurons [4.32776344138537]
We explore a novel approach of progressively increasing neuron numbers in compact models during training phases. We propose a method that reduces feature extraction biases and neuronal redundancy by introducing constraints based on neuron similarity distributions. Results on CIFAR-10 and CIFAR-100 datasets demonstrated accuracy improvement.
arXiv Detail & Related papers (2024-08-23T11:16:37Z)
Linear Explanations for Individual Neurons [12.231741536057378]
We show that the highest activation range is only responsible for a very small percentage of the neuron's causal effect. In addition, inputs causing lower activations are often very different and can't be reliably predicted by only looking at high activations.
arXiv Detail & Related papers (2024-05-10T23:48:37Z)
Simple and Effective Transfer Learning for Neuro-Symbolic Integration [50.592338727912946]
A potential solution to this issue is Neuro-Symbolic Integration (NeSy), where neural approaches are combined with symbolic reasoning. Most of these methods exploit a neural network to map perceptions to symbols and a logical reasoner to predict the output of the downstream task. They suffer from several issues, including slow convergence, learning difficulties with complex perception tasks, and convergence to local minima. This paper proposes a simple yet effective method to ameliorate these problems.
arXiv Detail & Related papers (2024-02-21T15:51:01Z)
Hebbian Learning based Orthogonal Projection for Continual Learning of Spiking Neural Networks [74.3099028063756]
We develop a new method with neuronal operations based on lateral connections and Hebbian learning. We show that Hebbian and anti-Hebbian learning on recurrent lateral connections can effectively extract the principal subspace of neural activities. Our method consistently solves for spiking neural networks with nearly zero forgetting.
arXiv Detail & Related papers (2024-02-19T09:29:37Z)
Identifying Interpretable Visual Features in Artificial and Biological Neural Systems [3.604033202771937]
Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features. Many neurons exhibit $textitmixed selectivity$, i.e., they represent multiple unrelated features. We propose an automated method for quantifying visual interpretability and an approach for finding meaningful directions in network activation space.
arXiv Detail & Related papers (2023-10-17T17:41:28Z)
Neuron to Graph: Interpreting Language Model Neurons at Scale [8.32093320910416]
This paper introduces a novel automated approach designed to scale interpretability techniques across a vast array of neurons within Large Language Models. We propose Neuron to Graph (N2G), an innovative tool that automatically extracts a neuron's behaviour from the dataset it was trained on and translates it into an interpretable graph.
arXiv Detail & Related papers (2023-05-31T14:44:33Z)
Redundancy and Concept Analysis for Code-trained Language Models [5.726842555987591]
Code-trained language models have proven to be highly effective for various code intelligence tasks. They can be challenging to train and deploy for many software engineering applications due to computational bottlenecks and memory constraints. We perform the first neuron-level analysis for source code models to identify textitimportant neurons within latent representations.
arXiv Detail & Related papers (2023-05-01T15:22:41Z)
Constraints on the design of neuromorphic circuits set by the properties of neural population codes [61.15277741147157]
In the brain, information is encoded, transmitted and used to inform behaviour. Neuromorphic circuits need to encode information in a way compatible to that used by populations of neuron in the brain.
arXiv Detail & Related papers (2022-12-08T15:16:04Z)
Neuro-Symbolic Learning of Answer Set Programs from Raw Data [54.56905063752427]
Neuro-Symbolic AI aims to combine interpretability of symbolic techniques with the ability of deep learning to learn from raw data. We introduce Neuro-Symbolic Inductive Learner (NSIL), an approach that trains a general neural network to extract latent concepts from raw data. NSIL learns expressive knowledge, solves computationally complex problems, and achieves state-of-the-art performance in terms of accuracy and data efficiency.
arXiv Detail & Related papers (2022-05-25T12:41:59Z)
Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters, can be suitable resource-efficient candidates for many simple tasks. We explore the diversity of the neurons within the hidden layer during the learning process. We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
Neuron-based explanations of neural networks sacrifice completeness and interpretability [67.53271920386851]
We show that for AlexNet pretrained on ImageNet, neuron-based explanation methods sacrifice both completeness and interpretability. We show the most important principal components provide more complete and interpretable explanations than the most important neurons. Our findings suggest that explanation methods for networks like AlexNet should avoid using neurons as a basis for embeddings.
arXiv Detail & Related papers (2020-11-05T21:26:03Z)
Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts. We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.