On Relation-Specific Neurons in Large Language Models
- URL: http://arxiv.org/abs/2502.17355v1
- Date: Mon, 24 Feb 2025 17:33:18 GMT
- Title: On Relation-Specific Neurons in Large Language Models
- Authors: Yihong Liu, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang, Amir Hossein Kargaran, Sascha Rothe, François Yvon, Hinrich Schütze
- Abstract summary: We study the Llama-2 family on a chosen set of relations with a statistics-based method. Our experiments demonstrate the existence of relation-specific neurons. We give evidence for three properties of relation-specific neurons.
- Score: 47.33286549564444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In large language models (LLMs), certain neurons can store distinct pieces of knowledge learned during pretraining. While knowledge typically appears as a combination of relations and entities, it remains unclear whether some neurons focus on a relation itself -- independent of any entity. We hypothesize such neurons detect a relation in the input text and guide generation involving such a relation. To investigate this, we study the Llama-2 family on a chosen set of relations with a statistics-based method. Our experiments demonstrate the existence of relation-specific neurons. We measure the effect of selectively deactivating candidate neurons specific to relation $r$ on the LLM's ability to handle (1) facts whose relation is $r$ and (2) facts whose relation is a different relation $r' \neq r$. With respect to their capacity for encoding relation information, we give evidence for the following three properties of relation-specific neurons. $\textbf{(i) Neuron cumulativity.}$ The neurons for $r$ present a cumulative effect so that deactivating a larger portion of them results in the degradation of more facts in $r$. $\textbf{(ii) Neuron versatility.}$ Neurons can be shared across multiple closely related as well as less related relations. Some relation neurons transfer across languages. $\textbf{(iii) Neuron interference.}$ Deactivating neurons specific to one relation can improve LLM generation performance for facts of other relations. We will make our code publicly available at https://github.com/cisnlp/relation-specific-neurons.
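To make the deactivation setup concrete, below is a minimal sketch of zero-ablating candidate neurons in a Llama-2 model via PyTorch forward hooks. The specific (layer, neuron) indices and the probe prompt are placeholders; the paper's own statistics-based neuron selection and evaluation data are in the linked repository.

```python
# Minimal sketch: zero-ablating candidate "relation-specific" neurons in a
# Llama-2-style model with PyTorch forward hooks. The (layer, neuron) pairs
# below are placeholders, not indices reported by the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # any Llama-2 checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Hypothetical candidates: (layer index, neuron index) inside the MLP,
# i.e. the post-activation hidden units feeding the down projection.
candidates = {(12, 4071), (12, 9958), (17, 220)}

hooks = []
for layer_idx, layer in enumerate(model.model.layers):
    idxs = [n for (l, n) in candidates if l == layer_idx]
    if not idxs:
        continue

    def make_hook(idxs):
        def hook(module, inputs, output):
            output[..., idxs] = 0.0  # deactivate the selected neurons
            return output
        return hook

    # act_fn gates the up projection in Llama's MLP, so zeroing its output
    # at index i zeroes neuron i's contribution to the down projection
    hooks.append(layer.mlp.act_fn.register_forward_hook(make_hook(idxs)))

prompt = "The capital of France is"
with torch.no_grad():
    out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=5)
print(tok.decode(out[0], skip_special_tokens=True))

for h in hooks:
    h.remove()  # restore the unablated model
```

Comparing generation accuracy on facts of relation $r$ before and after such ablation, and repeating with larger candidate sets, is the kind of measurement the cumulativity claim rests on.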
Related papers
- Neuron Empirical Gradient: Discovering and Quantifying Neurons Global Linear Controllability [14.693407823048478]
Our study first investigates the numerical relationship between neuron activations and model output. We introduce NeurGrad, an accurate and efficient method for computing the neuron empirical gradient (NEG).
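The summary does not spell out how NEG is computed; as one illustrative reading, the sketch below estimates a single neuron's effect on a target-token logit by finite differences. The function name, hook point, and perturbation size are assumptions, not NeurGrad's actual estimator.

```python
# Hedged sketch: a finite-difference estimate of how one MLP neuron's
# activation shifts a target token's logit. A generic probe, not NeurGrad.
import torch

def neuron_output_slope(model, tok, prompt, target_token, layer, neuron, eps=1.0):
    target_id = tok(target_token, add_special_tokens=False).input_ids[0]
    inputs = tok(prompt, return_tensors="pt")

    def run(delta):
        def hook(module, inp, out):
            out[..., neuron] += delta  # perturb a single neuron's activation
            return out
        h = model.model.layers[layer].mlp.act_fn.register_forward_hook(hook)
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]
        h.remove()
        return logits[target_id].item()

    # central difference approximates d(logit) / d(activation)
    return (run(+eps) - run(-eps)) / (2 * eps)
```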
arXiv Detail & Related papers (2024-12-24T00:01:24Z) - Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs [70.3132264719438]
We aim to fill the research gap by examining how neuron activation is shared across tasks and languages.
We classify neurons into four distinct categories based on their responses to a specific input across different languages.
Our analysis reveals the following insights: (i) the patterns of neuron sharing are significantly affected by the characteristics of tasks and examples; (ii) neuron sharing does not fully correspond with language similarity; (iii) shared neurons play a vital role in generating responses, especially those shared across all languages.
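As a toy illustration of such a four-way classification, the sketch below buckets neurons by how many languages activate them on parallel inputs. The category names and the zero threshold are assumptions for illustration; the paper's precise criteria may differ.

```python
# Hedged sketch: bucketing neurons by how many languages activate them for
# parallel inputs. Category names and threshold are assumed, not the paper's.
import numpy as np

def categorize(acts: np.ndarray, thresh: float = 0.0) -> list[str]:
    """acts: (num_languages, num_neurons) activations for one parallel input."""
    active = acts > thresh       # which languages fire each neuron
    counts = active.sum(axis=0)  # per-neuron count of activating languages
    n_langs = acts.shape[0]
    labels = []
    for c in counts:
        if c == n_langs:
            labels.append("all-shared")
        elif c > 1:
            labels.append("partial-shared")
        elif c == 1:
            labels.append("specific")
        else:
            labels.append("non-activated")
    return labels

acts = np.array([[1.2, 0.0, 0.3],   # English
                 [0.8, 0.0, 0.0],   # German
                 [0.5, 0.9, 0.0]])  # French
print(categorize(acts))  # ['all-shared', 'specific', 'specific']
```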
arXiv Detail & Related papers (2024-06-13T16:04:11Z) - Learning dynamic representations of the functional connectome in neurobiological networks [41.94295877935867]
We introduce an unsupervised approach to learn the dynamic affinities between neurons in live, behaving animals.
We show that our method is able to robustly predict causal interactions between neurons to generate behavior.
arXiv Detail & Related papers (2024-02-21T19:54:25Z) - One-hot Generalized Linear Model for Switching Brain State Discovery [1.0132677989820746]
Neural interactions inferred from neural signals primarily reflect functional interactions.
We show that the learned prior captures the state-constant interactions, shedding light on the underlying anatomical connectome.
Our methods effectively recover true interaction structures in simulated data, achieve the highest predictive likelihood with real neural datasets, and render interaction structures and hidden states more interpretable.
arXiv Detail & Related papers (2023-10-23T18:10:22Z) - Redundancy and Concept Analysis for Code-trained Language Models [5.726842555987591]
Code-trained language models have proven to be highly effective for various code intelligence tasks.
They can be challenging to train and deploy for many software engineering applications due to computational bottlenecks and memory constraints.
We perform the first neuron-level analysis of source code models to identify important neurons within latent representations.
arXiv Detail & Related papers (2023-05-01T15:22:41Z) - Coupling Artificial Neurons in BERT and Biological Neurons in the Human Brain [9.916033214833407]
This study introduces a novel, general, and effective framework to link transformer-based NLP models and neural activities in response to language.
Our experimental results demonstrate that (1) the activations of ANs and BNs are significantly synchronized; (2) the ANs carry meaningful linguistic/semantic information and anchor to their BN signatures; and (3) the anchored BNs are interpretable in a neurolinguistic context.
arXiv Detail & Related papers (2023-03-27T01:41:48Z) - Constraints on the design of neuromorphic circuits set by the properties of neural population codes [61.15277741147157]
In the brain, information is encoded, transmitted and used to inform behaviour.
Neuromorphic circuits need to encode information in a way compatible with that used by populations of neurons in the brain.
arXiv Detail & Related papers (2022-12-08T15:16:04Z) - Benchmarking Compositionality with Formal Languages [64.09083307778951]
We investigate whether large neural models in NLP can acquire the ability to combine primitive concepts into larger novel combinations while learning from data.
By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network.
We find that the models either learn the relations completely or not at all. The key factor is transition coverage, which sets a soft learnability limit at 400 examples per transition.
arXiv Detail & Related papers (2022-08-17T10:03:18Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
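A minimal sketch of the scoring idea, assuming the standard setup in which a neuron's binarized activation mask is matched to boolean formulas over concept masks by intersection-over-union (IoU); the paper's search over candidate formulas is omitted, and the masks below are toy data.

```python
# Hedged sketch: scoring a compositional concept explanation for a neuron
# by IoU between its binarized activations and a boolean concept formula.
import numpy as np

def iou(neuron_mask: np.ndarray, concept_mask: np.ndarray) -> float:
    inter = np.logical_and(neuron_mask, concept_mask).sum()
    union = np.logical_or(neuron_mask, concept_mask).sum()
    return inter / union if union else 0.0

# toy masks over 8 inputs (True = neuron fires / concept present)
neuron = np.array([1, 1, 0, 1, 0, 0, 1, 0], dtype=bool)
water  = np.array([1, 0, 0, 1, 0, 0, 0, 0], dtype=bool)
river  = np.array([0, 1, 0, 0, 0, 0, 1, 0], dtype=bool)

print(iou(neuron, water))                        # 0.5: single concept
print(iou(neuron, np.logical_or(water, river)))  # 1.0: composition fits better
```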
arXiv Detail & Related papers (2020-06-24T20:37:05Z) - Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
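To see why a non-monotonic activation lets a single neuron separate XOR, here is a toy sketch using a Gaussian bump as an assumed stand-in for ADA; the paper defines its own activation, and this is not that formula.

```python
# Hedged sketch: a single linear neuron plus a non-monotonic "bump"
# activation separates XOR. The bump is an assumed stand-in, not ADA itself.
import numpy as np

def bump(z):
    return np.exp(-z ** 2)  # peaks at z = 0, decays on both sides

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# weights (1, 1) and bias -1 give z = x1 + x2 - 1, which is 0 exactly on
# the XOR-positive inputs, so bump(z) is maximal there
z = X @ np.array([1.0, 1.0]) - 1.0
pred = (bump(z) > 0.5).astype(int)
print(pred, (pred == y).all())  # [0 1 1 0] True
```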
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.