Capability Localization: Capabilities Can be Localized rather than Individual Knowledge
- URL: http://arxiv.org/abs/2502.20992v1
- Date: Fri, 28 Feb 2025 12:22:13 GMT
- Title: Capability Localization: Capabilities Can be Localized rather than Individual Knowledge
- Authors: Xiusheng Huang, Jiaxiang Liu, Yequan Wang, Jun Zhao, Kang Liu
- Abstract summary: Large-scale language models have achieved superior performance in tasks related to natural language processing. Previous studies assumed that individual knowledge is stored in local parameters, and that the storage form of individual knowledge is dispersed parameters, parameter layers, or parameter chains. This paper proposes a Commonality Neuron Localization (CNL) method, which successfully locates commonality neurons and achieves a neuron overlap rate of 96.42% on the GSM8K dataset.
- Score: 22.63726568778859
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large-scale language models have achieved superior performance in tasks related to natural language processing; however, it is still unclear how model parameters affect performance improvement. Previous studies assumed that individual knowledge is stored in local parameters, and that the storage form of individual knowledge is dispersed parameters, parameter layers, or parameter chains, which are not unified. We found through fidelity and reliability evaluation experiments that individual knowledge cannot be localized. Afterwards, we constructed a dataset for decoupling experiments and discovered the potential for localizing data commonalities. To further reveal this phenomenon, this paper proposes a Commonality Neuron Localization (CNL) method, which successfully locates commonality neurons and achieves a neuron overlap rate of 96.42% on the GSM8K dataset. Finally, we have demonstrated through cross-data experiments that commonality neurons are a collection of capability neurons that possess the capability to enhance performance. Our code is available at https://github.com/nlpkeg/Capability-Neuron-Localization.
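The localization recipe the abstract describes can be pictured as: score every neuron on a dataset, keep the top-k highest-scoring neurons, and measure how much the selected sets from different data subsets overlap. Below is a minimal Python sketch of that pipeline; the mean-absolute-activation score, the value of k, and all names are illustrative assumptions, not the paper's exact CNL procedure.

```python
import numpy as np

def neuron_importance(activations):
    """Score each neuron by mean absolute activation over a dataset.
    activations: (num_examples, num_neurons) array collected from one
    model layer. Illustrative score, not necessarily CNL's."""
    return np.abs(activations).mean(axis=0)

def top_k_neurons(activations, k):
    """Indices of the k highest-scoring neurons."""
    return set(np.argsort(neuron_importance(activations))[-k:])

def overlap_rate(neurons_a, neurons_b):
    """Fraction of localized neurons shared between two data subsets."""
    return len(neurons_a & neurons_b) / len(neurons_a)

# Toy demonstration: random activations stand in for two halves of a
# dataset such as GSM8K; with real model activations, a high overlap
# rate would indicate shared (commonality) neurons.
rng = np.random.default_rng(0)
half_a = rng.normal(size=(128, 4096))
half_b = rng.normal(size=(128, 4096))
rate = overlap_rate(top_k_neurons(half_a, 200), top_k_neurons(half_b, 200))
print(f"neuron overlap rate: {rate:.2%}")
```

Under this reading, an overlap rate as high as the reported 96.42% would indicate that largely the same neurons matter across the data, i.e., shared capability rather than per-example knowledge.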
Related papers
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor program (TP) framework.
We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values.
This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
- Knowledge Editing for Large Language Model with Knowledge Neuronal Ensemble [13.608354678065222]
We propose a novel knowledge editing method called Knowledge Neuronal Ensemble (KNE). A knowledge neuronal ensemble represents a group of neurons encoding specific knowledge, thus mitigating the issue of frequent parameter modification. Experimental results on three widely used knowledge editing datasets show that the KNE method significantly improves the accuracy of knowledge editing.
arXiv Detail & Related papers (2024-12-30T00:58:00Z)
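As a rough illustration of the idea in the KNE summary above (edit only a small ensemble of knowledge-bearing neurons so that most parameters are never touched), here is a hedged Python sketch; the thresholded attribution rule, the row-masked gradient step, and every name are assumptions for illustration, not the method defined in the paper.

```python
import numpy as np

def select_ensemble(attributions, threshold):
    """Pick the neurons whose attribution score for one piece of
    knowledge exceeds a threshold. Hypothetical selection rule; the
    paper defines its own ensemble criterion."""
    return np.flatnonzero(attributions > threshold)

def masked_update(weights, grad, ensemble, lr=1e-2):
    """Apply a gradient step only to the ensemble's rows, so the vast
    majority of parameters are never modified by the edit."""
    mask = np.zeros(weights.shape[0], dtype=bool)
    mask[ensemble] = True
    weights[mask] -= lr * grad[mask]
    return weights

rng = np.random.default_rng(0)
attributions = rng.random(1024)     # per-neuron scores (toy)
W = rng.normal(size=(1024, 64))     # one weight matrix (toy)
grad = rng.normal(size=(1024, 64))  # gradient of an editing loss (toy)
ensemble = select_ensemble(attributions, threshold=0.99)
W = masked_update(W, grad, ensemble)
print(f"edited {len(ensemble)} of {W.shape[0]} neuron rows")
```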
- Neuron Empirical Gradient: Discovering and Quantifying Neurons Global Linear Controllability [14.693407823048478]
Our study first investigates the numerical relationship between neuron activations and model output. We introduce NeurGrad, an accurate and efficient method for computing the neuron empirical gradient (NEG).
arXiv Detail & Related papers (2024-12-24T00:01:24Z)
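The neuron empirical gradient described above is, at its core, a measure of how the model output moves per unit change of one neuron's activation. A naive finite-difference version is sketched below; NeurGrad itself is presented as more accurate and efficient than this, so treat the snippet only as a definition-level illustration with hypothetical names.

```python
def neuron_empirical_gradient(output_given_activation, activation, delta=1e-2):
    """Central finite-difference estimate of d(model output) / d(neuron
    activation), holding everything else fixed. Naive illustration of
    the quantity only; NeurGrad computes it more efficiently."""
    up = output_given_activation(activation + delta)
    down = output_given_activation(activation - delta)
    return (up - down) / (2.0 * delta)

# Toy stand-in: the model output depends quadratically on the probed
# neuron's activation, so the estimate at a=1.0 should be about 6.0.
toy_output = lambda a: 3.0 * a ** 2
print(neuron_empirical_gradient(toy_output, activation=1.0))
```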
- What should a neuron aim for? Designing local objective functions based on information theory [41.39714023784306]
We show how self-organized artificial neurons can be achieved by designing bio-inspired local learning goals.
These goals are parameterized using a recent extension of information theory, Partial Information Decomposition.
Our work advances a principled information-theoretic foundation for local learning strategies.
arXiv Detail & Related papers (2024-12-03T14:45:46Z)
- Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts [14.69046890281591]
We introduce a novel architecture-agnostic framework capable of identifying query-relevant neurons in large language models. We show potential applications of our detected neurons in knowledge editing and neuron-based prediction.
arXiv Detail & Related papers (2024-06-16T09:36:32Z)
- Benchmarking Compositionality with Formal Languages [64.09083307778951]
We investigate whether large neural models in NLP can acquire the ability to combine primitive concepts into larger novel combinations while learning from data.
By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network.
We find that the models either learn the relations completely or not at all. The key factor is transition coverage, which sets a soft learnability limit at 400 examples per transition.
arXiv Detail & Related papers (2022-08-17T10:03:18Z)
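Transition coverage, the key quantity in the compositionality result above, can be made concrete in a few lines of Python: count how many training examples exercise each transducer transition and flag those below the reported 400-example soft limit. The data format here is an assumption for illustration.

```python
from collections import Counter

def transition_coverage(examples, threshold=400):
    """Count how many training examples exercise each transducer
    transition and return the under-covered ones. The 400-example
    default mirrors the soft limit reported above; the data format
    is an assumption for illustration."""
    counts = Counter(t for ex in examples for t in set(ex))
    under_covered = {t for t, c in counts.items() if c < threshold}
    return counts, under_covered

# Toy usage: three examples, each listing the transitions it uses.
examples = [["t0", "t1"], ["t1", "t2"], ["t0", "t1", "t2"]]
counts, under = transition_coverage(examples, threshold=2)
print(counts)          # Counter({'t1': 3, 't0': 2, 't2': 2})
print(sorted(under))   # [] -- every transition meets the threshold
```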
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions match reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
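The prediction-and-correction loop described above (neurons predict neighboring activity and adjust parameters by the mismatch) can be sketched generically as below; this is a minimal predictive-coding-style update with made-up dimensions and learning rate, not the paper's actual learning rule.

```python
import numpy as np

def predictive_coding_step(w, source, target, lr=0.05):
    """One local update in the spirit of predictive processing: units
    predict the activity of a neighboring layer through weights w, and
    the weights move to reduce the prediction error. Generic sketch,
    not the paper's exact update rule."""
    prediction = w @ source               # what the neurons expect to see
    error = target - prediction           # mismatch with actual activity
    w = w + lr * np.outer(error, source)  # local, error-driven correction
    return w, error

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=(4, 8))
source, target = rng.normal(size=8), rng.normal(size=4)
for _ in range(100):
    w, error = predictive_coding_step(w, source, target)
print(f"remaining prediction error norm: {np.linalg.norm(error):.2e}")
```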
- Learning Realistic Patterns from Unrealistic Stimuli: Generalization and Data Anonymization [0.5091527753265949]
This work investigates a simple yet unconventional approach to anonymized data synthesis that enables third parties to benefit from such private data.
We use sleep monitoring data from both an open and a large closed clinical study and evaluate whether (1) end-users can create and successfully use customized classification models for sleep apnea detection, and (2) the identity of participants in the study is protected.
arXiv Detail & Related papers (2020-09-21T16:31:21Z)
- Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
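The XOR claim above hinges on the activation being non-monotonic: with weights (1, 1), the four XOR inputs map to pre-activations 0, 1, 1, 2, and a bump-shaped nonlinearity peaking at 1 fires exactly on the two positive cases. The sketch below demonstrates this with a generic Gaussian bump and hand-set weights; the actual ADA function and its trained weights are defined in the paper.

```python
import numpy as np

def bump(z):
    """A generic non-monotonic activation (Gaussian bump peaking at 1).
    The actual ADA function is defined in the paper; this stand-in only
    shows why non-monotonicity lets one neuron solve XOR."""
    return np.exp(-((z - 1.0) ** 2) / 0.1)

# With weights (1, 1) and bias 0, the four XOR inputs have
# pre-activations 0, 1, 1, 2; the bump fires only on the middle two.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])
w, b = np.array([1.0, 1.0]), 0.0
preds = (bump(X @ w + b) > 0.5).astype(int)
print(preds, "target:", y)  # [0 1 1 0] -> 100% accuracy
```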
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.