Journey to the Center of the Knowledge Neurons: Discoveries of
Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
- URL: http://arxiv.org/abs/2308.13198v2
- Date: Wed, 20 Dec 2023 11:05:17 GMT
- Title: Journey to the Center of the Knowledge Neurons: Discoveries of
Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
- Authors: Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
- Abstract summary: This paper delves into the complex task of understanding how factual knowledge is stored in multilingual language models.
We introduce the Architecture-adapted Multilingual Integrated Gradients method, which localizes knowledge neurons more precisely than current methods.
We also conduct an in-depth exploration of knowledge neurons, leading to two important discoveries.
- Score: 20.56154830853632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (PLMs) contain vast amounts of factual knowledge,
but how the knowledge is stored in the parameters remains unclear. This paper
delves into the complex task of understanding how factual knowledge is stored
in multilingual PLMs, and introduces the Architecture-adapted Multilingual
Integrated Gradients method, which localizes knowledge neurons more precisely
than current methods and generalizes more readily across architectures and
languages. Moreover, we conduct an in-depth
exploration of knowledge neurons, leading to the following two important
discoveries: (1) The discovery of Language-Independent Knowledge Neurons, which
store factual knowledge in a form that transcends language. We design
cross-lingual knowledge editing experiments demonstrating that PLMs can
perform such edits via these language-independent neurons; (2) The discovery
of Degenerate Knowledge Neurons, a novel type of neuron showing that different
knowledge neurons can store the same fact. This functional overlap endows
PLMs with a robust mastery of factual knowledge. We design fact-checking
experiments showing that degenerate knowledge neurons help PLMs detect
incorrect facts. Our experiments corroborate these findings, shed light on
the mechanisms of factual knowledge storage in multilingual PLMs, and
contribute valuable insights to the field. The code is available at
https://github.com/heng840/AMIG.
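The abstract describes localizing knowledge neurons via an integrated-gradients attribution over feed-forward (FFN) activations. The sketch below illustrates that core idea on a generic multilingual masked LM; the model choice (bert-base-multilingual-cased), the layer index, the step count, and the hook-based implementation are illustrative assumptions, not the paper's exact AMIG implementation (see the repository above for the authors' code).

```python
# Minimal sketch: integrated-gradients attribution of FFN neurons for one fact.
# Assumptions (not from the paper): mBERT as the PLM, layer 9, 20 IG steps.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-multilingual-cased"            # illustrative multilingual PLM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

prompt, answer = "The capital of France is [MASK].", "Paris"
layer, steps = 9, 20                                    # attributed layer / IG steps

inputs = tok(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids[0] == tok.mask_token_id).nonzero(as_tuple=True)[0].item()
answer_id = tok.convert_tokens_to_ids(answer)
ffn = model.bert.encoder.layer[layer].intermediate      # FFN up-projection + GELU

# 1) Record the original FFN activation at the [MASK] position.
saved = {}
hook = ffn.register_forward_hook(lambda m, i, o: saved.update(a=o.detach()))
with torch.no_grad():
    model(**inputs)
hook.remove()
baseline_act = saved["a"][0, mask_pos]                  # shape: (intermediate_size,)

# 2) Integrated gradients: scale the activation from 0 to its original value
#    and accumulate the gradient of the correct-answer logit at each step.
attribution = torch.zeros_like(baseline_act)
for k in range(1, steps + 1):
    scaled = (k / steps * baseline_act).clone().requires_grad_(True)

    def scale_hook(module, inp, out, scaled=scaled):
        out = out.clone()
        out[0, mask_pos] = scaled                       # inject the scaled activation
        return out

    hook = ffn.register_forward_hook(scale_hook)
    logits = model(**inputs).logits
    hook.remove()
    (grad,) = torch.autograd.grad(logits[0, mask_pos, answer_id], scaled)
    attribution += grad

attribution *= baseline_act / steps                     # IG: activation x mean gradient

top = torch.topk(attribution, 5)
print(f"candidate knowledge neurons in layer {layer}:")
for idx, score in zip(top.indices.tolist(), top.values.tolist()):
    print(f"  neuron {idx}: attribution {score:.4f}")
```

The full method additionally adapts the attribution to different PLM architectures and compares neurons located from prompts expressing the same fact in multiple languages; this sketch covers only the single-prompt attribution core.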
Related papers
- One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models [19.58983929459173]
Large language models (LLMs) have learned vast amounts of factual knowledge through self-supervised pre-training on large-scale corpora.
LLMs have also demonstrated excellent multilingual capabilities, which can express the learned knowledge in multiple languages.
(arXiv 2024-11-26)
- Multilingual Knowledge Editing with Language-Agnostic Factual Neurons [98.73585104789217]
We investigate how large language models (LLMs) represent multilingual factual knowledge.
We find that the same factual knowledge in different languages generally activates a shared set of neurons, which we call language-agnostic factual neurons.
Inspired by this finding, we propose a new MKE method by locating and modifying Language-Agnostic Factual Neurons (LAFN) to simultaneously edit multilingual knowledge.
(arXiv 2024-06-24)
- Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts [14.69046890281591]
We introduce a novel architecture-agnostic framework capable of identifying query-relevant neurons in large language models.
We show potential applications of our detected neurons in knowledge editing and neuron-based prediction.
(arXiv 2024-06-16)
- Revealing the Parallel Multilingual Learning within Large Language Models [50.098518799536144]
In this study, we reveal an in-context learning capability of multilingual large language models (LLMs).
By translating the input into several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities.
(arXiv 2024-03-14)
- Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models [23.11132761945838]
Large language models (LLMs) store extensive factual knowledge, but the underlying mechanisms remain unclear.
Previous research suggests that factual knowledge is stored within multi-layer perceptron weights.
Some storage units exhibit degeneracy, referred to as Degenerate Knowledge Neurons.
(arXiv 2024-02-21)
- Unveiling A Core Linguistic Region in Large Language Models [49.860260050718516]
This paper conducts an analogical research using brain localization as a prototype.
We have discovered a core region in large language models that corresponds to linguistic competence.
We observe that an improvement in linguistic competence does not necessarily accompany an elevation in the model's knowledge level.
(arXiv 2023-10-23)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help pre-trained language models utilize related latent knowledge without retrieving it from an external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
(arXiv 2023-05-15)
- A Survey of Knowledge Enhanced Pre-trained Language Models [78.56931125512295]
We present a comprehensive review of Knowledge Enhanced Pre-trained Language Models (KE-PLMs).
For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG) and rule knowledge.
The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods.
(arXiv 2022-11-11)
- Discovering Salient Neurons in Deep NLP Models [31.18937787704794]
We present a technique called Linguistic Correlation Analysis to extract salient neurons from the model.
Our data-driven, quantitative analysis yields interesting findings.
Our code is publicly available as part of the NeuroX toolkit.
(arXiv 2022-06-27)
- Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
Experiments on text classification show promising results.
(arXiv 2020-12-25)