Journey to the Center of the Knowledge Neurons: Discoveries of
Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
- URL: http://arxiv.org/abs/2308.13198v2
- Date: Wed, 20 Dec 2023 11:05:17 GMT
- Title: Journey to the Center of the Knowledge Neurons: Discoveries of
Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
- Authors: Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
- Abstract summary: This paper delves into how factual knowledge is stored in multilingual language models.
We introduce the Architecture-adapted Multilingual Integrated Gradients method, which localizes knowledge neurons more precisely than current methods.
We also conduct an in-depth exploration of knowledge neurons, leading to two important discoveries: Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons.
- Score: 20.56154830853632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (PLMs) contain vast amounts of factual knowledge,
but how this knowledge is stored in their parameters remains unclear. This paper
delves into how factual knowledge is stored in multilingual PLMs and introduces
the Architecture-adapted Multilingual Integrated Gradients method, which
localizes knowledge neurons more precisely than current methods and generalizes
better across architectures and languages. We further conduct an in-depth
exploration of knowledge neurons, leading to two important discoveries: (1)
Language-Independent Knowledge Neurons, which store factual knowledge in a form
that transcends language; cross-lingual knowledge editing experiments
demonstrate that PLMs can perform this task via language-independent neurons.
(2) Degenerate Knowledge Neurons, a novel type of neuron showing that distinct
knowledge neurons can store the same fact; their functional overlap endows PLMs
with a robust mastery of factual knowledge, and fact-checking experiments show
that degenerate knowledge neurons help PLMs detect incorrect facts. These
experiments corroborate our findings, shed light on the mechanisms of factual
knowledge storage in multilingual PLMs, and contribute valuable insights to the
field. The code is available at
https://github.com/heng840/AMIG.
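For readers who want to see the localization idea concretely, below is a minimal sketch of integrated-gradients knowledge attribution in the style this line of work builds on. The paper's architecture adaptations and multilingual extensions are not reproduced here; the model name, layer path, prompt, and helper names are illustrative assumptions, not the authors' implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-multilingual-cased"  # assumption: any BERT-style masked LM
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def knowledge_attribution(prompt: str, answer: str, layer: int, steps: int = 20):
    """Riemann approximation of the integrated-gradients attribution
    Attr(w_i) = w_i * (1/steps) * sum_k dP(answer | alpha_k * w)/du_i
    over the FFN intermediate neurons of one layer, at the [MASK] position."""
    enc = tok(prompt, return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero().item()
    answer_id = tok.convert_tokens_to_ids(answer)  # assumes a single-token answer

    ffn = model.bert.encoder.layer[layer].intermediate  # GELU'd FFN activations
    state = {"alpha": 0.0, "scaled": None}

    def hook(module, inputs, output):
        # Replace the activations with alpha * w as a fresh gradient leaf.
        state["scaled"] = (output.detach() * state["alpha"]).requires_grad_(True)
        return state["scaled"]

    handle = ffn.register_forward_hook(hook)
    grad_sum = None
    try:
        for k in range(1, steps + 1):
            state["alpha"] = k / steps
            logits = model(**enc).logits
            prob = torch.softmax(logits[0, mask_pos], dim=-1)[answer_id]
            grad = torch.autograd.grad(prob, state["scaled"])[0][0, mask_pos]
            grad_sum = grad if grad_sum is None else grad_sum + grad
        w = state["scaled"].detach()[0, mask_pos]  # alpha = 1 on the last step
    finally:
        handle.remove()
    return w * grad_sum / steps  # one attribution score per neuron

# Example: rank layer-9 neurons for one cloze-style fact.
scores = knowledge_attribution("Paris is the capital of [MASK].", "France", layer=9)
print("top candidate neurons:", scores.topk(5).indices.tolist())
```

Under a scheme like this, neurons that score highly across translations of the same prompt would be candidates for language-independent knowledge neurons, while distinct high-scoring neuron sets for the same fact would hint at degeneracy.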
Related papers
- Large Language Models Are Cross-Lingual Knowledge-Free Reasoners [43.99097308487008]
We decompose reasoning tasks into two separate components: knowledge retrieval and knowledge-free reasoning.
We show that the knowledge-free reasoning capability can be transferred nearly perfectly across various source-target language directions.
We hypothesize that knowledge-free reasoning relies on similar neurons across languages, while knowledge is stored separately in different languages.
arXiv Detail & Related papers (2024-06-24T14:03:04Z)
- Multilingual Knowledge Editing with Language-Agnostic Factual Neurons [98.73585104789217]
We investigate how large language models (LLMs) represent multilingual factual knowledge.
We find that the same factual knowledge in different languages generally activates a shared set of neurons, which we call language-agnostic factual neurons.
Inspired by this finding, we propose a new multilingual knowledge editing (MKE) method that locates and modifies Language-Agnostic Factual Neurons (LAFN) to edit multilingual knowledge simultaneously.
arXiv Detail & Related papers (2024-06-24T08:06:56Z)
- Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts [14.69046890281591]
We introduce a novel architecture-agnostic framework capable of identifying query-relevant neurons in large language models.
We show potential applications of our detected neurons in knowledge editing and neuron-based prediction.
arXiv Detail & Related papers (2024-06-16T09:36:32Z)
- Revealing the Parallel Multilingual Learning within Large Language Models [50.098518799536144]
In this study, we reveal an in-context learning capability of multilingual large language models (LLMs).
By translating the input into several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities (a prompt-construction sketch appears after this list).
arXiv Detail & Related papers (2024-03-14T03:33:46Z)
- Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models [23.11132761945838]
Large language models (LLMs) store extensive factual knowledge, but the underlying mechanisms remain unclear.
Previous research suggests that factual knowledge is stored within multi-layer perceptron weights.
Some storage units exhibit degeneracy, referred to as Degenerate Knowledge Neurons.
arXiv Detail & Related papers (2024-02-21T11:50:32Z)
- Unveiling A Core Linguistic Region in Large Language Models [49.860260050718516]
This paper conducts analogical research using brain localization as a prototype.
We have discovered a core region in large language models that corresponds to linguistic competence.
We observe that an improvement in linguistic competence does not necessarily accompany an elevation in the model's knowledge level.
arXiv Detail & Related papers (2023-10-23T13:31:32Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination that helps pre-trained language models utilize related latent knowledge without retrieving it from an external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- A Survey of Knowledge Enhanced Pre-trained Language Models [78.56931125512295]
We present a comprehensive review of Knowledge Enhanced Pre-trained Language Models (KE-PLMs).
For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG), and rule knowledge.
The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods.
arXiv Detail & Related papers (2022-11-11T04:29:02Z)
- Discovering Salient Neurons in Deep NLP Models [31.18937787704794]
We present a technique called Linguistic Correlation Analysis to extract salient neurons in a model.
Our data-driven, quantitative analysis uncovers interesting findings.
Our code is publicly available as part of the NeuroX toolkit.
arXiv Detail & Related papers (2022-06-27T13:31:49Z)
- Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
We import knowledge from multiple models into the knowledge base, from which the fused knowledge is exported back to a single model.
Experiments on text classification show promising results.
arXiv Detail & Related papers (2020-12-25T12:27:44Z)
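As referenced in the PiM entry above, here is a minimal sketch of how a Parallel Input in Multiple Languages prompt might be assembled. The template, language set, and pre-translated strings are assumptions for illustration, not that paper's exact setup.

```python
# Hypothetical PiM prompt builder: the same query, pre-translated into
# several languages, is concatenated into a single prompt for one LLM call.
TRANSLATIONS = {  # assumed pre-computed translations of one question
    "en": "Which country has the largest population?",
    "fr": "Quel pays a la plus grande population ?",
    "zh": "哪个国家人口最多？",
}

def build_pim_prompt(translations: dict) -> str:
    """Concatenate parallel translations into one prompt (PiM-style input)."""
    parallel = "\n".join(f"[{lang}] {text}" for lang, text in translations.items())
    return ("The following questions are translations of each other.\n"
            f"{parallel}\nAnswer the question:")

print(build_pim_prompt(TRANSLATIONS))
```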
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.