Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models
- URL: http://arxiv.org/abs/2402.13731v2
- Date: Mon, 17 Jun 2024 03:44:10 GMT
- Title: Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models
- Authors: Yuheng Chen, Pengfei Cao, Yubo Chen, Yining Wang, Shengping Liu, Kang Liu, Jun Zhao
- Abstract summary: Large language models (LLMs) store extensive factual knowledge, but the underlying mechanisms remain unclear.
Previous research suggests that factual knowledge is stored within multi-layer perceptron weights.
Some storage units exhibit degeneracy and are referred to as Degenerate Knowledge Neurons (DKNs).
- Score: 23.11132761945838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) store extensive factual knowledge, but the underlying mechanisms remain unclear. Previous research suggests that factual knowledge is stored within multi-layer perceptron weights, and some storage units exhibit degeneracy, referred to as Degenerate Knowledge Neurons (DKNs). Despite the novelty and unique properties of this concept, it has not been rigorously defined or systematically studied. We first consider the connection weight patterns of MLP neurons and define DKNs from both structural and functional aspects. Based on this, we introduce the Neurological Topology Clustering method, which allows DKNs to be formed in any number and structure, leading to more accurate DKN acquisition. Furthermore, inspired by cognitive science, we explore the relationship between DKNs and the robustness, evolvability, and complexity of LLMs. Our 34 experiments under 6 settings demonstrate the connection between DKNs and these three properties. The code will be available soon.
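To make the structural and functional aspects concrete, here is a minimal sketch of a DKN acquisition loop under stated assumptions. The abstract does not specify how Neurological Topology Clustering works, so the greedy cosine grouping and the `predict_with_masked` callback below are illustrative stand-ins, not the authors' implementation.

```python
# A minimal sketch of the two-step recipe the abstract describes: group MLP
# neurons whose connection weights look alike (structural aspect), then keep
# only groups whose members are individually redundant for a fact
# (functional aspect). `predict_with_masked` is a hypothetical callback that
# reruns the model with the given neurons zeroed out.
import numpy as np

def structural_clusters(W_out: np.ndarray, tau: float = 0.9):
    """Greedily group MLP hidden units whose output connection weights
    (rows of W_out) have cosine similarity of at least tau."""
    normed = W_out / np.linalg.norm(W_out, axis=1, keepdims=True)
    sim = normed @ normed.T                       # pairwise cosine similarity
    unassigned = set(range(len(W_out)))
    clusters = []
    while unassigned:
        seed = unassigned.pop()
        group = {seed} | {j for j in unassigned if sim[seed, j] >= tau}
        unassigned -= group
        clusters.append(sorted(group))
    return clusters

def is_degenerate(cluster, predict_with_masked, answer):
    """Functional test for a DKN: the fact survives masking any single
    member of the cluster, but not masking the whole cluster at once."""
    each_redundant = all(predict_with_masked([n]) == answer for n in cluster)
    jointly_needed = predict_with_masked(cluster) != answer
    return each_redundant and jointly_needed
```

The functional test mirrors the cognitive-science notion of degeneracy: structurally distinct units that can substitute for one another, so the stored fact survives the loss of any single member but not of the entire group.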
Related papers
- Knowledge Mechanisms in Large Language Models: A Survey and Perspective [88.51320482620679]
This paper reviews knowledge mechanisms in LLMs under a novel taxonomy covering knowledge utilization and evolution.
We discuss what knowledge LLMs have learned, the reasons for the fragility of parametric knowledge, and the potential dark knowledge (hypothesis) that will be challenging to address.
arXiv Detail & Related papers (2024-07-22T06:15:59Z)
- Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs [55.317267269115845]
Chain-of-Knowledge (CoK) is a comprehensive framework for knowledge reasoning.
CoK includes methodologies for both dataset construction and model learning.
We conduct extensive experiments with KnowReason.
arXiv Detail & Related papers (2024-06-30T10:49:32Z)
- Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts [14.69046890281591]
We introduce a novel architecture-agnostic framework capable of identifying query-relevant neurons in large language models.
We show potential applications of the detected neurons in knowledge editing and neuron-based prediction; a minimal scoring sketch follows this entry.
arXiv Detail & Related papers (2024-06-16T09:36:32Z)
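The summary above does not specify the detection criterion, so the following is a hedged sketch of one common, architecture-agnostic recipe: score every hidden unit by the magnitude of activation times gradient of the answer log-probability, then keep the top-k. The shapes and the name `rank_neurons` are assumptions for illustration, not the paper's API.

```python
# Hypothetical saliency-style scoring for query-relevant neurons:
# |activation x gradient| per hidden unit, captured at the answer position.
import numpy as np

def rank_neurons(activations: np.ndarray, grads: np.ndarray, k: int = 20):
    """activations, grads: (num_layers, hidden_dim) arrays for one query.
    Returns the k highest-scoring (layer, unit) index pairs."""
    scores = np.abs(activations * grads)          # saliency per neuron
    top = np.argsort(scores, axis=None)[::-1][:k]
    return [np.unravel_index(i, scores.shape) for i in top]
```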
- Knowledge Localization: Mission Not Accomplished? Enter Query Localization! [19.16542466297147]
The Knowledge Neuron (KN) thesis is a prominent theory for explaining how LLMs store and recall factual knowledge.
We re-examine the knowledge localization (KL) assumption and confirm the existence of facts that do not adhere to it from both statistical and knowledge modification perspectives.
We propose the Consistency-Aware KN modification method, which improves the performance of knowledge modification; a sketch of the basic KN edit operation follows this entry.
arXiv Detail & Related papers (2024-05-23T02:44:12Z)
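Methods in this line typically build on a simple edit primitive: suppressing or amplifying the activations of identified knowledge neurons. The hook below is a hedged sketch of that primitive for a PyTorch model; the module path in the usage comment is a GPT-2-style assumption, not this paper's code.

```python
# Scale the activations of chosen knowledge neurons during the forward pass.
# scale=0.0 erases their contribution; scale=2.0 amplifies it.
import torch

def apply_kn_edit(mlp_act_module, neuron_ids, scale=0.0):
    def hook(module, inputs, output):
        output = output.clone()                   # avoid in-place surprises
        output[..., neuron_ids] *= scale
        return output                             # replaces the module output
    return mlp_act_module.register_forward_hook(hook)

# Usage (hypothetical GPT-2-style module path):
# handle = apply_kn_edit(model.transformer.h[5].mlp.act, [123, 456], scale=0.0)
# ... run generation, then handle.remove() to restore the original behavior.
```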
- What does the Knowledge Neuron Thesis Have to do with Knowledge? [13.651280182588666]
We reassess the Knowledge Neuron (KN) Thesis: an interpretation of the mechanism underlying the ability of large language models to recall facts from a training corpus.
We find that this thesis is, at best, an oversimplification.
arXiv Detail & Related papers (2024-05-03T18:34:37Z)
- Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons [20.56154830853632]
This paper delves into the complex task of understanding how factual knowledge is stored in multilingual language models.
We introduce the Architecture-adapted Multilingual Integrated Gradients method, which localizes knowledge neurons more precisely.
We also conduct an in-depth exploration of knowledge neurons, leading to two important discoveries: language-independent knowledge neurons and degenerate knowledge neurons. A sketch of the integrated-gradients building block follows this entry.
arXiv Detail & Related papers (2023-08-25T06:26:05Z)
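The integrated-gradients attribution that this method adapts scores a neuron by its observed activation times the average gradient of the answer probability along the path from zero to that activation, approximated with a Riemann sum. Below is a minimal, framework-free sketch of that building block; `prob_fn`, which returns the correct-answer probability with the neuron's activation clamped to a given value, is a hypothetical callback.

```python
# Minimal sketch of integrated gradients for one MLP neuron: approximate
# w_bar * integral_0^1 dP/dw at w = a * w_bar with a Riemann sum, using a
# finite-difference gradient. `prob_fn` is a hypothetical callback.
def integrated_gradient(prob_fn, w_bar: float, steps: int = 20) -> float:
    total = 0.0
    for i in range(1, steps + 1):
        a = i / steps                             # path position in (0, 1]
        eps = 1e-4 * max(abs(w_bar), 1.0)         # finite-difference step
        grad = (prob_fn(a * w_bar + eps) - prob_fn(a * w_bar - eps)) / (2 * eps)
        total += grad
    return w_bar * total / steps                  # Riemann approximation
```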
- Language Knowledge-Assisted Representation Learning for Skeleton-Based Action Recognition [71.35205097460124]
How humans understand and recognize the actions of others is a complex neuroscientific problem.
LA-GCN is a graph convolutional network assisted by knowledge from large-scale language models (LLMs).
arXiv Detail & Related papers (2023-05-21T08:29:16Z)
- Benchmarking Compositionality with Formal Languages [64.09083307778951]
We investigate whether large neural models in NLP can acquire the ability to combine primitive concepts into larger, novel combinations while learning from data.
By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network.
We find that models either learn a relation completely or not at all. The key factor is transition coverage, which sets a soft learnability limit of about 400 examples per transition; a toy transducer sketch follows this entry.
arXiv Detail & Related papers (2022-08-17T10:03:18Z)
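As a concrete illustration of the setup, the toy below samples a random deterministic finite-state transducer and counts how often each transition is exercised by a set of training words; the reported limit suggests roughly 400 examples per transition are needed for learnability. All names here are hypothetical, not the benchmark's code.

```python
# Toy sketch: sample a random deterministic finite-state transducer and
# measure per-transition coverage of a training set.
import random
from collections import Counter

def random_transducer(n_states=3, in_alpha="ab", out_alpha="xy", seed=0):
    rng = random.Random(seed)
    return {(s, c): (rng.randrange(n_states), rng.choice(out_alpha))
            for s in range(n_states) for c in in_alpha}

def transduce(T, word, start=0):
    state, out, used = start, [], []
    for c in word:
        used.append((state, c))                   # record transition taken
        state, o = T[(state, c)]
        out.append(o)
    return "".join(out), used

T = random_transducer()
coverage = Counter()
for w in ["abba", "baab", "aabb"]:                # toy training words
    coverage.update(transduce(T, w)[1])
print(coverage)                                    # examples per transition
```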
- Discovering Salient Neurons in Deep NLP Models [31.18937787704794]
We present a technique called Linguistic Correlation Analysis that extracts salient neurons from the model.
Our data-driven, quantitative analysis illuminates interesting findings.
Our code is publicly available as part of the NeuroX toolkit; a simplified probe sketch follows this entry.
arXiv Detail & Related papers (2022-06-27T13:31:49Z)
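As a rough illustration of the probe-and-rank idea behind Linguistic Correlation Analysis: fit a regularized linear classifier from neuron activations to a linguistic label, then rank neurons by the magnitude of their learned weights. The released NeuroX toolkit implements a more careful variant; the sketch below is an assumption-laden simplification.

```python
# Simplified probe: L1-regularized logistic regression from activations to
# linguistic labels; neurons with the largest absolute weights are "salient".
import numpy as np
from sklearn.linear_model import LogisticRegression

def salient_neurons(acts: np.ndarray, labels: np.ndarray, top_k: int = 10):
    """acts: (num_tokens, num_neurons); labels: per-token property tags."""
    probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    probe.fit(acts, labels)
    weight = np.abs(probe.coef_).max(axis=0)      # strongest weight per neuron
    return np.argsort(weight)[::-1][:top_k]
```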
- CogNGen: Constructing the Kernel of a Hyperdimensional Predictive Processing Cognitive Architecture [79.07468367923619]
We present a new cognitive architecture that combines two neurobiologically plausible computational models.
We aim to develop a cognitive architecture that has the power of modern machine learning techniques.
arXiv Detail & Related papers (2022-03-31T04:44:28Z)
- Neural Networks Enhancement with Logical Knowledge [83.9217787335878]
We propose an extension of KENN for relational data.
The results show that KENN can increase the performance of the underlying neural network even in the presence of relational data.
arXiv Detail & Related papers (2020-09-13T21:12:20Z)