Finding patterns in Knowledge Attribution for Transformers
- URL: http://arxiv.org/abs/2205.01366v2
- Date: Wed, 4 May 2022 16:45:59 GMT
- Title: Finding patterns in Knowledge Attribution for Transformers
- Authors: Jeevesh Juneja and Ritu Agarwal
- Abstract summary: We use a 12-layer multi-lingual BERT model for our experiments.
We observe that factual knowledge can mostly be attributed to the middle and higher layers of the network.
Applying the attribution scheme to grammatical knowledge, we find that it is far more dispersed among the neurons than factual knowledge.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We analyze the Knowledge Neurons framework for the attribution of factual and
relational knowledge to particular neurons in the transformer network. We use a
12-layer multi-lingual BERT model for our experiments. Our study reveals
various interesting phenomena. We observe that factual knowledge can mostly be
attributed to the middle and higher layers of the network (layer $\ge 6$). Further
analysis reveals that the middle layers ($6$-$9$) are mostly responsible for
relational information, which is further refined into the actual factual knowledge,
or the "correct answer", in the last few layers ($10$-$12$). Our experiments also
show that the model handles prompts that are in different languages but represent
the same fact in a similar way, providing further evidence for the effectiveness of
multi-lingual pre-training. Applying the attribution scheme to grammatical
knowledge, we find that it is far more dispersed among the neurons than factual
knowledge.
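
As a concrete illustration of the attribution scheme analyzed above, the sketch below computes an integrated-gradients score for each intermediate FFN neuron of one BERT layer with respect to a masked-LM answer, in the style of the Knowledge Neurons framework. It is a minimal sketch assuming the HuggingFace `transformers` implementation of multilingual BERT; the prompt/answer pair, the zero baseline, the 20-step approximation, and the helper name `attribute_layer` are illustrative choices, not the exact configuration used in the paper.

```python
# Hedged sketch: integrated-gradients attribution of a masked-LM answer to the
# intermediate FFN neurons of one BERT layer (Knowledge Neurons-style analysis).
# The prompt/answer, zero baseline, and 20-step approximation are illustrative
# assumptions, not the paper's exact configuration.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased").eval()

prompt = "The capital of France is [MASK]."           # example relational prompt
answer_id = tokenizer.convert_tokens_to_ids("Paris")  # expected single-token answer
inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()


def attribute_layer(layer_idx: int, steps: int = 20) -> torch.Tensor:
    """Integrated-gradients score of each intermediate FFN neuron in one layer."""
    ffn = model.bert.encoder.layer[layer_idx].intermediate  # GELU(W_1 h) activations
    state = {}

    # 1) Record the unmodified intermediate activation for this prompt.
    handle = ffn.register_forward_hook(lambda m, i, out: state.update(orig=out.detach()))
    with torch.no_grad():
        model(**inputs)
    handle.remove()
    orig = state["orig"]

    # 2) Rescale the activation from a zero baseline toward `orig` and accumulate
    #    gradients of the answer's log-probability at the [MASK] position.
    total_grad = torch.zeros_like(orig[0, mask_pos])
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        scaled = (alpha * orig).requires_grad_(True)
        handle = ffn.register_forward_hook(lambda m, i, out: scaled)  # overwrite output
        logits = model(**inputs).logits
        handle.remove()
        log_prob = torch.log_softmax(logits[0, mask_pos], dim=-1)[answer_id]
        total_grad += torch.autograd.grad(log_prob, scaled)[0][0, mask_pos]

    # Riemann approximation of the path integral: activation * mean gradient.
    return orig[0, mask_pos] * total_grad / steps


# Example: rank neurons in a middle layer (layer indices 0-11 for 12-layer BERT).
scores = attribute_layer(layer_idx=8)
print(scores.topk(5).indices)  # candidate "knowledge neurons" for this fact
```

Sweeping `layer_idx` over all twelve layers and comparing where the high-scoring neurons concentrate is one way to reproduce the kind of layer-wise pattern the abstract describes.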
Related papers
- Does Knowledge Localization Hold True? Surprising Differences Between Entity and Relation Perspectives in Language Models (arXiv, 2024-09-01)
  This study investigates the differences between entity and relational knowledge through knowledge editing.
  To further elucidate these differences, we employ causal analysis to investigate how relational knowledge is stored in pre-trained models.
  This insight highlights the multifaceted nature of knowledge storage in language models, underscoring the complexity of manipulating specific types of knowledge within these models.
- Large Language Models Are Cross-Lingual Knowledge-Free Reasoners (arXiv, 2024-06-24)
  We decompose the process of reasoning tasks into two separate components: knowledge retrieval and knowledge-free reasoning.
  We show that the knowledge-free reasoning capability can be nearly perfectly transferred across various source-target language directions.
  We hypothesize that knowledge-free reasoning shares similar neurons in different languages for reasoning, while knowledge is stored separately in different languages.
- Multilingual Knowledge Editing with Language-Agnostic Factual Neurons (arXiv, 2024-06-24)
  We investigate how large language models (LLMs) represent multilingual factual knowledge.
  We find that the same factual knowledge in different languages generally activates a shared set of neurons, which we call language-agnostic factual neurons.
  Inspired by this finding, we propose a new MKE method by locating and modifying Language-Agnostic Factual Neurons (LAFN) to simultaneously edit multilingual knowledge.
- Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons (arXiv, 2023-08-25)
  This paper delves into the complex task of understanding how factual knowledge is stored in multilingual language models.
  We introduce the Architecture-adapted Multilingual Integrated Gradients method, which localizes knowledge neurons more precisely.
  We also conduct an in-depth exploration of knowledge neurons, leading to two important discoveries: language-independent knowledge neurons and degenerate knowledge neurons.
- The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources in Natural Language Understanding Systems (arXiv, 2022-12-15)
  We evaluate state-of-the-art coreference resolution models on our dataset.
  Several models struggle to reason on-the-fly over knowledge observed both at pretrain time and at inference time.
  Still, even the best performing models seem to have difficulties with reliably integrating knowledge presented only at inference time.
- BERTnesia: Investigating the capture and forgetting of knowledge in BERT (arXiv, 2021-06-05)
  We probe BERT specifically to understand and measure the relational knowledge it captures in its parametric memory.
  Our findings show that knowledge is not just contained in BERT's final layers.
  When BERT is fine-tuned, relational knowledge is forgotten.
- Probing Across Time: What Does RoBERTa Know and When? (arXiv, 2021-04-16)
  We show that linguistic knowledge is acquired fast, stably, and robustly across domains, while facts and commonsense are learned more slowly and are more domain-sensitive.
  We believe that probing-across-time analyses can help researchers understand the complex, intermingled learning that these models undergo and guide us toward more efficient approaches that accomplish the necessary learning faster.
- Towards a Universal Continuous Knowledge Base (arXiv, 2020-12-25)
  We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
  We import the knowledge from multiple models into the knowledge base, from which the fused knowledge is exported back to a single model.
  Experiments on text classification show promising results.
- Probing Pretrained Language Models for Lexical Semantics (arXiv, 2020-10-12)
  We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
  Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
- Information-Theoretic Probing for Linguistic Structure (arXiv, 2020-04-07)
  We propose an information-theoretic operationalization of probing as estimating mutual information (a brief sketch of this formulation follows the list).
  We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
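
For the information-theoretic probing entry above, the following is a short sketch of the operationalization it refers to, written in standard notation (the cited paper's exact formulation may differ in details): probing a representation $R$ for a linguistic property $Y$ is cast as estimating the mutual information $\mathrm{I}(R;Y)$, and the cross-entropy of any trained probe $q_\theta$ yields a computable bound.

```latex
% Sketch in standard notation; details of the cited paper's formulation may differ.
\[
  \mathrm{I}(R;Y) \;=\; \mathrm{H}(Y) - \mathrm{H}(Y \mid R)
  \;\ge\; \mathrm{H}(Y) - \mathrm{H}_{q_\theta}(Y \mid R),
\]
\[
  \text{where } \mathrm{H}_{q_\theta}(Y \mid R)
  \;=\; -\,\mathbb{E}_{(r,y)}\big[\log q_\theta(y \mid r)\big]
  \;\ge\; \mathrm{H}(Y \mid R).
\]
```

Under this view, a better probe simply tightens the lower bound on how much information about $Y$ the representation carries.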