What Do the Circuits Mean? A Knowledge Edit View
- URL: http://arxiv.org/abs/2406.17241v1
- Date: Tue, 25 Jun 2024 03:09:53 GMT
- Title: What Do the Circuits Mean? A Knowledge Edit View
- Authors: Huaizhi Ge, Frank Rudzicz, Zining Zhu
- Abstract summary: We use diverse text classification datasets to extract circuits in the GPT2-XL model.
Our findings indicate that these circuits contain entity knowledge but resist new knowledge more than complementary circuits during knowledge editing.
We find that up to 60% of the circuits consist of layer normalization modules rather than attention or MLP modules.
- Score: 18.022428746019582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of language model interpretability, circuit discovery is gaining popularity. Despite this, the true meaning of these circuits remains largely an open question. We introduce a novel method to learn their meanings as holistic objects through the lens of knowledge editing. We extract circuits from the GPT2-XL model using diverse text classification datasets, and use hierarchical-relations datasets to explore knowledge editing in the circuits. Our findings indicate that these circuits contain entity knowledge but resist new knowledge more than complementary circuits do during knowledge editing. Additionally, we examine the impact of circuit size, discovering that an ideal "theoretical circuit", in which essential knowledge is concentrated, likely incorporates more than 5% but less than 50% of the model's parameters. We also assess the overlap between circuits extracted from different datasets, finding moderate similarities. What, then, constitutes these circuits? We find that up to 60% of the circuits consist of layer normalization modules rather than attention or MLP modules, adding evidence to the ongoing debates regarding knowledge localization. In summary, our findings offer new insights into the functions of circuits, and suggest research directions for further interpretability and safety research on language models.
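To make the overlap and composition analyses concrete, here is a minimal sketch, not the authors' code: it computes a Jaccard-style overlap between two circuits (one common choice for such comparisons, though the abstract does not name the metric used) and tallies a circuit's components by module type. The GPT-2-style module names and the string-matching classifier for module kinds are illustrative assumptions.

```python
from collections import Counter

def jaccard_overlap(circuit_a: set[str], circuit_b: set[str]) -> float:
    """Fraction of components shared by two circuits (intersection over union)."""
    union = circuit_a | circuit_b
    return len(circuit_a & circuit_b) / len(union) if union else 0.0

def module_composition(circuit: set[str]) -> dict[str, float]:
    """Proportion of circuit components that are layernorm / attention / MLP."""
    def kind(name: str) -> str:
        if ".ln" in name:      # layer normalization modules
            return "layernorm"
        if ".attn" in name:    # attention modules
            return "attention"
        return "mlp"
    counts = Counter(kind(c) for c in circuit)
    total = sum(counts.values())
    return {k: round(v / total, 2) for k, v in counts.items()}

# Toy circuits with GPT-2-style module names (illustrative only).
circ_a = {"h.0.ln_1", "h.0.attn", "h.1.mlp", "h.2.ln_2", "h.2.mlp"}
circ_b = {"h.0.ln_1", "h.1.mlp", "h.3.attn", "h.2.ln_2"}

print(jaccard_overlap(circ_a, circ_b))  # 0.5, i.e. moderate overlap
print(module_composition(circ_a))       # layernorm 0.4, attention 0.2, mlp 0.4
```

On real extracted circuits, the same two functions would report the cross-dataset overlap and the layer-normalization share that the abstract describes.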
Related papers
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training [92.88889953768455]
A critical gap remains in understanding how Large Language Models (LLMs) internalize new knowledge.
We identify computational subgraphs that facilitate knowledge storage and processing.
arXiv Detail & Related papers (2025-02-16T16:55:43Z) - Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models [22.89563355840371]
We study the modularity of neural networks by analyzing circuits for highly compositional subtasks within a language model.
Our results indicate that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness.
arXiv Detail & Related papers (2024-10-02T11:36:45Z) - Knowledge Circuits in Pretrained Transformers [47.342682123081204]
How modern large language models store knowledge has long been a subject of intense interest and investigation among researchers.
In this paper, we delve into the computation graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge.
We evaluate the impact of current knowledge editing techniques on these knowledge circuits, providing deeper insights into the functioning and constraints of these editing methodologies.
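As a point of reference for what evaluating knowledge editing typically involves, here is a minimal sketch of the standard edit-success check: after a fact has been edited (by whatever method, applied elsewhere), does the model rank the new object above the old one? It assumes a HuggingFace causal LM and compares only first-token logits, a common simplification; it is not this paper's exact protocol.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def edit_success(model, tok, prompt: str, new_obj: str, old_obj: str) -> bool:
    """True if the model prefers the new object over the old one."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]   # next-token logits
    new_id = tok(" " + new_obj)["input_ids"][0]  # first-token comparison only
    old_id = tok(" " + old_obj)["input_ids"][0]
    return (logits[new_id] > logits[old_id]).item()

tok = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")
# After an edit mapping "The Eiffel Tower is in" from Paris to Rome,
# a successful edit would make this print True (unedited, it prints False).
print(edit_success(model, tok, "The Eiffel Tower is in", "Rome", "Paris"))
```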
arXiv Detail & Related papers (2024-05-28T08:56:33Z) - Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
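A toy sketch of that rewrite-retrieve-rerank-read pipeline, with every stage a pure-Python stand-in (keyword retriever, length-based reranker); in a real system each stage would be an LLM or a trained retriever, so all names here are hypothetical:

```python
def rewrite(query: str) -> str:
    """Query rewriter: normally an LLM; here just normalization."""
    return query.lower().strip()

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Retriever: score documents by term overlap with the query."""
    q_terms = set(query.split())
    return sorted(docs, key=lambda d: -len(q_terms & set(d.lower().split())))[:k]

def rerank(query: str, docs: list[str]) -> list[str]:
    """Reranker: normally a cross-encoder; here prefer shorter documents."""
    return sorted(docs, key=len)

def read(query: str, docs: list[str]) -> str:
    """Reader: normally an LLM conditioned on the query plus evidence."""
    return f"Answer to {query!r} based on: {docs[0]}"

corpus = [
    "Circuit discovery locates subgraphs inside language models.",
    "Knowledge editing rewrites facts stored in model weights.",
    "Rerankers reorder retrieved passages by relevance.",
]
q = rewrite("What does circuit discovery do?")
print(read(q, rerank(q, retrieve(q, corpus))))
```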
arXiv Detail & Related papers (2023-08-14T12:47:22Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - Knowledge-Infused Self Attention Transformers [11.008412414253662]
Transformer-based language models have achieved impressive success in various natural language processing tasks.
This paper introduces a systematic method for infusing knowledge into different components of a transformer-based model.
arXiv Detail & Related papers (2023-06-23T13:55:01Z) - UNTER: A Unified Knowledge Interface for Enhancing Pre-trained Language Models [100.4659557650775]
We propose a UNified knowledge inTERface, UNTER, to provide a unified perspective to exploit both structured knowledge and unstructured knowledge.
With both forms of knowledge injected, UNTER achieves consistent improvements on a series of knowledge-driven NLP tasks.
arXiv Detail & Related papers (2023-05-02T17:33:28Z) - LM-CORE: Language Models with Contextually Relevant External Knowledge [13.451001884972033]
We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements.
We present LM-CORE -- a general framework to achieve this -- that allows decoupling of the language model training from the external knowledge source.
Experimental results show that LM-CORE, having access to external knowledge, significantly and robustly outperforms state-of-the-art knowledge-enhanced language models on knowledge probing tasks.
arXiv Detail & Related papers (2022-08-12T18:59:37Z) - Learning to Express in Knowledge-Grounded Conversation [62.338124154016825]
We consider two aspects of knowledge expression: the structure of the response and the style of the content in each part.
We propose a segmentation-based generation model and optimize the model by a variational approach to discover the underlying pattern of knowledge expression in a response.
arXiv Detail & Related papers (2022-04-12T13:43:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.