What Do the Circuits Mean? A Knowledge Edit View
- URL: http://arxiv.org/abs/2406.17241v1
- Date: Tue, 25 Jun 2024 03:09:53 GMT
- Title: What Do the Circuits Mean? A Knowledge Edit View
- Authors: Huaizhi Ge, Frank Rudzicz, Zining Zhu
- Abstract summary: We use diverse text classification datasets to extract circuits in the GPT2-XL model.
Our findings indicate that these circuits contain entity knowledge but resist new knowledge more than complementary circuits during knowledge editing.
We find that up to 60% of the circuits consist of layer normalization modules rather than attention or MLP modules.
- Score: 18.022428746019582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of language model interpretability, circuit discovery is gaining popularity. Despite this, the true meaning of these circuits remains largely unanswered. We introduce a novel method to learn their meanings as a holistic object through the lens of knowledge editing. We extract circuits in the GPT2-XL model using diverse text classification datasets, and use hierarchical relations datasets to explore knowledge editing in the circuits. Our findings indicate that these circuits contain entity knowledge but resist new knowledge more than complementary circuits during knowledge editing. Additionally, we examine the impact of circuit size, discovering that an ideal "theoretical circuit" where essential knowledge is concentrated likely incorporates more than 5% but less than 50% of the model's parameters. We also assess the overlap between circuits from different datasets, finding moderate similarities. What constitutes these circuits, then? We find that up to 60% of the circuits consist of layer normalization modules rather than attention or MLP modules, adding evidence to the ongoing debates regarding knowledge localization. In summary, our findings offer new insights into the functions of the circuits and introduce research directions for further interpretability and safety research of language models.
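As an illustration of that composition statistic, the hedged sketch below assumes a discovered circuit is available as a set of GPT2-XL module names (the `circuit_modules` placeholder is made up) and simply tallies how many are layer-norm, attention, or MLP modules; it is not the authors' pipeline.
```python
from collections import Counter

from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

# Placeholder: module names (as listed by model.named_modules()) kept by some
# circuit-discovery method; the real set would come from that method.
circuit_modules = {"transformer.h.12.ln_1", "transformer.h.12.attn", "transformer.h.30.mlp"}

def module_type(name: str) -> str:
    """Map a GPT2-XL module name to a coarse component type."""
    if ".ln_" in name or name.endswith("ln_f"):
        return "layer_norm"
    if ".attn" in name:
        return "attention"
    if ".mlp" in name:
        return "mlp"
    return "other"

valid = {name for name, _ in model.named_modules()}
counts = Counter(module_type(n) for n in circuit_modules if n in valid)
total = sum(counts.values())
for kind, count in counts.items():
    print(f"{kind}: {count / total:.1%} of circuit modules")
```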
Related papers
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training [92.88889953768455]
Large Language Models (LLMs) face a critical gap in understanding how they internalize new knowledge.
We identify computational subgraphs that facilitate knowledge storage and processing.
arXiv Detail & Related papers (2025-02-16T16:55:43Z) - Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability [3.138731415322007]
We investigate the generality of the indirect object identification (IOI) circuit in GPT-2 small.
Our findings reveal that the circuit generalizes surprisingly well, reusing all of its components and mechanisms while only adding additional input edges.
arXiv Detail & Related papers (2024-11-25T05:32:34Z) - Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models [22.89563355840371]
We study the modularity of neural networks by analyzing circuits for highly compositional subtasks within a language model.
Our results indicate that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness.
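One simple way to quantify such node overlap (a sketch under assumptions, not necessarily the paper's exact metric) is the Jaccard similarity of the circuits' component sets; the circuits below are made-up placeholders.
```python
def node_overlap(circuit_a: set[str], circuit_b: set[str]) -> float:
    """Jaccard similarity between the node sets of two circuits."""
    union = circuit_a | circuit_b
    return len(circuit_a & circuit_b) / len(union) if union else 0.0

# Hypothetical circuits for two compositional subtasks.
copy_circuit = {"a3.h2", "a5.h1", "m7"}
reverse_circuit = {"a3.h2", "a6.h4", "m7"}
print(f"node overlap: {node_overlap(copy_circuit, reverse_circuit):.2f}")
```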
arXiv Detail & Related papers (2024-10-02T11:36:45Z) - Transformer Circuit Faithfulness Metrics are not Robust [0.04260910081285213]
We measure circuit 'faithfulness' by ablating portions of the model's computation.
We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers as well as the actual components of the circuit.
The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits.
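A common form of such an ablation-based score (a minimal sketch, not necessarily the exact metric used in the paper) normalizes the circuit's task performance between a fully ablated model and the full model; the numbers below are hypothetical.
```python
def normalized_faithfulness(full_score: float,
                            circuit_score: float,
                            ablated_score: float) -> float:
    """(circuit - fully ablated) / (full model - fully ablated)."""
    return (circuit_score - ablated_score) / (full_score - ablated_score)

# Hypothetical logit-difference scores on some task; the ablation scheme
# (mean vs. zero vs. resample ablation) is one of the methodological choices
# these faithfulness scores turn out to be sensitive to.
print(normalized_faithfulness(full_score=3.2, circuit_score=2.9, ablated_score=0.1))
```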
arXiv Detail & Related papers (2024-07-11T17:59:00Z) - Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning [14.639036250438517]
We introduce a comprehensive reformulation of the task known as Circuit Discovery, along with DiscoGP.
DiscoGP is a novel and effective algorithm based on differentiable masking for discovering circuits.
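A minimal PyTorch sketch of the differentiable-masking idea (not the authors' implementation): learn one sigmoid mask per candidate edge, trade task loss against a sparsity penalty, and threshold the learned mask to read off the circuit. The task loss here is a stand-in.
```python
import torch

n_edges = 1000                       # hypothetical number of candidate edges
mask_logits = torch.zeros(n_edges, requires_grad=True)
optimizer = torch.optim.Adam([mask_logits], lr=1e-2)
sparsity_weight = 1e-3

def task_loss_with_mask(mask: torch.Tensor) -> torch.Tensor:
    """Placeholder: a real implementation runs the model with masked edges on the task."""
    return ((mask - 0.5) ** 2).mean()

for _ in range(100):
    mask = torch.sigmoid(mask_logits)
    loss = task_loss_with_mask(mask) + sparsity_weight * mask.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

circuit_edges = (torch.sigmoid(mask_logits) > 0.5).nonzero().squeeze(-1)
print(f"kept {circuit_edges.numel()} of {n_edges} edges")
```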
arXiv Detail & Related papers (2024-07-04T09:42:25Z) - Knowledge Circuits in Pretrained Transformers [47.342682123081204]
The inner workings of how modern large language models store knowledge have long been a subject of intense interest and investigation among researchers.
In this paper, we delve into the graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge.
We evaluate the impact of current knowledge editing techniques on these knowledge circuits, providing deeper insights into the functioning and constraints of these editing methodologies.
arXiv Detail & Related papers (2024-05-28T08:56:33Z) - Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models [55.19497659895122]
We introduce methods for discovering and applying sparse feature circuits.
These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors.
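As a hedged illustration of how such feature circuits are often scored (placeholder tensors, not the paper's API), a first-order attribution of each sparse-autoencoder feature can be computed as gradient times activation:
```python
import torch

def feature_attributions(feature_acts: torch.Tensor, metric: torch.Tensor) -> torch.Tensor:
    """First-order estimate of each feature's effect on `metric` (grad x activation)."""
    grads = torch.autograd.grad(metric, feature_acts, retain_graph=True)[0]
    return (grads * feature_acts).sum(dim=0)

# Stand-ins: `feature_acts` would be SAE feature activations hooked into the
# model, and `metric` the model behavior of interest on a prompt.
feature_acts = torch.randn(4, 16, requires_grad=True)   # (tokens, features)
metric = feature_acts.tanh().sum()
scores = feature_attributions(feature_acts, metric)
print(scores.topk(3).indices)   # hypothetical top candidate features for the circuit
```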
arXiv Detail & Related papers (2024-03-28T17:56:07Z) - Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms [35.514624827207136]
Edge attribution patching (EAP), a gradient-based approximation to interventions, has emerged as a scalable but imperfect solution to this problem.
We introduce a new method - EAP with integrated gradients (EAP-IG) - that aims to better maintain a core property of circuits: faithfulness.
Our experiments demonstrate that circuits found using EAP are less faithful than those found using EAP-IG, even though both have high node overlap with circuits found previously using causal interventions.
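A hedged sketch of the two scoring rules being compared (stand-in activations and a toy gradient function, not the authors' code): EAP uses a single gradient at the clean run, while EAP-IG averages gradients along an interpolation between corrupted and clean activations.
```python
import torch

def eap_score(clean_act, corrupt_act, grad_at_clean):
    """First-order EAP edge score: activation difference dotted with the clean-run gradient."""
    return ((corrupt_act - clean_act) * grad_at_clean).sum()

def eap_ig_score(clean_act, corrupt_act, grad_fn, steps: int = 5):
    """EAP-IG: average the gradient over activations interpolated between corrupted and clean."""
    grads = [grad_fn(corrupt_act + a * (clean_act - corrupt_act))
             for a in torch.linspace(0, 1, steps)]
    return ((corrupt_act - clean_act) * torch.stack(grads).mean(dim=0)).sum()

clean, corrupt = torch.randn(8), torch.randn(8)
toy_grad = lambda act: act.cos()        # stand-in for d(task metric)/d(activation)
print(eap_score(clean, corrupt, toy_grad(clean)))
print(eap_ig_score(clean, corrupt, toy_grad))
```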
arXiv Detail & Related papers (2024-03-26T15:44:58Z) - CktGNN: Circuit Graph Neural Network for Electronic Design Automation [67.29634073660239]
This paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing.
We introduce Open Circuit Benchmark (OCB), an open-source dataset that contains 10K distinct operational amplifiers.
Our work paves the way toward learning-based, open-source design automation for analog circuits.
arXiv Detail & Related papers (2023-08-31T02:20:25Z) - Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models trained to bridge such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding an interactive dialogue by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - Knowledge-Infused Self Attention Transformers [11.008412414253662]
Transformer-based language models have achieved impressive success in various natural language processing tasks.
This paper introduces a systematic method for infusing knowledge into different components of a transformer-based model.
arXiv Detail & Related papers (2023-06-23T13:55:01Z) - Adaptive Planning Search Algorithm for Analog Circuit Verification [53.97809573610992]
We propose a machine learning (ML) approach that requires fewer simulations.
We show that the proposed approach is able to provide OCCs closer to the specifications for all circuits.
arXiv Detail & Related papers (2023-06-23T12:57:46Z) - UNTER: A Unified Knowledge Interface for Enhancing Pre-trained Language Models [100.4659557650775]
We propose a UNified knowledge inTERface, UNTER, to provide a unified perspective to exploit both structured knowledge and unstructured knowledge.
With both forms of knowledge injected, UNTER gains continuous improvements on a series of knowledge-driven NLP tasks.
arXiv Detail & Related papers (2023-05-02T17:33:28Z) - LM-CORE: Language Models with Contextually Relevant External Knowledge [13.451001884972033]
We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements.
We present LM-CORE, a general framework to achieve this, which allows decoupling of the language model training from the external knowledge source.
Experimental results show that LM-CORE, having access to external knowledge, achieves significant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks.
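The sketch below is not LM-CORE itself, only a minimal illustration of the decoupling described above: knowledge lives in an external store, a toy word-overlap retriever selects contextually relevant entries, and the language model is conditioned on them instead of memorizing them in its parameters.
```python
def retrieve(query: str, knowledge_store: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever: rank knowledge snippets by word overlap with the query."""
    q_words = set(query.lower().split())
    return sorted(knowledge_store,
                  key=lambda s: len(q_words & set(s.lower().split())),
                  reverse=True)[:k]

knowledge_store = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
    "Mount Everest is the highest mountain on Earth.",
]
query = "What is the capital of France?"
prompt = "\n".join(retrieve(query, knowledge_store)) + "\n" + query
# `prompt` would then be fed to a language model trained independently of the
# store, so the store can be updated without retraining the model.
```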
arXiv Detail & Related papers (2022-08-12T18:59:37Z) - Learning to Express in Knowledge-Grounded Conversation [62.338124154016825]
We consider two aspects of knowledge expression, namely the structure of the response and style of the content in each part.
We propose a segmentation-based generation model and optimize the model by a variational approach to discover the underlying pattern of knowledge expression in a response.
arXiv Detail & Related papers (2022-04-12T13:43:47Z) - Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design [68.1682448368636]
We present a supervised pretraining approach to learn circuit representations that can be adapted to new unseen topologies or unseen prediction tasks.
To cope with the variable topological structure of different circuits, we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings.
We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties.
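A hedged PyTorch Geometric sketch of that pretraining setup (hypothetical dimensions and data, not the paper's model): encode a circuit graph with a GNN, predict a per-node voltage, and keep the intermediate node embeddings for later adaptation to new topologies or tasks.
```python
import torch
from torch_geometric.nn import GCNConv

class VoltageGNN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, 1)      # per-node voltage prediction

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()         # reusable node embeddings
        return self.head(h).squeeze(-1), h

# Hypothetical circuit: 4 nodes with device-type features, 3 connections.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
model = VoltageGNN(in_dim=8)
pred_voltage, node_emb = model(x, edge_index)
loss = torch.nn.functional.mse_loss(pred_voltage, torch.randn(4))   # simulated target voltages
loss.backward()
```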
arXiv Detail & Related papers (2022-03-29T21:18:47Z) - On the realistic worst case analysis of quantum arithmetic circuits [69.43216268165402]
We show that commonly held intuitions when designing quantum circuits can be misleading.
We show that reducing the T-count can increase the total depth.
We illustrate our method on addition and multiplication circuits using ripple-carry.
arXiv Detail & Related papers (2021-01-12T21:36:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences arising from its use.