What Do the Circuits Mean? A Knowledge Edit View
- URL: http://arxiv.org/abs/2406.17241v1
- Date: Tue, 25 Jun 2024 03:09:53 GMT
- Title: What Do the Circuits Mean? A Knowledge Edit View
- Authors: Huaizhi Ge, Frank Rudzicz, Zining Zhu
- Abstract summary: We use diverse text classification datasets to extract circuits in the GPT2-XL model.
Our findings indicate that these circuits contain entity knowledge but resist new knowledge more than complementary circuits during knowledge editing.
We find that up to 60% of the circuits consist of layer normalization modules rather than attention or MLP modules.
- Score: 18.022428746019582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of language model interpretability, circuit discovery is gaining popularity. Despite this, what these circuits actually mean remains largely an open question. We introduce a novel method to learn their meaning as holistic objects through the lens of knowledge editing. We extract circuits in the GPT2-XL model using diverse text classification datasets, and use hierarchical relations datasets to explore knowledge editing in the circuits. Our findings indicate that these circuits contain entity knowledge but resist new knowledge more than complementary circuits during knowledge editing. Additionally, we examine the impact of circuit size, discovering that an ideal "theoretical circuit", in which essential knowledge is concentrated, likely incorporates more than 5% but less than 50% of the model's parameters. We also assess the overlap between circuits from different datasets, finding moderate similarities. What constitutes these circuits, then? We find that up to 60% of the circuits consist of layer normalization modules rather than attention or MLP modules, adding evidence to the ongoing debate regarding knowledge localization. In summary, our findings offer new insights into the functions of circuits and introduce research directions for further interpretability and safety research on language models.
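As a concrete illustration of the composition finding, here is a minimal sketch (not from the paper) that tallies a discovered circuit's components by module type, using GPT2-XL-style module names; the example component list is hypothetical:

```python
# Sketch: classify circuit components by module type, as in the finding
# that up to 60% of circuit components are layer normalization modules.
# Names follow GPT2-XL's module naming ("ln_1", "ln_2", "attn", "mlp");
# the circuit below is a made-up example, not one from the paper.
from collections import Counter

circuit_components = [
    "transformer.h.0.ln_1", "transformer.h.0.attn",
    "transformer.h.3.ln_2", "transformer.h.3.mlp",
    "transformer.h.7.ln_1", "transformer.h.12.ln_2",
]

def module_type(name: str) -> str:
    if ".ln_" in name or name.endswith("ln_f"):
        return "layernorm"
    if ".attn" in name:
        return "attention"
    if ".mlp" in name:
        return "mlp"
    return "other"

counts = Counter(module_type(n) for n in circuit_components)
total = sum(counts.values())
for kind, n in counts.most_common():
    print(f"{kind}: {n}/{total} ({100 * n / total:.0f}%)")
```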
Related papers
- Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability [3.138731415322007]
We investigate the generality of the indirect object identification (IOI) circuit in GPT-2 small.
Our findings reveal that the circuit generalizes surprisingly well, reusing all of its components and mechanisms while only adding additional input edges.
arXiv Detail & Related papers (2024-11-25T05:32:34Z)
- Transformer Circuit Faithfulness Metrics are not Robust [0.04260910081285213]
We measure circuit 'faithfulness' by ablating portions of the model's computation.
We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers as well as the actual components of the circuit.
The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits.
arXiv Detail & Related papers (2024-07-11T17:59:00Z)
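For context on what an ablation-based faithfulness score computes, here is a minimal sketch under a common normalization assumption; the definition and numbers are illustrative, not this paper's exact metric:

```python
# Sketch of a normalized circuit-faithfulness score: run the task metric
# with (a) the full model, (b) only the circuit (everything else ablated),
# and (c) everything ablated, then normalize. This is one common
# convention, not necessarily the exact formulation in the paper.
def faithfulness(metric_full: float, metric_circuit: float,
                 metric_empty: float) -> float:
    """1.0 means the circuit alone recovers full-model performance."""
    return (metric_circuit - metric_empty) / (metric_full - metric_empty)

# Hypothetical numbers: logit difference on an IOI-style task.
print(faithfulness(metric_full=3.2, metric_circuit=2.9, metric_empty=0.1))
# -> ~0.90; the paper's point is that this number shifts with ablation
# choices (zero vs. mean vs. resample), so scores are method-dependent.
```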
- Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning [14.639036250438517]
We introduce a comprehensive reformulation of the task known as Circuit Discovery, along with DiscoGP.
DiscoGP is a novel and effective algorithm based on differentiable masking for discovering circuits.
arXiv Detail & Related papers (2024-07-04T09:42:25Z)
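To make the differentiable-masking idea concrete, here is a toy sketch with learnable sigmoid gates and a sparsity penalty; the objective and mask placement are illustrative assumptions, not DiscoGP's exact formulation:

```python
# Toy sketch of circuit discovery via differentiable masking: learn a
# sigmoid gate per weight, trading off task loss against sparsity, then
# threshold the gates to obtain a discrete circuit.
import torch

torch.manual_seed(0)
W = torch.randn(8, 8)                                 # frozen "model" weights
gate_logits = torch.zeros(8, 8, requires_grad=True)   # learnable gate logits
x, y = torch.randn(16, 8), torch.randn(16, 8)         # probe data
opt = torch.optim.Adam([gate_logits], lr=0.1)
lam = 0.05                                            # sparsity strength

for step in range(200):
    gate = torch.sigmoid(gate_logits)                 # soft mask in (0, 1)
    pred = x @ (W * gate)                             # masked computation
    loss = torch.nn.functional.mse_loss(pred, y) + lam * gate.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

circuit = torch.sigmoid(gate_logits) > 0.5            # hard circuit after training
print(f"kept {circuit.sum().item()}/{circuit.numel()} edges")
```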
- Knowledge Circuits in Pretrained Transformers [47.342682123081204]
The inner workings of how modern large language models store knowledge have long been a subject of intense interest and investigation among researchers.
In this paper, we delve into the computation graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge.
We evaluate the impact of current knowledge editing techniques on these knowledge circuits, providing deeper insights into the functioning and constraints of these editing methodologies.
arXiv Detail & Related papers (2024-05-28T08:56:33Z)
- Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models [55.19497659895122]
We introduce methods for discovering and applying sparse feature circuits.
These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors.
arXiv Detail & Related papers (2024-03-28T17:56:07Z)
- Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms [35.514624827207136]
Edge attribution patching (EAP), a gradient-based approximation to interventions, has emerged as a scalable but imperfect solution to this problem.
We introduce a new method - EAP with integrated gradients (EAP-IG) - that aims to better maintain a core property of circuits: faithfulness.
Our experiments demonstrate that circuits found using EAP are less faithful than those found using EAP-IG, even though both have high node overlap with circuits found previously using causal interventions.
arXiv Detail & Related papers (2024-03-26T15:44:58Z)
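Here is a minimal sketch of the difference between EAP and EAP-IG scores, reduced to a single scalar activation; real methods score every edge of the computation graph, and the metric below is a stand-in:

```python
# EAP approximates an edge's causal effect by (corrupt - clean) activation
# difference times a gradient; EAP-IG instead averages gradients along the
# clean-to-corrupt interpolation path (integrated gradients).
import torch

def metric(act: torch.Tensor) -> torch.Tensor:
    return torch.tanh(act)  # stand-in for a task metric such as logit difference

clean, corrupt = torch.tensor(0.5), torch.tensor(2.0)

# EAP: a single gradient taken at the clean activation.
a = clean.clone().requires_grad_(True)
metric(a).backward()
eap_score = (corrupt - clean) * a.grad

# EAP-IG: average gradients along the interpolation path.
steps, grads = 10, []
for k in range(1, steps + 1):
    a = (clean + (k / steps) * (corrupt - clean)).clone().requires_grad_(True)
    metric(a).backward()
    grads.append(a.grad)
ig_score = (corrupt - clean) * torch.stack(grads).mean()

print(f"EAP: {eap_score.item():.3f}, EAP-IG: {ig_score.item():.3f}")
```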
- CktGNN: Circuit Graph Neural Network for Electronic Design Automation [67.29634073660239]
This paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing.
We introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains 10K distinct operational amplifiers.
Our work paves the way toward a learning-based open-sourced design automation for analog circuits.
arXiv Detail & Related papers (2023-08-31T02:20:25Z)
- Adaptive Planning Search Algorithm for Analog Circuit Verification [53.97809573610992]
We propose a machine learning (ML) approach that uses fewer simulations.
We show that the proposed approach provides OCCs closer to the specifications for all circuits.
arXiv Detail & Related papers (2023-06-23T12:57:46Z)
- Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design [68.1682448368636]
We present a supervised pretraining approach to learn circuit representations that can be adapted to new unseen topologies or unseen prediction tasks.
To cope with the variable topological structure of different circuits, we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings.
We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit-level properties.
arXiv Detail & Related papers (2022-03-29T21:18:47Z)
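A minimal sketch of this setup as described: a circuit represented as a graph, message-passing node embeddings, and a node-voltage pretraining head; the toy graph and architecture are illustrative assumptions, not the paper's model:

```python
# Represent a circuit as a graph and learn node embeddings with simple
# mean-aggregation message passing, pretrained to predict node voltages.
import torch
import torch.nn as nn

class TinyGNNLayer(nn.Module):
    """One round of mean-aggregation message passing."""
    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        msg = (adj @ h) / adj.sum(dim=1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(torch.cat([h, msg], dim=-1)))

# Toy circuit graph: 4 component nodes with feature vectors, symmetric adjacency.
h = torch.randn(4, 16)
adj = torch.tensor([[0., 1., 0., 1.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [1., 0., 1., 0.]])

layers = nn.ModuleList([TinyGNNLayer(16) for _ in range(2)])
voltage_head = nn.Linear(16, 1)   # pretraining target: one voltage per node
for layer in layers:
    h = layer(h, adj)
voltages = voltage_head(h).squeeze(-1)
print(voltages.shape)             # torch.Size([4]): a predicted voltage per node
```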
- On the realistic worst case analysis of quantum arithmetic circuits [69.43216268165402]
We show that commonly held intuitions when designing quantum circuits can be misleading.
We show that reducing the T-count can increase the total depth.
We illustrate our method on addition and multiplication circuits using ripple-carry.
arXiv Detail & Related papers (2021-01-12T21:36:16Z)