Automatically Identifying Local and Global Circuits with Linear Computation Graphs
- URL: http://arxiv.org/abs/2405.13868v2
- Date: Sun, 21 Jul 2024 11:42:32 GMT
- Title: Automatically Identifying Local and Global Circuits with Linear Computation Graphs
- Authors: Xuyang Ge, Fukang Zhu, Wentao Shu, Junxuan Wang, Zhengfu He, Xipeng Qiu,
- Abstract summary: We introduce our circuit discovery pipeline with Sparse Autoencoders (SAEs) and a variant called Transcoders.
Our methods do not require linear approximation to compute the causal effect of each node.
We analyze three kinds of circuits in GPT-2 Small: bracket, induction, and Indirect Object Identification circuits.
- Score: 45.760716193942685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Circuit analysis of any certain model behavior is a central task in mechanistic interpretability. We introduce our circuit discovery pipeline with Sparse Autoencoders (SAEs) and a variant called Transcoders. With these two modules inserted into the model, the model's computation graph with respect to OV and MLP circuits becomes strictly linear. Our methods do not require linear approximation to compute the causal effect of each node. This fine-grained graph identifies both end-to-end and local circuits accounting for either logits or intermediate features. We can scalably apply this pipeline with a technique called Hierarchical Attribution. We analyze three kinds of circuits in GPT-2 Small: bracket, induction, and Indirect Object Identification circuits. Our results reveal new findings underlying existing discoveries.
Related papers
- Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning [14.639036250438517]
We introduce a comprehensive reformulation of the task known as Circuit Discovery, along with DiscoGP.
DiscoGP is a novel and effective algorithm based on differentiable masking for discovering circuits.
arXiv Detail & Related papers (2024-07-04T09:42:25Z) - Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition [10.13822875330178]
We introduce contextual decomposition for transformers (CD-T) to build interpretable circuits in large language models.
CD-T can produce circuits of arbitrary level of abstraction, and is the first able to produce circuits as fine-grained as attention heads.
We show CD-T circuits are able to perfectly replicate original models' behavior using fewer nodes than the baselines for all tasks.
arXiv Detail & Related papers (2024-07-01T01:12:20Z) - Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.
Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods.
Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z) - Transcoders Find Interpretable LLM Feature Circuits [1.4254279830438588]
We introduce a novel method for using transcoders to perform circuit analysis through sublayers.
We train transcoders on language models with 120M, 410M, and 1.4B parameters, and find them to perform at least on par with SAEs in terms of sparsity, faithfulness, and human-interpretability.
arXiv Detail & Related papers (2024-06-17T17:49:00Z) - CktGNN: Circuit Graph Neural Network for Electronic Design Automation [67.29634073660239]
This paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing.
We introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains $10$K distinct operational amplifiers.
Our work paves the way toward a learning-based open-sourced design automation for analog circuits.
arXiv Detail & Related papers (2023-08-31T02:20:25Z) - Towards Automated Circuit Discovery for Mechanistic Interpretability [7.605075513099429]
This paper systematizes the mechanistic interpretability process they followed.
By varying the dataset, metric, and units under investigation, researchers can understand the functionality of each component.
We propose several algorithms and reproduce previous interpretability results to validate them.
arXiv Detail & Related papers (2023-04-28T17:36:53Z) - Graph Signal Sampling for Inductive One-Bit Matrix Completion: a
Closed-form Solution [112.3443939502313]
We propose a unified graph signal sampling framework which enjoys the benefits of graph signal analysis and processing.
The key idea is to transform each user's ratings on the items to a function (signal) on the vertices of an item-item graph.
For the online setting, we develop a Bayesian extension, i.e., BGS-IMC which considers continuous random Gaussian noise in the graph Fourier domain.
arXiv Detail & Related papers (2023-02-08T08:17:43Z) - Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling
and Design [68.1682448368636]
We present a supervised pretraining approach to learn circuit representations that can be adapted to new unseen topologies or unseen prediction tasks.
To cope with the variable topological structure of different circuits we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings.
We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties.
arXiv Detail & Related papers (2022-03-29T21:18:47Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via
GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.