Related papers: Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition

Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition

URL: http://arxiv.org/abs/2407.00886v2
Date: Fri, 11 Oct 2024 19:12:22 GMT
Title: Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
Authors: Aliyah R. Hsu, Georgia Zhou, Yeshwanth Cherapanamjeri, Yaxuan Huang, Anobel Y. Odisho, Peter R. Carroll, Bin Yu,
Abstract summary: We introduce contextual decomposition for transformers (CD-T) to build interpretable circuits in large language models. CD-T can produce circuits of arbitrary level of abstraction, and is the first able to produce circuits as fine-grained as attention heads. We show CD-T circuits are able to perfectly replicate original models' behavior using fewer nodes than the baselines for all tasks.
Score: 10.13822875330178
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automated mechanistic interpretation research has attracted great interest due to its potential to scale explanations of neural network internals to large models. Existing automated circuit discovery work relies on activation patching or its approximations to identify subgraphs in models for specific tasks (circuits). They often suffer from slow runtime, approximation errors, and specific requirements of metrics, such as non-zero gradients. In this work, we introduce contextual decomposition for transformers (CD-T) to build interpretable circuits in large language models. CD-T can produce circuits of arbitrary level of abstraction, and is the first able to produce circuits as fine-grained as attention heads at specific sequence positions efficiently. CD-T consists of a set of mathematical equations to isolate contribution of model features. Through recursively computing contribution of all nodes in a computational graph of a model using CD-T followed by pruning, we are able to reduce circuit discovery runtime from hours to seconds compared to state-of-the-art baselines. On three standard circuit evaluation datasets (indirect object identification, greater-than comparisons, and docstring completion), we demonstrate that CD-T outperforms ACDC and EAP by better recovering the manual circuits with an average of 97% ROC AUC under low runtimes. In addition, we provide evidence that faithfulness of CD-T circuits is not due to random chance by showing our circuits are 80% more faithful than random circuits of up to 60% of the original model size. Finally, we show CD-T circuits are able to perfectly replicate original models' behavior (faithfulness $ = 1$) using fewer nodes than the baselines for all tasks. Our results underscore the great promise of CD-T for efficient automated mechanistic interpretability, paving the way for new insights into the workings of large language models.

Related papers

Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table [5.300504429005315]
We propose a novel approach integrating conditional generative models with differentiable architecture search (DAS) for circuit generation. Our approach first introduces CircuitVQ, a circuit tokenizer trained based on our Circuit AutoEncoder. We then develop CircuitAR, a masked autoregressive model leveraging CircuitVQ as the tokenizer.
arXiv Detail & Related papers (2025-02-18T11:13:03Z)
Position-aware Automatic Circuit Discovery [59.64762573617173]
We identify a gap in existing circuit discovery methods, treating model components as equally relevant across input positions. We propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness compared to prior work.
arXiv Detail & Related papers (2025-02-07T00:18:20Z)
Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery. Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods. Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z)
Automatically Identifying Local and Global Circuits with Linear Computation Graphs [45.760716193942685]
We introduce our circuit discovery pipeline with Sparse Autoencoders (SAEs) and a variant called Transcoders. Our methods do not require linear approximation to compute the causal effect of each node. We analyze three kinds of circuits in GPT-2 Small: bracket, induction, and Indirect Object Identification circuits.
arXiv Detail & Related papers (2024-05-22T17:50:04Z)
CktGNN: Circuit Graph Neural Network for Electronic Design Automation [67.29634073660239]
This paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing. We introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains $10$K distinct operational amplifiers. Our work paves the way toward a learning-based open-sourced design automation for analog circuits.
arXiv Detail & Related papers (2023-08-31T02:20:25Z)
FuNToM: Functional Modeling of RF Circuits Using a Neural Network Assisted Two-Port Analysis Method [0.40598496563941905]
We present FuNToM, a functional modeling method for RF circuits. FuNToM leverages the two-port analysis method for modeling multiple topologies using a single main dataset and multiple small datasets. Our results show that for multiple RF circuits, in comparison to the state-of-the-art works, the required training data is reduced by 2.8x - 10.9x.
arXiv Detail & Related papers (2023-08-03T21:08:16Z)
Adaptive Planning Search Algorithm for Analog Circuit Verification [53.97809573610992]
We propose a machine learning (ML) approach, which uses less simulations. We show that the proposed approach is able to provide OCCs closer to the specifications for all circuits.
arXiv Detail & Related papers (2023-06-23T12:57:46Z)
Towards Automated Circuit Discovery for Mechanistic Interpretability [7.605075513099429]
This paper systematizes the mechanistic interpretability process they followed. By varying the dataset, metric, and units under investigation, researchers can understand the functionality of each component. We propose several algorithms and reproduce previous interpretability results to validate them.
arXiv Detail & Related papers (2023-04-28T17:36:53Z)
Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design [68.1682448368636]
We present a supervised pretraining approach to learn circuit representations that can be adapted to new unseen topologies or unseen prediction tasks. To cope with the variable topological structure of different circuits we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings. We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties.
arXiv Detail & Related papers (2022-03-29T21:18:47Z)
On the realistic worst case analysis of quantum arithmetic circuits [69.43216268165402]
We show that commonly held intuitions when designing quantum circuits can be misleading. We show that reducing the T-count can increase the total depth. We illustrate our method on addition and multiplication circuits using ripple-carry.
arXiv Detail & Related papers (2021-01-12T21:36:16Z)
DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search [55.164053971213576]
convolutional neural network has achieved great success in fulfilling computer vision tasks despite large computation overhead. Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure. Existing structured pruning methods require hand-crafted rules which may lead to tremendous pruning space.
arXiv Detail & Related papers (2020-11-04T07:43:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.