Hypothesis Testing the Circuit Hypothesis in LLMs
- URL: http://arxiv.org/abs/2410.13032v1
- Date: Wed, 16 Oct 2024 20:45:29 GMT
- Title: Hypothesis Testing the Circuit Hypothesis in LLMs
- Authors: Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei
- Abstract summary: We formalize a set of criteria that a circuit is hypothesized to meet and develop a suite of hypothesis tests to evaluate how well circuits satisfy them.
The criteria focus on the extent to which the LLM's behavior is preserved, the degree of localization of this behavior, and whether the circuit is minimal.
We find that synthetic circuits -- circuits that are hard-coded in the model -- align with the idealized properties.
- Score: 23.71068328527293
- Abstract: Large language models (LLMs) demonstrate surprising capabilities, but we do not understand how they are implemented. One hypothesis suggests that these capabilities are primarily executed by small subnetworks within the LLM, known as circuits. But how can we evaluate this hypothesis? In this paper, we formalize a set of criteria that a circuit is hypothesized to meet and develop a suite of hypothesis tests to evaluate how well circuits satisfy them. The criteria focus on the extent to which the LLM's behavior is preserved, the degree of localization of this behavior, and whether the circuit is minimal. We apply these tests to six circuits described in the research literature. We find that synthetic circuits -- circuits that are hard-coded in the model -- align with the idealized properties. Circuits discovered in Transformer models satisfy the criteria to varying degrees. To facilitate future empirical studies of circuits, we created the circuitry package, a wrapper around the TransformerLens library, which abstracts away lower-level manipulations of hooks and activations. The software is available at https://github.com/blei-lab/circuitry.
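As a concrete illustration of the lower-level hook and activation manipulations the abstract refers to, the sketch below ablates a single attention head with raw TransformerLens hooks and measures how far the output distribution moves. This is a minimal sketch, not the circuitry API; the prompt, layer, and head are arbitrary placeholders.

```python
# Minimal sketch (not the circuitry API): knock out one attention head with a
# raw TransformerLens hook and compare the model's next-token distribution.
# The prompt, layer, and head below are arbitrary placeholders.
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("When Mary and John went to the store, John gave a drink to")

LAYER, HEAD = 9, 6  # hypothetical head to ablate

def zero_head(z, hook):
    # z has shape [batch, position, head_index, d_head]
    z[:, :, HEAD, :] = 0.0
    return z

clean_logits = model(tokens)
ablated_logits = model.run_with_hooks(
    tokens,
    fwd_hooks=[(utils.get_act_name("z", LAYER), zero_head)],
)

# KL divergence between the clean and ablated next-token distributions.
p = torch.softmax(clean_logits[0, -1], dim=-1)
log_q = torch.log_softmax(ablated_logits[0, -1], dim=-1)
kl = torch.sum(p * (torch.log(p) - log_q))
print(f"KL(clean || ablated) = {kl.item():.4f}")
```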
Related papers
- Adversarial Circuit Evaluation [1.1893676124374688]
We evaluate three circuits found in the literature (IOI, greater-than, and docstring) in an adversarial manner.
We measure the KL divergence between the full model's output and the circuit's output, calculated through resample ablation, and we analyze the worst-performing inputs.
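A hedged sketch of the evaluation quantity described above, assuming logits at the answer position are available for both the full model and the resample-ablated circuit run (tensor and function names here are placeholders):

```python
# Sketch: KL divergence between the full model's output distribution and the
# circuit's output distribution, computed per input so that the
# worst-performing (highest-divergence) inputs can be inspected.
import torch
import torch.nn.functional as F

def per_example_kl(full_logits: torch.Tensor, circuit_logits: torch.Tensor) -> torch.Tensor:
    """Both tensors: [batch, vocab] logits at the answer position."""
    log_p = F.log_softmax(full_logits, dim=-1)     # reference: full model
    log_q = F.log_softmax(circuit_logits, dim=-1)  # circuit via resample ablation
    return torch.sum(log_p.exp() * (log_p - log_q), dim=-1)  # KL(p || q)

# Example: rank inputs by how badly the circuit diverges from the full model.
# kl = per_example_kl(full_logits, circuit_logits)
# worst_inputs = torch.topk(kl, k=10).indices
```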
arXiv Detail & Related papers (2024-07-21T13:43:44Z)
- Transformer Circuit Faithfulness Metrics are not Robust [0.04260910081285213]
We measure circuit 'faithfulness' by ablating portions of the model's computation.
We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers and the actual components of the circuit.
The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits.
arXiv Detail & Related papers (2024-07-11T17:59:00Z)
- What Do the Circuits Mean? A Knowledge Edit View [18.022428746019582]
We use diverse text classification datasets to extract circuits in the GPT2-XL model.
Our findings indicate that these circuits contain entity knowledge but resist new knowledge more than complementary circuits during knowledge editing.
We find that up to 60% of the circuits consist of layer modules rather than attention or normalization modules.
arXiv Detail & Related papers (2024-06-25T03:09:53Z)
- Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.
Our method finds circuits in GPT-2 that use fewer than half as many edges as circuits found by previous methods.
Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x larger than those prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z)
- Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms [35.514624827207136]
Edge attribution patching (EAP), a gradient-based approximation to interventions, has emerged as a scalable but imperfect solution to this problem.
We introduce a new method - EAP with integrated gradients (EAP-IG) - that aims to better maintain a core property of circuits: faithfulness.
Our experiments demonstrate that circuits found using EAP are less faithful than those found using EAP-IG, even though both have high node overlap with circuits found previously using causal interventions.
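A hedged sketch of the idea summarized above, simplified to a single activation and written from the summary rather than from the authors' code (function and variable names are assumptions):

```python
# Sketch of an EAP-style attribution score with integrated gradients:
# (corrupt - clean) activation difference, dotted with the metric's gradient
# averaged over points interpolated between the clean and corrupt activations.
import torch

def eap_ig_score(clean_act, corrupt_act, metric_fn, steps: int = 5):
    """clean_act, corrupt_act: same-shape activations from clean/corrupt runs.
    metric_fn: differentiable scalar metric of the activation (e.g. logit difference)."""
    delta = corrupt_act - clean_act
    grad_sum = torch.zeros_like(clean_act)
    for k in range(1, steps + 1):
        point = (clean_act + (k / steps) * delta).detach().requires_grad_(True)
        metric_fn(point).backward()
        grad_sum += point.grad
    # Plain EAP would use only the gradient at one point; averaging along the
    # interpolation path is what the "-IG" (integrated gradients) part adds.
    return (delta * grad_sum / steps).sum()
```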
arXiv Detail & Related papers (2024-03-26T15:44:58Z)
- Uncovering Intermediate Variables in Transformers using Circuit Probing [32.382094867951224]
We propose a new analysis technique -- circuit probing -- that automatically uncovers low-level circuits that compute hypothesized intermediate variables.
We apply this method to models trained on simple arithmetic tasks, demonstrating its effectiveness at (1) deciphering the algorithms that models have learned, (2) revealing modular structure within a model, and (3) tracking the development of circuits over training.
arXiv Detail & Related papers (2023-11-07T21:27:17Z)
- Adaptive Planning Search Algorithm for Analog Circuit Verification [53.97809573610992]
We propose a machine learning (ML) approach that requires fewer simulations.
We show that the proposed approach is able to provide OCCs closer to the specifications for all circuits.
arXiv Detail & Related papers (2023-06-23T12:57:46Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
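The analogy can be illustrated with a toy computation: a softmax attention readout over in-context (key, value) pairs has the form of a Nadaraya-Watson kernel regression estimate. The data below is synthetic and only meant to show the correspondence, not the paper's experiments.

```python
# Toy illustration of the kernel-regression view of attention: the attention
# readout over in-context examples is a Nadaraya-Watson estimator whose kernel
# is the (scaled, exponentiated) query-key similarity. Data here is synthetic.
import torch

d = 16
keys = torch.randn(8, d)    # representations of the in-context examples
values = torch.randn(8, 1)  # their labels / targets
query = torch.randn(d)      # representation of the test input

weights = torch.softmax(keys @ query / d**0.5, dim=0)     # normalized kernel weights K(q, k_i)
prediction = (weights.unsqueeze(-1) * values).sum(dim=0)   # Nadaraya-Watson / attention readout
```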
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
- Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design [68.1682448368636]
We present a supervised pretraining approach to learn circuit representations that can be adapted to new unseen topologies or unseen prediction tasks.
To cope with the variable topological structure of different circuits, we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings.
We show that pretraining GNNs to predict output node voltages encourages representations that can be adapted to unseen topologies or to the prediction of new circuit-level properties.
arXiv Detail & Related papers (2022-03-29T21:18:47Z)
- On the realistic worst case analysis of quantum arithmetic circuits [69.43216268165402]
We show that commonly held intuitions when designing quantum circuits can be misleading.
We show that reducing the T-count can increase the total depth.
We illustrate our method on addition and multiplication circuits using ripple-carry.
arXiv Detail & Related papers (2021-01-12T21:36:16Z)
- Machine Learning Optimization of Quantum Circuit Layouts [63.55764634492974]
We introduce a quantum circuit mapping, QXX, and its machine learning version, QXX-MLP.
The latter automatically infers the optimal QXX parameter values such that the laid-out circuit has a reduced depth.
We present empirical evidence for the feasibility of learning the layout method using approximation.
arXiv Detail & Related papers (2020-07-29T05:26:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.