Hypothesis Testing the Circuit Hypothesis in LLMs
- URL: http://arxiv.org/abs/2410.13032v1
- Date: Wed, 16 Oct 2024 20:45:29 GMT
- Title: Hypothesis Testing the Circuit Hypothesis in LLMs
- Authors: Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, AdriĆ Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei,
- Abstract summary: We formalize a set of criteria that a circuit is hypothesized to meet and develop a suite of hypothesis tests to evaluate how well circuits satisfy them.
The criteria focus on the extent to which the LLM's behavior is preserved, the degree of localization of this behavior, and whether the circuit is minimal.
We find that synthetic circuits -- circuits that are hard-coded in the model -- align with the idealized properties.
- Score: 23.71068328527293
- License:
- Abstract: Large language models (LLMs) demonstrate surprising capabilities, but we do not understand how they are implemented. One hypothesis suggests that these capabilities are primarily executed by small subnetworks within the LLM, known as circuits. But how can we evaluate this hypothesis? In this paper, we formalize a set of criteria that a circuit is hypothesized to meet and develop a suite of hypothesis tests to evaluate how well circuits satisfy them. The criteria focus on the extent to which the LLM's behavior is preserved, the degree of localization of this behavior, and whether the circuit is minimal. We apply these tests to six circuits described in the research literature. We find that synthetic circuits -- circuits that are hard-coded in the model -- align with the idealized properties. Circuits discovered in Transformer models satisfy the criteria to varying degrees. To facilitate future empirical studies of circuits, we created the \textit{circuitry} package, a wrapper around the \textit{TransformerLens} library, which abstracts away lower-level manipulations of hooks and activations. The software is available at \url{https://github.com/blei-lab/circuitry}.
Related papers
- Demystifying Singular Defects in Large Language Models [61.98878352956125]
In large language models (LLMs), the underlying causes of high-norm tokens remain largely unexplored.
We provide both theoretical insights and empirical validation across a range of recent models.
We showcase two practical applications of these findings: the improvement of quantization schemes and the design of LLM signatures.
arXiv Detail & Related papers (2025-02-10T20:09:16Z) - Position-aware Automatic Circuit Discovery [59.64762573617173]
We identify a gap in existing circuit discovery methods, treating model components as equally relevant across input positions.
We propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples.
Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness compared to prior work.
arXiv Detail & Related papers (2025-02-07T00:18:20Z) - Enriched Categories for Parameterized Circuit Semantics [0.0]
We show that the parameterized semantics studied in prior work can be understood through enrichment and internal constructions.
We suggest how these semantics could be extended to classical circuit analysis and parameterized equivalence checking.
arXiv Detail & Related papers (2025-01-21T20:17:51Z) - Adversarial Circuit Evaluation [1.1893676124374688]
We evaluate three circuits found in the literature (IOI, greater-than, and docstring) in an adversarial manner.
We measure the KL divergence between the full model's output and the circuit's output, calculated through resample ablation, and we analyze the worst-performing inputs.
arXiv Detail & Related papers (2024-07-21T13:43:44Z) - Transformer Circuit Faithfulness Metrics are not Robust [0.04260910081285213]
We measure circuit 'faithfulness' by ablating portions of the model's computation.
We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers as well as the actual components of the circuit.
The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits.
arXiv Detail & Related papers (2024-07-11T17:59:00Z) - Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.
Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods.
Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z) - Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms [35.514624827207136]
Edge attribution patching (EAP), gradient-based approximation to interventions, has emerged as a scalable but imperfect solution to this problem.
We introduce a new method - EAP with integrated gradients (EAP-IG) - that aims to better maintain a core property of circuits: faithfulness.
Our experiments demonstrate that circuits found using EAP are less faithful than those found using EAP-IG, even though both have high node overlap with circuits found previously using causal interventions.
arXiv Detail & Related papers (2024-03-26T15:44:58Z) - Uncovering Intermediate Variables in Transformers using Circuit Probing [28.81226181942661]
We propose a new analysis technique - circuit probing - that automatically uncovers low-level circuits that compute hypothesized intermediate variables.
We apply this method to models trained on simple arithmetic tasks, demonstrating its effectiveness at (1) deciphering the algorithms that models have learned, (2) revealing modular structure within a model, and (3) tracking the development of circuits over training.
arXiv Detail & Related papers (2023-11-07T21:27:17Z) - Adaptive Planning Search Algorithm for Analog Circuit Verification [53.97809573610992]
We propose a machine learning (ML) approach, which uses less simulations.
We show that the proposed approach is able to provide OCCs closer to the specifications for all circuits.
arXiv Detail & Related papers (2023-06-23T12:57:46Z) - Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling
and Design [68.1682448368636]
We present a supervised pretraining approach to learn circuit representations that can be adapted to new unseen topologies or unseen prediction tasks.
To cope with the variable topological structure of different circuits we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings.
We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties.
arXiv Detail & Related papers (2022-03-29T21:18:47Z) - Machine Learning Optimization of Quantum Circuit Layouts [63.55764634492974]
We introduce a quantum circuit mapping, QXX, and its machine learning version, QXX-MLP.
The latter infers automatically the optimal QXX parameter values such that the layed out circuit has a reduced depth.
We present empiric evidence for the feasibility of learning the layout method using approximation.
arXiv Detail & Related papers (2020-07-29T05:26:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.