Related papers: Certified Circuits: Stability Guarantees for Mechanistic Circuits


URL: http://arxiv.org/abs/2602.22968v2
Date: Mon, 02 Mar 2026 13:21:23 GMT
Title: Certified Circuits: Stability Guarantees for Mechanistic Circuits
Authors: Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz, Jonas Fischer
Abstract summary: Certified Circuits provides provable stability guarantees for circuit discovery. On ImageNet and OOD datasets, certified circuits achieve up to 91% higher accuracy.
Score: 80.30622018787835
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding how neural networks arrive at their predictions is essential for debugging, auditing, and deployment. Mechanistic interpretability pursues this goal by identifying circuits, minimal subnetworks responsible for specific behaviors. However, existing circuit discovery methods are brittle: circuits depend strongly on the chosen concept dataset and often fail to transfer out-of-distribution, raising doubts about whether they capture the concept or merely dataset-specific artifacts. We introduce Certified Circuits, which provide provable stability guarantees for circuit discovery. Our framework wraps any black-box discovery algorithm with randomized data subsampling to certify that circuit component inclusion decisions are invariant to bounded edit-distance perturbations of the concept dataset. The framework abstains on unstable neurons, yielding circuits that are more compact and more accurate. On ImageNet and OOD datasets, certified circuits achieve up to 91% higher accuracy while using 45% fewer neurons, and remain reliable where baselines degrade. Certified Circuits puts circuit discovery on formal ground by producing mechanistic explanations that are provably stable and better aligned with the target concept. Code will be released soon!
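
The subsample-and-abstain idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy discovery routine, and the fixed `margin` threshold are all hypothetical. In particular, the margin is only a stand-in for the paper's certificate, which ties the decision to a bounded edit-distance radius and is not reproduced here.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Set


def inclusion_frequencies(
    dataset: Sequence[Hashable],
    discover: Callable[[List[Hashable]], Set[str]],
    components: Iterable[str],
    n_rounds: int = 200,
    subsample_size: int = 10,
    seed: int = 0,
) -> Dict[str, float]:
    """Run a black-box discovery routine on random subsamples of the dataset
    and return how often each component was included in the circuit."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_rounds):
        subset = rng.sample(list(dataset), subsample_size)
        counts.update(discover(subset))  # one vote per included component
    return {c: counts[c] / n_rounds for c in components}


def decide(frequencies: Dict[str, float], margin: float = 0.2) -> Dict[str, str]:
    """Include or exclude a component only when the vote is decisive.
    The margin is an assumed placeholder, not the paper's certified bound."""
    decisions = {}
    for component, freq in frequencies.items():
        if freq >= 0.5 + margin:
            decisions[component] = "include"
        elif freq <= 0.5 - margin:
            decisions[component] = "exclude"
        else:
            decisions[component] = "abstain"
    return decisions


def toy_discover(subset: List[Hashable]) -> Set[str]:
    """Hypothetical discovery routine: one neuron is always found, one is
    found only when example 0 happens to be in the subsample."""
    found = {"n_stable"}
    if 0 in subset:
        found.add("n_flaky")
    return found


dataset = list(range(20))
freqs = inclusion_frequencies(
    dataset, toy_discover, ["n_stable", "n_flaky", "n_never"]
)
decisions = decide(freqs)
print(decisions)
```

In this toy setup the neuron whose inclusion hinges on a single example appears in roughly half of the subsamples, so the vote is indecisive and the sketch abstains on it, while the neurons with unanimous votes get a firm decision.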

Related papers

A Fine-Grained and Efficient Reliability Analysis Framework for Noisy Quantum Circuits [1.688452856995602]
We propose a fine-grained, scalable, and interpretable framework for efficient and accurate reliability evaluation of noisy quantum circuits. Our approach performs a state-independent analysis to model how circuit reliability progressively degrades during execution. Based on the NPC, we define Proxy Fidelity, a reliability metric that quantifies both qubit-level and circuit-level reliability.
arXiv Detail & Related papers (2026-02-20T16:58:40Z)
Formal Mechanistic Interpretability: Automated Circuit Discovery with Provable Guarantees [5.156069978876762]
We propose a suite of automated algorithms that yield circuits with provable guarantees. We focus on three types of guarantees: *input domain robustness*, *robust patching*, and *minimality*. We uncover a diverse set of novel theoretical connections among these three families of guarantees, with critical implications for the convergence of our algorithms.
arXiv Detail & Related papers (2026-02-18T19:41:01Z)
Explaining the Explainer: Understanding the Inner Workings of Transformer-based Symbolic Regression Models [3.7957452405531265]
We introduce PATCHES, an evolutionary circuit discovery algorithm that identifies compact and correct circuits for symbolic regression. Using PATCHES, we isolate 28 circuits, providing the first circuit-level characterisation of an SR transformer.
arXiv Detail & Related papers (2026-02-03T13:27:10Z)
Discovering Transformer Circuits via a Hybrid Attribution and Pruning Framework [4.336808542533343]
This research proposes a hybrid attribution and pruning framework that uses attribution patching to identify a high-potential subgraph. We show that HAP is 46% faster than baseline algorithms without sacrificing circuit faithfulness.
arXiv Detail & Related papers (2025-09-28T18:34:43Z)
Position-aware Automatic Circuit Discovery [59.64762573617173]
We identify a gap in existing circuit discovery methods, treating model components as equally relevant across input positions. We propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness compared to prior work.
arXiv Detail & Related papers (2025-02-07T00:18:20Z)
Transformer Circuit Faithfulness Metrics are not Robust [0.04260910081285213]
We measure circuit 'faithfulness' by ablating portions of the model's computation. We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers as well as the actual components of the circuit. The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits.
arXiv Detail & Related papers (2024-07-11T17:59:00Z)
CktGNN: Circuit Graph Neural Network for Electronic Design Automation [67.29634073660239]
This paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing. We introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains 10K distinct operational amplifiers. Our work paves the way toward a learning-based open-sourced design automation for analog circuits.
arXiv Detail & Related papers (2023-08-31T02:20:25Z)
Adaptive Planning Search Algorithm for Analog Circuit Verification [53.97809573610992]
We propose a machine learning (ML) approach, which uses fewer simulations. We show that the proposed approach is able to provide OCCs closer to the specifications for all circuits.
arXiv Detail & Related papers (2023-06-23T12:57:46Z)
Transfer Learning for Fault Diagnosis of Transmission Lines [55.971052290285485]
A novel transfer learning framework based on a pre-trained LeNet-5 convolutional neural network is proposed. It is able to diagnose faults for different transmission line lengths and impedances by transferring the knowledge from a source neural network to predict a dissimilar target dataset.
arXiv Detail & Related papers (2022-01-20T06:36:35Z)
Hardware-Encoding Grid States in a Non-Reciprocal Superconducting Circuit [62.997667081978825]
We present a circuit design composed of a non-reciprocal device and Josephson junctions whose ground space is doubly degenerate and the ground states are approximate codewords of the Gottesman-Kitaev-Preskill (GKP) code. We find that the circuit is naturally protected against the common noise channels in superconducting circuits, such as charge and flux noise, implying that it can be used for passive quantum error correction.
arXiv Detail & Related papers (2020-02-18T16:45:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.