Related papers: Adversarial Circuit Evaluation

Adversarial Circuit Evaluation

URL: http://arxiv.org/abs/2407.15166v1
Date: Sun, 21 Jul 2024 13:43:44 GMT
Title: Adversarial Circuit Evaluation
Authors: Niels uit de Bos, Adrià Garriga-Alonso
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Circuits are supposed to accurately describe how a neural network performs a specific task, but do they really? We evaluate three circuits found in the literature (IOI, greater-than, and docstring) in an adversarial manner, considering inputs where the circuit's behavior maximally diverges from the full model. Concretely, we measure the KL divergence between the full model's output and the circuit's output, calculated through resample ablation, and we analyze the worst-performing inputs. Our results show that the circuits for the IOI and docstring tasks fail to behave similarly to the full model even on completely benign inputs from the original task, indicating that more robust circuits are needed for safety-critical applications.
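The evaluation loop in the abstract can be sketched in a few lines: run the full model and the circuit on the same input, compare their output distributions with KL divergence, then sort inputs by that score to surface the worst cases. This is a minimal toy sketch, not the authors' code. It makes several assumptions. The model is hypothetical, and its logits are a plain sum of per-component contributions. Resample ablation is approximated by feeding every non-circuit component a randomly drawn input from the same dataset. The divergence is taken as D_KL(p_model || p_circuit) and averaged over a few resamples. The helper names (`logits`, `circuit_kl`, `worst_inputs`) are illustrative only.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Set, Tuple

# Hypothetical toy model: each component maps an input to a vector of logit
# contributions, and the model's logits are the sum over all components.
# Real transformers are not additive like this; the toy only shows the metric.
Component = Callable[[int], List[float]]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def logits(components: Sequence[Component], x: int,
           circuit: Optional[Set[int]] = None,
           x_resample: Optional[int] = None) -> List[float]:
    """Sum component outputs. With circuit=None this is the full model;
    otherwise components outside `circuit` see `x_resample` instead of `x`
    (a simplified stand-in for resample ablation)."""
    out: List[float] = []
    for i, comp in enumerate(components):
        src = x if circuit is None or i in circuit else x_resample
        contrib = comp(src)
        out = list(contrib) if i == 0 else [a + b for a, b in zip(out, contrib)]
    return out


def circuit_kl(components: Sequence[Component], circuit: Set[int], x: int,
               dataset: Sequence[int], n_resamples: int,
               rng: random.Random) -> float:
    """Mean KL(full model || circuit) on input x over random resample inputs."""
    p_full = softmax(logits(components, x))
    total = 0.0
    for _ in range(n_resamples):
        x_res = rng.choice(dataset)  # assumed: resample source drawn from the same data
        q_circ = softmax(logits(components, x, circuit, x_res))
        total += kl_divergence(p_full, q_circ)
    return total / n_resamples


def worst_inputs(components: Sequence[Component], circuit: Set[int],
                 dataset: Sequence[int], k: int = 3, n_resamples: int = 8,
                 seed: int = 0) -> List[Tuple[float, int]]:
    """Score every input and return the k with the largest KL (worst first)."""
    rng = random.Random(seed)
    scored: List[Tuple[float, int]] = [
        (circuit_kl(components, circuit, x, dataset, n_resamples, rng), x)
        for x in dataset
    ]
    scored.sort(reverse=True)
    return scored[:k]


if __name__ == "__main__":
    toy_components = [
        lambda x: [0.5 * x, 0.0],       # reads the input value
        lambda x: [0.0, 1.0],           # constant bias
        lambda x: [float(x % 2), 0.0],  # parity feature, left out of the circuit
    ]
    print(worst_inputs(toy_components, circuit={0, 1}, dataset=list(range(6)), k=2))
```

Ranking by per-input KL, rather than reporting only the dataset mean, is what makes the evaluation adversarial: a circuit can look faithful on average while diverging sharply from the full model on a handful of inputs.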

Related papers

Position-aware Automatic Circuit Discovery
We identify a gap in existing circuit discovery methods: they treat model components as equally relevant across input positions. We propose two improvements that incorporate positionality into circuits, even on tasks containing variable-length examples. Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness than prior work.
Date: 2025-02-07T00:18:20Z
CktGen: Specification-Conditioned Analog Circuit Generation
We introduce the task of directly generating analog circuits from given specifications. Specifically, we propose CktGen, a simple yet effective variational autoencoder (VAE) model. We conduct comprehensive experiments on the Open Circuit Benchmark (OCB) and introduce new evaluation metrics for cross-model consistency.
Date: 2024-10-01T18:35:44Z
Transformer Circuit Faithfulness Metrics are not Robust
We measure circuit 'faithfulness' by ablating portions of the model's computation. We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers and the actual components of the circuit. The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits.
Date: 2024-07-11T17:59:00Z
Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
We introduce contextual decomposition for transformers (CD-T) to build interpretable circuits in large language models. CD-T can produce circuits at arbitrary levels of abstraction and is the first method able to produce circuits as fine-grained as attention heads. We show that CD-T circuits perfectly replicate the original models' behavior while using fewer nodes than the baselines on all tasks.
Date: 2024-07-01T01:12:20Z
Finding Transformer Circuits with Edge Pruning
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery. Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods. Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
Date: 2024-06-24T16:40:54Z
Automatically Identifying Local and Global Circuits with Linear Computation Graphs
We introduce our circuit discovery pipeline with Sparse Autoencoders (SAEs) and a variant called Transcoders. Our methods do not require linear approximation to compute the causal effect of each node. We analyze three kinds of circuits in GPT-2 Small: bracket, induction, and Indirect Object Identification circuits.
Date: 2024-05-22T17:50:04Z
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
Edge attribution patching (EAP), a gradient-based approximation to interventions, has emerged as a scalable but imperfect solution to the problem of finding circuits. We introduce a new method, EAP with integrated gradients (EAP-IG), which aims to better maintain a core property of circuits: faithfulness. Our experiments demonstrate that circuits found using EAP are less faithful than those found using EAP-IG, even though both have high node overlap with circuits found previously using causal interventions.
Date: 2024-03-26T15:44:58Z
CktGNN: Circuit Graph Neural Network for Electronic Design Automation
This paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates circuit topology generation and device sizing. We introduce the Open Circuit Benchmark (OCB), an open-source dataset of 10K distinct operational amplifiers. Our work paves the way toward learning-based, open-source design automation for analog circuits.
Date: 2023-08-31T02:20:25Z
Adaptive Planning Search Algorithm for Analog Circuit Verification
We propose a machine learning (ML) approach that requires fewer simulations. We show that the proposed approach provides OCCs closer to the specifications for all circuits.
Date: 2023-06-23T12:57:46Z
Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design
We present a supervised pretraining approach to learn circuit representations that can be adapted to new, unseen topologies or unseen prediction tasks. To cope with the variable topological structure of different circuits, we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings. We show that pretraining GNNs to predict output node voltages can encourage learning representations that adapt to new, unseen topologies or to predicting new circuit-level properties.
Date: 2022-03-29T21:18:47Z
On the realistic worst case analysis of quantum arithmetic circuits
We show that commonly held intuitions about designing quantum circuits can be misleading. For example, reducing the T-count can increase the total depth. We illustrate our method on ripple-carry addition and multiplication circuits.
Date: 2021-01-12T21:36:16Z

This list is automatically generated from the titles and abstracts of the papers on this site.