PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization
- URL: http://arxiv.org/abs/2510.23264v1
- Date: Mon, 27 Oct 2025 12:24:14 GMT
- Title: PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization
- Authors: Xinhai Wang, Shu Yang, Liangyu Wang, Lin Zhang, Huanyi Xie, Lijie Hu, Di Wang,
- Abstract summary: Automated Circuit Discovery (ACDC) has emerged as a pivotal methodology in circuit discovery.<n>But its application to large language models is severely limited by computational inefficiency and prohibitively high memory requirements.<n>Our proposed method for accelerating automated circuit discovery, Per Attention Head Quantization (PAHQ), takes a fundamentally different approach by optimizing the efficiency of each individual patching operation.
- Score: 17.316927027489506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Circuit discovery, which involves identifying sparse and task-relevant subnetworks in pre-trained language models, is a cornerstone of mechanistic interpretability. Automated Circuit Discovery (ACDC) has emerged as a pivotal methodology in circuit discovery, but its application to large language models is severely limited by computational inefficiency and prohibitively high memory requirements. Although several accelerated approaches have been proposed, they primarily rely on linear approximations to ACDC, which significantly compromises analytical faithfulness. Our proposed method for accelerating automated circuit discovery, Per Attention Head Quantization (PAHQ), takes a fundamentally different approach by optimizing the efficiency of each individual patching operation. PAHQ leverages a fundamental alignment between activation patching and mixed-precision quantization (MPQ): interpretability analysis through patching essentially performs targeted ablation studies. Therefore, we can maintain high precision exclusively for investigated components while safely reducing precision elsewhere in the network. PAHQ-accelerated ACDC reduces runtime by up to 80\% and memory consumption by up to 30\% compared to unaccelerated ACDC while maintaining faithfulness. Importantly, our method readily integrates with existing edge-based circuit discovery techniques by modifying the attention computation mechanism. This training-free approach provides a practical and novel pathway for accelerating mechanistic interpretability methods. Our code is available at https://github.com/626619403/PAHQ.
Related papers
- Training-free Context-adaptive Attention for Efficient Long Context Modeling [57.703159205740185]
Training-free Context-adaptive Attention (TCA-Attention) is a training-free sparse attention mechanism that selectively attends to only the informative tokens for efficient long-context inference.<n>TCA-Attention achieves a 2.8$times$ speedup and reduces KV cache by 61% at 128K context length while maintaining performance comparable to full attention.
arXiv Detail & Related papers (2025-12-10T01:54:57Z) - Quantum compilation framework for data loading [0.9974960169271107]
Efficient encoding of classical data into quantum circuits is a critical challenge that directly impacts the scalability of quantum algorithms.<n>We present an automated compilation framework for resource-aware quantum data loading tailored to a given input vector and target error tolerance.<n>We demonstrate the effectiveness of our framework across several applications, where it consistently uncovers non-obvious, resource-efficient strategies.
arXiv Detail & Related papers (2025-12-04T19:00:01Z) - Discovering Transformer Circuits via a Hybrid Attribution and Pruning Framework [4.336808542533343]
This research proposes a hybrid attribution and pruning framework that uses attribution patching to identify a high-potential subgraph.<n>We show that HAP is 46% faster than baseline algorithms without sacrificing circuit faithfulness.
arXiv Detail & Related papers (2025-09-28T18:34:43Z) - EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time.<n>We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware importance Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z) - Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.<n>Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods.<n>Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z) - Attribution Patching Outperforms Automated Circuit Discovery [3.8695554579762814]
We show that a simple method based on attribution patching outperforms all existing methods.
We apply a linear approximation to activation patching to estimate the importance of each edge in the computational subgraph.
arXiv Detail & Related papers (2023-10-16T12:34:43Z) - An Accelerated Doubly Stochastic Gradient Method with Faster Explicit
Model Identification [97.28167655721766]
We propose a novel doubly accelerated gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - Learnable Faster Kernel-PCA for Nonlinear Fault Detection: Deep
Autoencoder-Based Realization [7.057302509355857]
Kernel principal component analysis (KPCA) is a well-recognized nonlinear dimensionality reduction method.
In this work, a learnable faster realization of the conventional KPCA is proposed.
The proposed DAE-PCA method is proved to be equivalent to KPCA but has more advantage in terms of automatic searching of the most suitable nonlinear high-dimensional space according to the inputs.
arXiv Detail & Related papers (2021-12-08T09:41:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.