EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
- URL: http://arxiv.org/abs/2502.06852v1
- Date: Fri, 07 Feb 2025 16:04:57 GMT
- Title: EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
- Authors: Lin Zhang, Wenshuo Dong, Zhuoran Zhang, Shu Yang, Lijie Hu, Ninghao Liu, Pan Zhou, Di Wang
- Abstract summary: We propose Edge Attribution Patching with GradPath (EAP-GP) to address the saturation effect.
EAP-GP introduces an integration path, starting from the input and adaptively following the direction of the difference between the gradients of corrupted and clean inputs to avoid the saturated region.
We evaluate EAP-GP on 6 datasets using GPT-2 Small, GPT-2 Medium, and GPT-2 XL.
- Score: 62.611812892924156
- Abstract: Understanding the internal mechanisms of transformer-based language models remains challenging. Mechanistic interpretability based on circuit discovery aims to reverse engineer neural networks by analyzing their internal processes at the level of computational subgraphs. In this paper, we revisit existing gradient-based circuit identification methods and find that their performance is affected either by the zero-gradient problem or by saturation effects, where edge attribution scores become insensitive to input changes, resulting in noisy and unreliable attribution evaluations for circuit components. To address the saturation effect, we propose Edge Attribution Patching with GradPath (EAP-GP). EAP-GP introduces an integration path that starts from the input and adaptively follows the direction of the difference between the gradients of corrupted and clean inputs to avoid the saturated region. This approach enhances attribution reliability and improves the faithfulness of circuit identification. We evaluate EAP-GP on 6 datasets using GPT-2 Small, GPT-2 Medium, and GPT-2 XL. Experimental results demonstrate that EAP-GP outperforms existing methods in circuit faithfulness, achieving improvements of up to 17.7%. Comparisons with manually annotated ground-truth circuits demonstrate that EAP-GP achieves precision and recall comparable to or better than previous approaches, highlighting its effectiveness in identifying accurate circuits.
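The abstract describes the path construction only at a high level, so the sketch below is a hypothetical PyTorch rendering of that idea; the function names, the exact step rule, and the rescaling toward the clean input are our assumptions, not the authors' implementation.

```python
import torch

def grad_of(metric, x):
    """Gradient of a scalar metric (e.g., a logit difference) w.r.t. x."""
    x = x.detach().clone().requires_grad_(True)
    metric(x).backward()
    return x.grad.detach()

def gradpath_attribution(metric, x_clean, x_corrupt, steps=32):
    """Hypothetical GradPath-style attribution (our reading of the abstract).

    Integrated-gradients variants integrate along the straight line between
    corrupted and clean inputs; the idea here is to let each step follow the
    difference between the gradient at the current point and the gradient at
    the clean input, so the path can bend away from saturated regions where
    gradients are flat.
    """
    x = x_corrupt.detach().clone()
    g_clean = grad_of(metric, x_clean)
    attribution = torch.zeros_like(x)
    for steps_left in range(steps, 0, -1):
        g = grad_of(metric, x)
        direction = g - g_clean                    # adaptive direction
        norm = direction.norm()
        if norm < 1e-8:                            # degenerate: fall back to
            direction = x_clean - x                # the straight-line segment
            norm = direction.norm() + 1e-8
        # scale so the remaining gap to x_clean closes in the steps left
        dx = direction / norm * (x_clean - x).norm() / steps_left
        attribution += g * dx                      # accumulate grad . dx
        x = x + dx
    return attribution
```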
Related papers
- Using the Path of Least Resistance to Explain Deep Networks [5.614094161229764]
Integrated Gradients (IG) is a widely used axiomatic path-based attribution method.
We show that straight paths can lead to flawed attributions.
We propose Geodesic Integrated Gradients (GIG) as an alternative.
arXiv Detail & Related papers (2025-02-17T18:29:24Z)
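For context, the straight-line path that GIG replaces is the one standard Integrated Gradients integrates over. A minimal sketch of that baseline, assuming a scalar-valued differentiable `f` under PyTorch autograd:

```python
import torch

def integrated_gradients(f, x, baseline, steps=64):
    """Standard IG: average gradients along the straight line from the
    baseline to x, then scale by the input difference. GIG's point is
    that this straight path can cross regions that distort attributions."""
    grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        xi = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        f(xi).backward()
        grads += xi.grad
    return (x - baseline) * grads / steps
```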
- FGP: Feature-Gradient-Prune for Efficient Convolutional Layer Pruning [16.91552023598741]
This paper introduces a novel pruning method called Feature-Gradient Pruning (FGP).
It integrates both feature-based and gradient-based information to more effectively evaluate the importance of channels across various target classes.
Experiments conducted across multiple tasks and datasets show that FGP significantly reduces computational costs and minimizes accuracy loss.
arXiv Detail & Related papers (2024-11-19T08:42:15Z)
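One common way to combine the two signals is a first-order Taylor-style score per channel; the sketch below is an illustrative stand-in under that assumption, not necessarily FGP's exact criterion.

```python
import torch

def channel_importance(activations, gradients):
    """Hypothetical feature-gradient channel score: multiply the feature
    map by its gradient (first-order Taylor term) and average over batch
    and spatial dims, leaving one score per channel.

    activations, gradients: tensors of shape (batch, channels, H, W),
    captured e.g. with forward/backward hooks on a conv layer.
    """
    return (activations * gradients).abs().mean(dim=(0, 2, 3))

# Channels with the lowest scores are candidates for pruning, e.g.:
# scores = channel_importance(acts, grads)
# prune = scores.argsort()[: int(0.3 * scores.numel())]
```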
- Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning [14.639036250438517]
We introduce a comprehensive reformulation of the task known as Circuit Discovery, along with DiscoGP.
DiscoGP is a novel and effective algorithm based on differentiable masking for discovering circuits.
arXiv Detail & Related papers (2024-07-04T09:42:25Z)
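Differentiable masking for circuit discovery typically attaches a learnable gate to each candidate edge and trades task faithfulness against sparsity; a generic sketch of that pattern (the sigmoid gating and loss weighting are assumptions, not DiscoGP's exact formulation):

```python
import torch

class EdgeMask(torch.nn.Module):
    """One learnable sigmoid gate per candidate edge of the graph."""
    def __init__(self, num_edges):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_edges))

    def forward(self):
        return torch.sigmoid(self.logits)  # soft mask in (0, 1)

def circuit_loss(task_loss, mask, sparsity_weight=1.0):
    """Keep the masked model faithful on the task while pushing most
    gates toward zero; thresholding the learned mask yields a circuit."""
    return task_loss + sparsity_weight * mask().mean()
```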
- Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.
Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods.
Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z)
- Anchoring Path for Inductive Relation Prediction in Knowledge Graphs [69.81600732388182]
APST takes both APs and CPs as the inputs of a unified Sentence Transformer architecture.
We evaluate APST on three public datasets and achieve state-of-the-art (SOTA) performance in 30 of 36 transductive, inductive, and few-shot experimental settings.
arXiv Detail & Related papers (2023-12-21T06:02:25Z)
- A Kronecker product accelerated efficient sparse Gaussian Process (E-SGP) for flow emulation [2.563626165548781]
This paper introduces an efficient sparse Gaussian process (E-SGP) for the surrogate modelling of fluid mechanics.
It is a further development of the approximated sparse GP algorithm, combining the concepts of efficient GP (E-GP) and variational energy free sparse Gaussian process (VEF-SGP).
arXiv Detail & Related papers (2023-12-13T11:29:40Z)
- Interactive Segmentation as Gaussian Process Classification [58.44673380545409]
Click-based interactive segmentation (IS) aims to extract the target objects under user interaction.
Most of the current deep learning (DL)-based methods mainly follow the general pipelines of semantic segmentation.
We propose to formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image.
arXiv Detail & Related papers (2023-02-28T14:01:01Z)
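In this formulation, clicks supply the labeled pixels and all remaining pixels are test points for a GP binary classifier. A toy sketch with scikit-learn, where the per-pixel features and the RBF kernel are assumptions (the paper's model is more elaborate):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def segment_from_clicks(features, fg_clicks, bg_clicks):
    """Toy GP-classification view of interactive segmentation:
    clicked pixels are training points (1 = object, 0 = background),
    and the GP predicts a foreground probability for every pixel.

    features: (H, W, D) per-pixel feature map (e.g., deep features)
    fg_clicks, bg_clicks: lists of (row, col) click positions
    """
    H, W, D = features.shape
    X_train = np.array([features[r, c] for r, c in fg_clicks + bg_clicks])
    y_train = np.array([1] * len(fg_clicks) + [0] * len(bg_clicks))
    gp = GaussianProcessClassifier(kernel=RBF(length_scale=1.0))
    gp.fit(X_train, y_train)
    # exact GP inference over H*W test pixels is slow; fine for a toy
    probs = gp.predict_proba(features.reshape(-1, D))[:, 1]
    return probs.reshape(H, W)  # per-pixel foreground probability
```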
- Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary.
With each GP expert leveraging the random feature-based approximation to perform online prediction and model updates with scalability, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions.
The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
arXiv Detail & Related papers (2021-10-13T15:11:25Z)
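A compressed sketch of the two ingredients named above, random-feature GP experts plus a weighted meta-learner; the prior, noise handling, and the data-adaptive weight-update rule are simplified relative to the paper:

```python
import numpy as np

class RFFExpert:
    """One GP learner: an RBF kernel of a given length-scale, approximated
    with random Fourier features so prediction and update stay online."""
    def __init__(self, dim, lengthscale, n_feats=100, noise=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.omega = rng.normal(0.0, 1.0 / lengthscale, (dim, n_feats))
        self.b = rng.uniform(0.0, 2 * np.pi, n_feats)
        self.noise = noise
        self.A = np.eye(n_feats)          # posterior precision (prior = I)
        self.phi_y = np.zeros(n_feats)    # accumulated phi(x) * y

    def phi(self, x):
        return np.sqrt(2.0 / len(self.b)) * np.cos(x @ self.omega + self.b)

    def predict(self, x):
        w = np.linalg.solve(self.A, self.phi_y / self.noise**2)
        return self.phi(x) @ w

    def update(self, x, y):
        p = self.phi(x)
        self.A += np.outer(p, p) / self.noise**2
        self.phi_y += p * y

def ensemble_predict(experts, weights, x):
    """EGP meta-learner: weighted sum of per-expert predictions; the paper
    adapts the weights from data (e.g., by predictive performance)."""
    return sum(w * e.predict(x) for w, e in zip(weights, experts))
```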
- Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias [65.13042449121411]
In practice, training a network with the gradient estimates provided by EP does not scale to visual tasks harder than MNIST.
We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon.
We apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error.
arXiv Detail & Related papers (2020-06-06T09:36:07Z)
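The finite-nudging bias behaves like the bias of a one-sided finite difference; the toy calculation below shows how a symmetric (two-sided) estimate cancels the first-order error term, which mirrors the kind of remedy the summary's "these techniques" alludes to:

```python
def one_sided(f, x, beta):
    """EP-style one-sided estimate: its bias grows linearly in beta."""
    return (f(x + beta) - f(x)) / beta

def symmetric(f, x, beta):
    """Two-sided estimate: the O(beta) bias terms cancel."""
    return (f(x + beta) - f(x - beta)) / (2 * beta)

f = lambda x: x ** 3          # true derivative at x = 1 is 3
for beta in (0.5, 0.1, 0.02):
    print(beta, one_sided(f, 1.0, beta) - 3.0, symmetric(f, 1.0, beta) - 3.0)
# one-sided error shrinks like beta; symmetric error shrinks like beta**2
```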