EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
- URL: http://arxiv.org/abs/2502.06852v1
- Date: Fri, 07 Feb 2025 16:04:57 GMT
- Title: EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
- Authors: Lin Zhang, Wenshuo Dong, Zhuoran Zhang, Shu Yang, Lijie Hu, Ninghao Liu, Pan Zhou, Di Wang
- Abstract summary: We propose Edge Attribution Patching with GradPath (EAP-GP) to address the saturation effect.
EAP-GP introduces an integration path, starting from the input and adaptively following the direction of the difference between the gradients of corrupted and clean inputs to avoid the saturated region.
We evaluate EAP-GP on 6 datasets using GPT-2 Small, GPT-2 Medium, and GPT-2 XL.
- Score: 62.611812892924156
- Abstract: Understanding the internal mechanisms of transformer-based language models remains challenging. Mechanistic interpretability based on circuit discovery aims to reverse engineer neural networks by analyzing their internal processes at the level of computational subgraphs. In this paper, we revisit existing gradient-based circuit identification methods and find that their performance is affected either by the zero-gradient problem or by saturation effects, where edge attribution scores become insensitive to input changes, resulting in noisy and unreliable attribution evaluations for circuit components. To address the saturation effect, we propose Edge Attribution Patching with GradPath (EAP-GP). EAP-GP introduces an integration path that starts from the input and adaptively follows the direction of the difference between the gradients of corrupted and clean inputs to avoid the saturated region. This approach enhances attribution reliability and improves the faithfulness of circuit identification. We evaluate EAP-GP on 6 datasets using GPT-2 Small, GPT-2 Medium, and GPT-2 XL. Experimental results demonstrate that EAP-GP outperforms existing methods in circuit faithfulness, achieving improvements of up to 17.7%. Comparisons with manually annotated ground-truth circuits demonstrate that EAP-GP achieves precision and recall comparable to or better than previous approaches, highlighting its effectiveness in identifying accurate circuits.
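The abstract describes the path construction only at a high level, so the sketch below is a hypothetical PyTorch rendering of that idea; the function names, the exact step rule, and the rescaling toward the clean input are our assumptions, not the authors' implementation.

```python
import torch

def grad_of(metric, x):
    """Gradient of a scalar metric (e.g., a logit difference) w.r.t. x."""
    x = x.detach().clone().requires_grad_(True)
    metric(x).backward()
    return x.grad.detach()

def gradpath_attribution(metric, x_clean, x_corrupt, steps=32):
    """Hypothetical GradPath-style attribution (our reading of the abstract).

    Integrated-gradients variants integrate along the straight line between
    corrupted and clean inputs; the idea here is to let each step follow the
    difference between the gradient at the current point and the gradient at
    the clean input, so the path can bend away from saturated regions where
    gradients are flat.
    """
    x = x_corrupt.detach().clone()
    g_clean = grad_of(metric, x_clean)
    attribution = torch.zeros_like(x)
    for steps_left in range(steps, 0, -1):
        g = grad_of(metric, x)
        direction = g - g_clean                    # adaptive direction
        norm = direction.norm()
        if norm < 1e-8:                            # degenerate: fall back to
            direction = x_clean - x                # the straight-line segment
            norm = direction.norm() + 1e-8
        # scale so the remaining gap to x_clean closes in the steps left
        dx = direction / norm * (x_clean - x).norm() / steps_left
        attribution += g * dx                      # accumulate grad . dx
        x = x + dx
    return attribution
```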
Related papers
- Using the Path of Least Resistance to Explain Deep Networks [5.614094161229764]
Integrated Gradients (IG) is a widely used axiomatic path-based attribution method.
We show that straight paths can lead to flawed attributions.
We propose Geodesic Integrated Gradients (GIG) as an alternative.
arXiv Detail & Related papers (2025-02-17T18:29:24Z)
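For context, the straight-line path that GIG replaces is the one standard Integrated Gradients integrates over. A minimal sketch of that baseline, assuming a scalar-valued differentiable `f` under PyTorch autograd:

```python
import torch

def integrated_gradients(f, x, baseline, steps=64):
    """Standard IG: average gradients along the straight line from the
    baseline to x, then scale by the input difference. GIG's point is
    that this straight path can cross regions that distort attributions."""
    grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        xi = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        f(xi).backward()
        grads += xi.grad
    return (x - baseline) * grads / steps
```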
- FGP: Feature-Gradient-Prune for Efficient Convolutional Layer Pruning [16.91552023598741]
This paper introduces a novel pruning method called Feature-Gradient Pruning (FGP).
It integrates both feature-based and gradient-based information to more effectively evaluate the importance of channels across various target classes.
Experiments conducted across multiple tasks and datasets show that FGP significantly reduces computational costs and minimizes accuracy loss.
arXiv Detail & Related papers (2024-11-19T08:42:15Z)
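One common way to combine the two signals is a first-order Taylor-style score per channel; the sketch below is an illustrative stand-in under that assumption, not necessarily FGP's exact criterion.

```python
import torch

def channel_importance(activations, gradients):
    """Hypothetical feature-gradient channel score: multiply the feature
    map by its gradient (first-order Taylor term) and average over batch
    and spatial dims, leaving one score per channel.

    activations, gradients: tensors of shape (batch, channels, H, W),
    captured e.g. with forward/backward hooks on a conv layer.
    """
    return (activations * gradients).abs().mean(dim=(0, 2, 3))

# Channels with the lowest scores are candidates for pruning, e.g.:
# scores = channel_importance(acts, grads)
# prune = scores.argsort()[: int(0.3 * scores.numel())]
```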
- Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning [14.639036250438517]
We introduce a comprehensive reformulation of the task known as Circuit Discovery, along with DiscoGP.
DiscoGP is a novel and effective algorithm based on differentiable masking for discovering circuits.
arXiv Detail & Related papers (2024-07-04T09:42:25Z)
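Differentiable masking for circuit discovery typically attaches a learnable gate to each candidate edge and trades task faithfulness against sparsity; a generic sketch of that pattern (the sigmoid gating and loss weighting are assumptions, not DiscoGP's exact formulation):

```python
import torch

class EdgeMask(torch.nn.Module):
    """One learnable sigmoid gate per candidate edge of the graph."""
    def __init__(self, num_edges):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_edges))

    def forward(self):
        return torch.sigmoid(self.logits)  # soft mask in (0, 1)

def circuit_loss(task_loss, mask, sparsity_weight=1.0):
    """Keep the masked model faithful on the task while pushing most
    gates toward zero; thresholding the learned mask yields a circuit."""
    return task_loss + sparsity_weight * mask().mean()
```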
- Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.
Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods.
Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z)
- Anchoring Path for Inductive Relation Prediction in Knowledge Graphs [69.81600732388182]
APST takes both APs and CPs as the inputs of a unified Sentence Transformer architecture.
We evaluate APST on three public datasets and achieve state-of-the-art (SOTA) performance in 30 of 36 transductive, inductive, and few-shot experimental settings.
arXiv Detail & Related papers (2023-12-21T06:02:25Z)
- A Kronecker product accelerated efficient sparse Gaussian Process (E-SGP) for flow emulation [2.563626165548781]
This paper introduces an efficient sparse Gaussian process (E-SGP) for the surrogate modelling of fluid mechanics.
It is a further development of the approximated sparse GP algorithm, combining the concepts of efficient GP (E-GP) and variational energy free sparse Gaussian process (VEF-SGP).
arXiv Detail & Related papers (2023-12-13T11:29:40Z)
- Interactive Segmentation as Gaussian Process Classification [58.44673380545409]
Click-based interactive segmentation (IS) aims to extract the target objects under user interaction.
Most of the current deep learning (DL)-based methods mainly follow the general pipelines of semantic segmentation.
We propose to formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image.
arXiv Detail & Related papers (2023-02-28T14:01:01Z)
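In this formulation, clicks supply the labeled pixels and all remaining pixels are test points for a GP binary classifier. A toy sketch with scikit-learn, where the per-pixel features and the RBF kernel are assumptions (the paper's model is more elaborate):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def segment_from_clicks(features, fg_clicks, bg_clicks):
    """Toy GP-classification view of interactive segmentation:
    clicked pixels are training points (1 = object, 0 = background),
    and the GP predicts a foreground probability for every pixel.

    features: (H, W, D) per-pixel feature map (e.g., deep features)
    fg_clicks, bg_clicks: lists of (row, col) click positions
    """
    H, W, D = features.shape
    X_train = np.array([features[r, c] for r, c in fg_clicks + bg_clicks])
    y_train = np.array([1] * len(fg_clicks) + [0] * len(bg_clicks))
    gp = GaussianProcessClassifier(kernel=RBF(length_scale=1.0))
    gp.fit(X_train, y_train)
    # exact GP inference over H*W test pixels is slow; fine for a toy
    probs = gp.predict_proba(features.reshape(-1, D))[:, 1]
    return probs.reshape(H, W)  # per-pixel foreground probability
```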
- Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary.
With each GP expert leveraging the random feature-based approximation to perform online prediction and model updates with scalability, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions.
The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
arXiv Detail & Related papers (2021-10-13T15:11:25Z)
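A compressed sketch of the two ingredients named above, random-feature GP experts plus a weighted meta-learner; the prior, noise handling, and the data-adaptive weight-update rule are simplified relative to the paper:

```python
import numpy as np

class RFFExpert:
    """One GP learner: an RBF kernel of a given length-scale, approximated
    with random Fourier features so prediction and update stay online."""
    def __init__(self, dim, lengthscale, n_feats=100, noise=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.omega = rng.normal(0.0, 1.0 / lengthscale, (dim, n_feats))
        self.b = rng.uniform(0.0, 2 * np.pi, n_feats)
        self.noise = noise
        self.A = np.eye(n_feats)          # posterior precision (prior = I)
        self.phi_y = np.zeros(n_feats)    # accumulated phi(x) * y

    def phi(self, x):
        return np.sqrt(2.0 / len(self.b)) * np.cos(x @ self.omega + self.b)

    def predict(self, x):
        w = np.linalg.solve(self.A, self.phi_y / self.noise**2)
        return self.phi(x) @ w

    def update(self, x, y):
        p = self.phi(x)
        self.A += np.outer(p, p) / self.noise**2
        self.phi_y += p * y

def ensemble_predict(experts, weights, x):
    """EGP meta-learner: weighted sum of per-expert predictions; the paper
    adapts the weights from data (e.g., by predictive performance)."""
    return sum(w * e.predict(x) for w, e in zip(weights, experts))
```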
- Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias [65.13042449121411]
In practice, training a network with the gradient estimates provided by EP does not scale to visual tasks harder than MNIST.
We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon.
We apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error.
arXiv Detail & Related papers (2020-06-06T09:36:07Z)
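The finite-nudging bias behaves like the bias of a one-sided finite difference; the toy calculation below shows how a symmetric (two-sided) estimate cancels the first-order error term, which mirrors the kind of remedy the summary's "these techniques" alludes to:

```python
def one_sided(f, x, beta):
    """EP-style one-sided estimate: its bias grows linearly in beta."""
    return (f(x + beta) - f(x)) / beta

def symmetric(f, x, beta):
    """Two-sided estimate: the O(beta) bias terms cancel."""
    return (f(x + beta) - f(x - beta)) / (2 * beta)

f = lambda x: x ** 3          # true derivative at x = 1 is 3
for beta in (0.5, 0.1, 0.02):
    print(beta, one_sided(f, 1.0, beta) - 3.0, symmetric(f, 1.0, beta) - 3.0)
# one-sided error shrinks like beta; symmetric error shrinks like beta**2
```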