Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity
- URL: http://arxiv.org/abs/2407.03779v2
- Date: Mon, 29 Sep 2025 10:07:47 GMT
- Title: Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity
- Authors: Lei Yu, Jingcheng Niu, Zining Zhu, Xi Chen, Gerald Penn,
- Abstract summary: We introduce DiscoGP, a framework for extracting self-contained modular units from neural language models (LMs)<n>Our framework identifies sheaves through a gradient-based pruning algorithm that operates on both of these in such a way that reduces the original LM to a sparse skeleton that preserves certain core capabilities.
- Score: 18.71252449465396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce DiscoGP, a novel framework for extracting self-contained modular units, or sheaves, within neural language models (LMs). Sheaves extend the concept of functional circuits, a unit widely explored in interpretability research, by considering not only subsets of edges in an LM's computation graph but also the model's weight parameters. Our framework identifies sheaves through a gradient-based pruning algorithm that operates on both of these in such a way that reduces the original LM to a sparse skeleton that preserves certain core capabilities. Experimental results demonstrate that, across a range of linguistic and reasoning tasks, DiscoGP extracts sheaves that preserve 93%-100% of a model's performance on the identified task while comprising only 1%-7% of the original weights and connections. Furthermore, our analysis reveals that, compared to previously identified LM circuits, the sheaves discovered by DiscoGP exhibit superior modularity and functional fidelity. Extending our method to the neuron level also unveils novel insights into the inner workings of LLMs
Related papers
- Provable In-Context Learning of Nonlinear Regression with Transformers [58.018629320233174]
In-context learning (ICL) is the ability to perform unseen tasks using task-specific prompts without updating parameters.<n>Recent research has actively explored the training dynamics behind ICL.<n>This paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities.
arXiv Detail & Related papers (2025-07-28T00:09:28Z) - Who Does What in Deep Learning? Multidimensional Game-Theoretic Attribution of Function of Neural Units [0.9895793818721335]
Existing explainable-AI methods, such as SHAP, attribute importance to inputs, but cannot quantify the contributions of neural units.<n>Here we close that gap with Multiperturbation Shapley-value Analysis (MSA), a model-agnostic game-theoretic framework.
arXiv Detail & Related papers (2025-06-24T15:50:35Z) - Probing Neural Topology of Large Language Models [12.298921317333452]
We introduce graph probing, a method for uncovering the functional connectivity of large language models.<n>By probing models across diverse LLM families and scales, we discover a universal predictability of next-token prediction performance.<n>Strikingly, probing on topology outperforms probing on activation by up to 130.4%.
arXiv Detail & Related papers (2025-06-01T14:57:03Z) - Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits [58.63897489864948]
Reinforcement learning with outcome-based feedback faces a fundamental challenge.<n>How do we assign credit to the right actions?<n>This paper provides the first comprehensive analysis of this problem in online RL with general function approximation.
arXiv Detail & Related papers (2025-05-26T17:44:08Z) - Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates [31.608080868988825]
We introduce three types of logic gates: AND, OR, and ADDER gates, and decompose the circuit into combinations of these logical gates.<n>We propose a framework that combines noising-based and denoising-based interventions, which can be easily integrated into existing circuit discovery methods.
arXiv Detail & Related papers (2025-05-15T07:35:14Z) - R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference [77.47238561728459]
R-Sparse is a training-free activation sparsity approach capable of achieving high sparsity levels in advanced LLMs.
Experiments on Llama-2/3 and Mistral models across ten diverse tasks demonstrate that R-Sparse achieves comparable performance at 50% model-level sparsity.
arXiv Detail & Related papers (2025-04-28T03:30:32Z) - EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification [62.611812892924156]
We propose Edge Patching with GradPath (EAP-GP) to address the saturation effect.
EAP-GP introduces an integration path, starting from the input and adaptively following the direction of the difference between the gradients of corrupted and clean inputs to avoid the saturated region.
We evaluate EAP-GP on 6 datasets using GPT-2 Small, GPT-2 Medium, and GPT-2 XL.
arXiv Detail & Related papers (2025-02-07T16:04:57Z) - Position-aware Automatic Circuit Discovery [59.64762573617173]
We identify a gap in existing circuit discovery methods, treating model components as equally relevant across input positions.
We propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples.
Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness compared to prior work.
arXiv Detail & Related papers (2025-02-07T00:18:20Z) - Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability [3.138731415322007]
We investigate the generality of the indirect object identification (IOI) circuit in GPT-2 small.
Our findings reveal that the circuit generalizes surprisingly well, reusing all of its components and mechanisms while only adding additional input edges.
arXiv Detail & Related papers (2024-11-25T05:32:34Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.<n>We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.<n>Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z) - Transformer Circuit Faithfulness Metrics are not Robust [0.04260910081285213]
We measure circuit 'faithfulness' by ablating portions of the model's computation.
We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers as well as the actual components of the circuit.
The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits.
arXiv Detail & Related papers (2024-07-11T17:59:00Z) - Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.
Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods.
Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z) - CIRCUITSYNTH: Leveraging Large Language Models for Circuit Topology Synthesis [7.131266114437393]
We introduce CIRCUITSYNTH, a novel approach that harnesses LLMs to facilitate the automated synthesis of valid circuit topologies.
Our approach lays the foundation for future research aimed at enhancing circuit efficiency and specifying output voltage.
arXiv Detail & Related papers (2024-06-06T01:59:59Z) - Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models [55.19497659895122]
We introduce methods for discovering and applying sparse feature circuits.
These are causally implicatedworks of human-interpretable features for explaining language model behaviors.
arXiv Detail & Related papers (2024-03-28T17:56:07Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - Data-free Weight Compress and Denoise for Large Language Models [96.68582094536032]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.<n>We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z) - Wasserstein proximal operators describe score-based generative models
and resolve memorization [12.321631823103894]
We first formulate SGMs in terms of the Wasserstein proximal operator (WPO)
We show that WPO describes the inductive bias of diffusion and score-based models.
We present an interpretable kernel-based model for the score function which dramatically improves the performance of SGMs.
arXiv Detail & Related papers (2024-02-09T03:33:13Z) - Attribution Patching Outperforms Automated Circuit Discovery [3.8695554579762814]
We show that a simple method based on attribution patching outperforms all existing methods.
We apply a linear approximation to activation patching to estimate the importance of each edge in the computational subgraph.
arXiv Detail & Related papers (2023-10-16T12:34:43Z) - Adaptive Planning Search Algorithm for Analog Circuit Verification [53.97809573610992]
We propose a machine learning (ML) approach, which uses less simulations.
We show that the proposed approach is able to provide OCCs closer to the specifications for all circuits.
arXiv Detail & Related papers (2023-06-23T12:57:46Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Quantum circuit debugging and sensitivity analysis via local inversions [62.997667081978825]
We present a technique that pinpoints the sections of a quantum circuit that affect the circuit output the most.
We demonstrate the practicality and efficacy of the proposed technique by applying it to example algorithmic circuits implemented on IBM quantum machines.
arXiv Detail & Related papers (2022-04-12T19:39:31Z) - Modeling Multi-Granularity Hierarchical Features for Relation Extraction [26.852869800344813]
We propose a novel method to extract multi-granularity features based solely on the original input sentences.
We show that effective structured features can be attained even without external knowledge.
arXiv Detail & Related papers (2022-04-09T09:44:05Z) - Locally Interpretable Model Agnostic Explanations using Gaussian
Processes [2.9189409618561966]
Local Interpretable Model-Agnostic Explanations (LIME) is a popular technique for explaining the prediction of a single instance.
We propose a Gaussian Process (GP) based variation of locally interpretable models.
We demonstrate that the proposed technique is able to generate faithful explanations using much fewer samples as compared to LIME.
arXiv Detail & Related papers (2021-08-16T05:49:01Z) - Learning algorithms from circuit lower bounds [0.0]
We revisit known constructions of efficient learning algorithms from various notions of constructive circuit lower bounds.
We prove that if it is possible to find efficiently, in a particular interactive way, errors of many p-size circuits attempting to solve hard problems, then p-size circuits can be PAC learned over the uniform distribution.
arXiv Detail & Related papers (2020-12-28T04:47:36Z) - Learning outside the Black-Box: The pursuit of interpretable models [78.32475359554395]
This paper proposes an algorithm that produces a continuous global interpretation of any given continuous black-box function.
Our interpretation represents a leap forward from the previous state of the art.
arXiv Detail & Related papers (2020-11-17T12:39:44Z) - Investigating the Scalability and Biological Plausibility of the
Activation Relaxation Algorithm [62.997667081978825]
Activation Relaxation (AR) algorithm provides a simple and robust approach for approximating the backpropagation of error algorithm.
We show that the algorithm can be further simplified and made more biologically plausible by introducing a learnable set of backwards weights.
We also investigate whether another biologically implausible assumption of the original AR algorithm -- the frozen feedforward pass -- can be relaxed without damaging performance.
arXiv Detail & Related papers (2020-10-13T08:02:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.