Related papers: Hierarchical Sparse Circuit Extraction from Billion-Parameter Language Models through Scalable Attribution Graph Decomposition

URL: http://arxiv.org/abs/2601.12879v1
Date: Mon, 19 Jan 2026 09:34:10 GMT
Title: Hierarchical Sparse Circuit Extraction from Billion-Parameter Language Models through Scalable Attribution Graph Decomposition
Authors: Mohammed Mudassir Uddin, Shahnawaz Alam, Mohammed Kaif Pasha,
Abstract summary: We propose a framework to reduce circuit discovery complexity from O(2^n) exhaustive enumeration to O(n^2 log n). The methodology integrates cross-layer transcoders for monosemantic feature extraction, graph neural network meta-learning for topology prediction, and causal intervention protocols for validation. On modular arithmetic tasks, the framework achieves up to 91% behavioral preservation ($\pm$2.3% across runs) while maintaining interpretable subgraph sizes.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mechanistic interpretability seeks to reverse-engineer neural network computations into human-understandable algorithms, yet extracting sparse computational circuits from billion-parameter language models remains challenging due to exponential search complexity and pervasive polysemanticity. The proposed Hierarchical Attribution Graph Decomposition (HAGD) framework reduces circuit discovery complexity from O(2^n) exhaustive enumeration to O(n^2 log n) through multi-resolution abstraction hierarchies and differentiable circuit search. The methodology integrates cross-layer transcoders for monosemantic feature extraction, graph neural network meta-learning for topology prediction, and causal intervention protocols for validation. Empirical evaluation spans GPT-2 variants, Llama-7B through Llama-70B, and Pythia suite models across algorithmic tasks and natural language benchmarks. On modular arithmetic tasks, the framework achieves up to 91% behavioral preservation ($\pm$2.3% across runs) while maintaining interpretable subgraph sizes. Cross-architecture transfer experiments suggest that discovered circuits exhibit moderate structural similarity (averaging 67%) across model families, indicating potential shared computational patterns. These results provide preliminary foundations for interpretability at larger model scales while identifying significant limitations in current attribution methodologies that require future advances.
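To give a sense of the gap between the two growth rates quoted in the abstract, the following minimal sketch tabulates 2^n against n^2 log n for a few values of n. It assumes n counts candidate circuit components and uses a base-2 logarithm with constant factors dropped; the abstract specifies neither, and the function names are hypothetical.

```python
import math


def exhaustive_subsets(n: int) -> int:
    """Count of subsets of n components (assumption: each is kept or dropped)."""
    return 2 ** n


def hierarchical_bound(n: int) -> float:
    """n^2 * log2(n): the quoted growth rate, with base 2 and no constants assumed."""
    return n * n * math.log2(n)


if __name__ == "__main__":
    # Compare the two growth rates for a few illustrative sizes.
    for n in (16, 64, 256):
        print(
            f"n={n:>3}  2^n={exhaustive_subsets(n):.3e}  "
            f"n^2 log2 n={hierarchical_bound(n):,.0f}"
        )
```

These figures compare asymptotic growth only and say nothing about the framework's actual running time.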

Related papers

Integrating Artificial Intelligence and Mixed Integer Linear Programming: Explainable Graph-Based Instance Space Analysis in Air Transportation [0.0]
This paper analyzes the integration of artificial intelligence (AI) with mixed integer linear programming (MILP) to address complex optimization challenges in air transportation with explainability. The study aims to validate the use of Graph Neural Networks (GNNs) for extracting structural feature embeddings from MILP instances.
arXiv Detail & Related papers (2025-12-01T14:03:29Z)
FuncGNN: Learning Functional Semantics of Logic Circuits with Graph Neural Networks [0.0]
And-Inverter Graphs (AIGs) are widely adopted for representing Boolean logic in modern circuits. We propose FuncGNN, which integrates hybrid feature aggregation to extract multi-granularity topological patterns. FuncGNN achieves improvements of 2.06% and 18.71%, respectively, while reducing training time by approximately 50.6% and GPU memory usage by about 32.8%.
arXiv Detail & Related papers (2025-06-07T13:04:07Z)
A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis [1.3108652488669736]
A scalable quantum neural network is introduced as a means to approximate any unitary evolution. The SRBB-based synthesis algorithm is also tested on real hardware.
arXiv Detail & Related papers (2024-12-04T07:21:23Z)
Uncovering Intermediate Variables in Transformers using Circuit Probing [28.81226181942661]
We propose a new analysis technique - circuit probing - that automatically uncovers low-level circuits that compute hypothesized intermediate variables. We apply this method to models trained on simple arithmetic tasks, demonstrating its effectiveness at (1) deciphering the algorithms that models have learned, (2) revealing modular structure within a model, and (3) tracking the development of circuits over training.
arXiv Detail & Related papers (2023-11-07T21:27:17Z)
Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation. Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions. We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize Runge-Kutta neural network to a recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms. We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold [3.6052935394000234]
The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data. No estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold. We prove that, with an arbitrary fraction of the labels revealed, detection is feasible throughout the parameter domain.
arXiv Detail & Related papers (2022-05-24T00:03:25Z)
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its generalization measure. We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals. We propose a general graph estimator based on a novel structured fusion regularization. We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
Progressive Spatio-Temporal Graph Convolutional Network for Skeleton-Based Human Action Recognition [97.14064057840089]
We propose a method to automatically find a compact and problem-specific network for graph convolutional networks in a progressive manner. Experimental results on two datasets for skeleton-based human action recognition indicate that the proposed method has competitive or even better classification performance.
arXiv Detail & Related papers (2020-11-11T09:57:49Z)
Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data. We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity. Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.