Related papers: Multi-Granular Node Pruning for Circuit Discovery

Multi-Granular Node Pruning for Circuit Discovery

URL: http://arxiv.org/abs/2512.10903v1
Date: Thu, 11 Dec 2025 18:32:15 GMT
Title: Multi-Granular Node Pruning for Circuit Discovery
Authors: Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad, A. B. Siddique,
Abstract summary: We propose a node-level pruning framework for circuit discovery.<n>Our method introduces learnable masks across multiple levels of granularity.<n>Our method has a significantly lower memory footprint, 5-10x, as it does not require keeping intermediate activations in the memory to work.
Score: 5.606576692008564
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Circuit discovery aims to identify minimal subnetworks that are responsible for specific behaviors in large language models (LLMs). Existing approaches primarily rely on iterative edge pruning, which is computationally expensive and limited to coarse-grained units such as attention heads or MLP blocks, overlooking finer structures like individual neurons. We propose a node-level pruning framework for circuit discovery that addresses both scalability and granularity limitations. Our method introduces learnable masks across multiple levels of granularity, from entire blocks to individual neurons, within a unified optimization objective. Granularity-specific sparsity penalties guide the pruning process, allowing a comprehensive compression in a single fine-tuning run. Empirically, our approach identifies circuits that are smaller in nodes than those discovered by prior methods; moreover, we demonstrate that many neurons deemed important by coarse methods are actually irrelevant, while still maintaining task performance. Furthermore, our method has a significantly lower memory footprint, 5-10x, as it does not require keeping intermediate activations in the memory to work.

Related papers

An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning [9.208007322096535]
We develop a new SpFT framework, based on ideas from neural network pruning.<n>We show our method improves SpFT's memory efficiency by 20-50% while matching the accuracy of state-of-the-art methods like LoRA's variants.
arXiv Detail & Related papers (2025-02-17T04:54:42Z)
Scalable iterative pruning of large language and vision models using block coordinate descent [0.31410859223862103]
Pruning neural networks, which involves removing a fraction of their weights, can often maintain high accuracy while significantly reducing model complexity, at least up to a certain limit.<n>We present a neural network pruning technique that builds upon the Combinatorial Brain Surgeon, but solves an optimization problem over a subset of the network weights in an iterative, block-wise manner.
arXiv Detail & Related papers (2024-11-26T17:54:02Z)
Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks. By introducing learnable memory tokens with attention mechanism, we can effectively boost performance without huge computational overhead. We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
Zonotope Domains for Lagrangian Neural Network Verification [102.13346781220383]
We decompose the problem of verifying a deep neural network into the verification of many 2-layer neural networks. Our technique yields bounds that improve upon both linear programming and Lagrangian-based verification techniques.
arXiv Detail & Related papers (2022-10-14T19:31:39Z)
Neural Networks Reduction via Lumping [0.0]
A large number of solutions has been published to reduce both the number of operations and the parameters involved with the models. Most of these reducing techniques are actually methods and usually require at least one re-training step to recover the accuracy. We propose a pruning approach that reduces the number of neurons in a network without using any data or fine-tuning, while completely preserving the exact behaviour.
arXiv Detail & Related papers (2022-09-15T17:13:07Z)
Receding Neuron Importances for Structured Pruning [11.375436522599133]
Structured pruning efficiently compresses networks by identifying and removing unimportant neurons. We introduce a simple BatchNorm variation with bounded scaling parameters, based on which we design a novel regularisation term that suppresses only neurons with low importance. We show that neural networks trained this way can be pruned to a larger extent and with less deterioration.
arXiv Detail & Related papers (2022-04-13T14:08:27Z)
Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered. Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal. We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction [4.243810214656324]
Memory footprint is one of the main limiting factors for large neural network training. We propose a systematic approach to compute optimal quantization of the retained gradients of the pointwise nonlinear functions. We show that such approximation can be achieved by computing optimal piecewise-constant approximation of the derivative of the activation function.
arXiv Detail & Related papers (2022-02-01T14:51:38Z)
Learning to Detect Critical Nodes in Sparse Graphs via Feature Importance Awareness [53.351863569314794]
The critical node problem (CNP) aims to find a set of critical nodes from a network whose deletion maximally degrades the pairwise connectivity of the residual network. This work proposes a feature importance-aware graph attention network for node representation. It combines it with dueling double deep Q-network to create an end-to-end algorithm to solve CNP for the first time.
arXiv Detail & Related papers (2021-12-03T14:23:05Z)
And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
Presence of sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks. We define AND-like neurons and propose measures to increase their proportion in the network. Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation [106.04777600352743]
Differentiable architecture search (DARTS) is largely hindered by its substantial memory cost since the entire supernet resides in the memory. The single-path DARTS comes in, which only chooses a single-path submodel at each step. While being memory-friendly, it also comes with low computational costs. We propose a new algorithm called RObustifying Memory-Efficient NAS (ROME) to give a cure.
arXiv Detail & Related papers (2020-11-23T06:34:07Z)
ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks. Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.