Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification
- URL: http://arxiv.org/abs/2602.24266v1
- Date: Fri, 27 Feb 2026 18:35:10 GMT
- Title: Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification
- Authors: Amir Asiaee
- Abstract summary: Neural networks are hypothesized to implement interpretable causal mechanisms, yet verifying this requires finding a causal abstraction -- a simpler, high-level Structural Causal Model (SCM) faithful to the network under interventions. We reframe the problem by viewing structured pruning as a search over approximate abstractions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks are hypothesized to implement interpretable causal mechanisms, yet verifying this requires finding a causal abstraction -- a simpler, high-level Structural Causal Model (SCM) faithful to the network under interventions. Discovering such abstractions is hard: it typically demands brute-force interchange interventions or retraining. We reframe the problem by viewing structured pruning as a search over approximate abstractions. Treating a trained network as a deterministic SCM, we derive an Interventional Risk objective whose second-order expansion yields closed-form criteria for replacing units with constants or folding them into neighbors. Under uniform curvature, our score reduces to activation variance, recovering variance-based pruning as a special case while clarifying when it fails. The resulting procedure efficiently extracts sparse, intervention-faithful abstractions from pretrained networks, which we validate via interchange interventions.
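As a concrete reading of the abstract's "replace units with constants" criterion, here is a minimal sketch (not the authors' implementation) of the uniform-curvature special case, where the interventional-risk score reduces to activation variance: low-variance hidden units of a one-hidden-layer ReLU MLP are frozen to their mean activation, and the constant is folded into the next layer's bias. All parameter names and the threshold `tau` are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): variance-scored unit
# pruning for a one-hidden-layer ReLU MLP, treated as a deterministic SCM.
# Assumption (ours): uniform curvature, so the paper's closed-form score
# reduces to plain activation variance; the threshold `tau` is illustrative.
import numpy as np

def prune_by_activation_variance(W1, b1, W2, b2, X_calib, tau=1e-3):
    """Replace low-variance hidden units with their constant mean activation.

    W1: (h, d), b1: (h,), W2: (o, h), b2: (o,); X_calib: (n, d).
    Returns pruned copies of the parameters and the kept-unit mask.
    """
    H = np.maximum(X_calib @ W1.T + b1, 0.0)   # hidden activations (n, h)
    mu, var = H.mean(axis=0), H.var(axis=0)    # per-unit statistics
    keep = var > tau                           # variance as risk proxy

    W1p, b1p, W2p, b2p = W1.copy(), b1.copy(), W2.copy(), b2.copy()
    # "do(h_j := mu_j)": fold the constant into the next layer's bias,
    # then disconnect the unit so the abstraction has fewer mechanisms.
    b2p += W2[:, ~keep] @ mu[~keep]
    W2p[:, ~keep] = 0.0
    W1p[~keep, :] = 0.0
    b1p[~keep] = 0.0
    return W1p, b1p, W2p, b2p, keep

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, 8)), np.zeros(32)
W2, b2 = rng.normal(size=(4, 32)), np.zeros(4)
X = rng.normal(size=(256, 8))
*_, keep = prune_by_activation_variance(W1, b1, W2, b2, X)
print("kept units:", keep.sum(), "of", keep.size)
```

The folded constant plays the role of an intervention do(h_j := mu_j); faithfulness of the resulting abstraction can then be spot-checked via interchange interventions, as the abstract describes.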
Related papers
- Learning a Generative Meta-Model of LLM Activations [75.30161960337892]
We create "meta-models" that learn the distribution of a network's internal states.<n>Applying the meta-model's learned prior to steering interventions improves fluency, with larger gains as loss decreases.<n>These results suggest generative meta-models offer a scalable path toward interpretability without restrictive structural assumptions.
arXiv Detail & Related papers (2026-02-06T18:59:56Z) - APEX: Probing Neural Networks via Activation Perturbation [10.517751599566548]
We introduce Activation Perturbation for EXploration (APEX) as an inference-time probing paradigm for neural networks. APEX perturbs hidden activations while keeping both inputs and model parameters fixed. Our results show that APEX offers an effective perspective for exploring and understanding neural networks beyond what is accessible from input space alone.
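A minimal sketch of what such inference-time probing could look like (our reading of the summary, not the authors' code): a forward hook perturbs one hidden activation while the input and all weights stay fixed. The Gaussian perturbation and the probed layer are assumptions for illustration.

```python
# Sketch of inference-time activation perturbation in the spirit of APEX
# (our reading of the summary, not the authors' code): inputs and weights
# stay fixed; only a chosen hidden activation is perturbed.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4)).eval()

def gaussian_perturb(sigma):
    # The Gaussian form is an assumption; other perturbations are possible.
    def hook(module, inputs, output):
        return output + sigma * torch.randn_like(output)
    return hook

x = torch.randn(1, 8)
with torch.no_grad():
    baseline = model(x)
    handle = model[1].register_forward_hook(gaussian_perturb(sigma=0.5))
    perturbed = model(x)
    handle.remove()

print("output shift:", (perturbed - baseline).norm().item())
```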
arXiv Detail & Related papers (2026-02-03T14:36:36Z) - Learning Consistent Causal Abstraction Networks [14.952578725545344]
Causal artificial intelligence aims to enhance explainability, robustness, and trustworthiness in AI by leveraging structural causal models (SCMs). We tackle the problem of learning a consistent abstraction network (CAN). Experiments show competitive learning on synthetic data and successful recovery of diverse CAN structures.
arXiv Detail & Related papers (2026-02-02T16:16:29Z) - Greedy Is Enough: Sparse Action Discovery in Agentic LLMs [11.62669179647184]
Empirical evidence suggests that only a small subset of actions meaningfully influences performance in a given deployment. Motivated by this observation, we study a contextual linear reward model in which the set of influential actions is governed by a structured sparsity assumption. Our results identify sparse action discovery as a fundamental principle underlying large-action decision-making.
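To illustrate the "greedy is enough" idea for sparse discovery in a linear reward model, here is a standard orthogonal matching pursuit sketch (a stand-in we chose; the paper's actual algorithm and guarantees are not reproduced here): greedily grow the support of a sparse reward vector from observed context-action features and rewards.

```python
# Illustrative sketch (not the paper's algorithm): greedy support recovery
# via orthogonal matching pursuit for a sparse linear reward model
# r = <theta, phi(x, a)> with only a few active coordinates.
import numpy as np

def omp(Phi, r, k):
    """Greedily select k coordinates of a sparse linear reward vector."""
    n, d = Phi.shape
    support, residual = [], r.copy()
    for _ in range(k):
        j = np.argmax(np.abs(Phi.T @ residual))  # most correlated coordinate
        support.append(j)
        S = sorted(set(support))
        theta_S, *_ = np.linalg.lstsq(Phi[:, S], r, rcond=None)
        residual = r - Phi[:, S] @ theta_S       # re-fit, then recompute
    theta = np.zeros(d)
    theta[S] = theta_S
    return theta, S

rng = np.random.default_rng(1)
Phi = rng.normal(size=(200, 50))                 # context-action features
theta_true = np.zeros(50)
theta_true[[3, 17, 41]] = [2.0, -1.5, 1.0]
r = Phi @ theta_true + 0.05 * rng.normal(size=200)
theta_hat, S = omp(Phi, r, k=3)
print("recovered support:", S)                   # expect [3, 17, 41]
```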
arXiv Detail & Related papers (2026-01-13T07:15:32Z) - On Evolution-Based Models for Experimentation Under Interference [7.262048441360133]
We study an evolution-based approach that investigates how outcomes change across observation rounds in response to interventions. We highlight causal message passing as an instantiation of this method in dense networks. We discuss the limits of this approach, showing that strong temporal trends or endogenous interference can undermine identification.
arXiv Detail & Related papers (2025-11-26T18:53:46Z) - Semantic Loss Functions for Neuro-Symbolic Structured Prediction [74.18322585177832]
We discuss the semantic loss, which injects knowledge about such structure, defined symbolically, into training.
It is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby.
It can be combined with both discriminative and generative neural models.
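As a concrete instance, the semantic loss for an "exactly-one" constraint has a simple closed form: the negative log-probability that independent Bernoulli predictions satisfy the constraint. The sketch below is our illustration (the paper treats general symbolic constraints), written for numerical stability in log space.

```python
# Semantic loss for an "exactly-one" constraint (our illustrative choice):
# L(p) = -log P(exactly one x_i = 1) under independent Bernoulli(p_i).
import torch

def semantic_loss_exactly_one(logits):
    p = torch.sigmoid(logits)
    log1mp = torch.log1p(-p)
    total = log1mp.sum(dim=-1, keepdim=True)
    # log p_i + sum_{j != i} log(1 - p_j), one term per satisfying assignment
    per_i = torch.log(p) + total - log1mp
    return -torch.logsumexp(per_i, dim=-1).mean()

logits = torch.randn(4, 5, requires_grad=True)
loss = semantic_loss_exactly_one(logits)
loss.backward()   # differentiable: combines with any task loss
print(float(loss))
```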
arXiv Detail & Related papers (2024-05-12T22:18:25Z) - Finding Alignments Between Interpretable Causal Variables and
Distributed Neural Representations [62.65877150123775]
Causal abstraction is a promising theoretical framework for explainable artificial intelligence.
Existing causal abstraction methods require a brute-force search over alignments between the high-level model and the low-level one.
We present distributed alignment search (DAS), which overcomes these limitations.
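A minimal sketch of the core DAS operation as we read this summary (not the authors' code): perform an interchange intervention in a learned orthogonal basis, so the aligned causal variable may be distributed across many neurons. In actual DAS the rotation is trained; here `R` is a fixed random orthogonal matrix for illustration.

```python
# Interchange intervention in a rotated subspace, sketching the core DAS
# operation (assumptions ours: fixed random rotation, first-k convention).
import torch

torch.manual_seed(0)
d, k = 16, 4                 # hidden size; dims of the aligned variable
R = torch.linalg.qr(torch.randn(d, d)).Q   # orthogonal basis (learned in DAS)

def interchange(h_base, h_source, R, k):
    """Swap the first k rotated coordinates of base with those of source."""
    zb, zs = h_base @ R, h_source @ R
    zb[..., :k] = zs[..., :k]  # intervene on the distributed variable
    return zb @ R.T            # rotate back to neuron space

h_base, h_source = torch.randn(2, d), torch.randn(2, d)
h_new = interchange(h_base, h_source, R, k)
print(h_new.shape)
```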
arXiv Detail & Related papers (2023-03-05T00:57:49Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
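A toy sketch of abstraction refinement in miniature (our illustration of the general counter-example-guided idea, not the paper's meta-algorithm): maintain sound interval bounds on a sigmoid and split the piece where the abstraction is loosest until every piece is tight enough.

```python
# Toy counter-example-guided refinement for bounding a sigmoid (illustrative;
# not the paper's meta-algorithm). Since sigmoid is monotone, the interval
# [sigmoid(l), sigmoid(h)] is a sound abstraction of its range on [l, h].
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def refine(lo, hi, tol=0.05):
    pieces = [(lo, hi)]
    while True:
        gaps = [sigmoid(h) - sigmoid(l) for l, h in pieces]
        worst = max(range(len(pieces)), key=lambda i: gaps[i])
        if gaps[worst] <= tol:       # abstraction tight enough everywhere
            return pieces
        l, h = pieces.pop(worst)     # loosest piece acts as counterexample
        m = 0.5 * (l + h)
        pieces += [(l, m), (m, h)]   # refine by bisection

print(len(refine(-4.0, 4.0)))        # number of pieces needed
```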
arXiv Detail & Related papers (2022-06-08T04:09:13Z) - DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator
Search [55.164053971213576]
Convolutional neural networks have achieved great success in computer vision tasks, despite their large computation overhead.
Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules, which can lead to an enormous pruning space.
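A tiny sketch of a differentiable annealing indicator in the spirit of DAIS, as we read the title and summary (the exact relaxation, schedule, and loss are our assumptions): a per-channel sigmoid gate whose temperature is annealed so the soft mask converges toward a hard pruning decision.

```python
# Differentiable channel indicator with temperature annealing (illustrative
# sketch; the relaxation and schedule are our assumptions, not DAIS's exact
# formulation).
import torch

alpha = torch.zeros(64, requires_grad=True)   # one logit per channel

def channel_gate(alpha, T):
    # As T -> 0 the sigmoid steepens, annealing the soft gate toward a
    # hard 0/1 indicator, i.e. a discrete pruning mask.
    return torch.sigmoid(alpha / T)

for step in range(3):
    T = 1.0 * (0.5 ** step)        # annealing schedule (illustrative)
    g = channel_gate(alpha, T)     # multiply feature maps by g channel-wise
    sparsity = g.sum()             # add to the task loss to prune channels
    sparsity.backward()
    alpha.data -= 0.1 * alpha.grad # toy update; a real run uses an optimizer
    alpha.grad = None
```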
arXiv Detail & Related papers (2020-11-04T07:43:01Z) - Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for the updating of credal nets.
This contribution should be regarded as a systematic approach to representing structural causal models by credal networks.
Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z)