APEX: Probing Neural Networks via Activation Perturbation
- URL: http://arxiv.org/abs/2602.03586v1
- Date: Tue, 03 Feb 2026 14:36:36 GMT
- Title: APEX: Probing Neural Networks via Activation Perturbation
- Authors: Tao Ren, Xiaoyu Luo, Qiongxiu Li
- Abstract summary: We introduce Activation Perturbation for EXploration (APEX) as an inference-time probing paradigm for neural networks. APEX perturbs hidden activations while keeping both inputs and model parameters fixed. Our results show that APEX offers an effective perspective for exploring and understanding neural networks beyond what is accessible from input space alone.
- Score: 10.517751599566548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior work on probing neural networks primarily relies on input-space analysis or parameter perturbation, both of which face fundamental limitations in accessing structural information encoded in intermediate representations. We introduce Activation Perturbation for EXploration (APEX), an inference-time probing paradigm that perturbs hidden activations while keeping both inputs and model parameters fixed. We theoretically show that activation perturbation induces a principled transition from sample-dependent to model-dependent behavior by suppressing input-specific signals and amplifying representation-level structure, and further establish that input perturbation corresponds to a constrained special case of this framework. Through representative case studies, we demonstrate the practical advantages of APEX. In the small-noise regime, APEX provides a lightweight and efficient measure of sample regularity that aligns with established metrics, while also distinguishing structured from randomly labeled models and revealing semantically coherent prediction transitions. In the large-noise regime, APEX exposes training-induced model-level biases, including a pronounced concentration of predictions on the target class in backdoored models. Overall, our results show that APEX offers an effective perspective for exploring and understanding neural networks beyond what is accessible from input space alone.
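The core mechanism in the abstract can be illustrated with a minimal NumPy sketch: hold the input and weights fixed, inject Gaussian noise into a hidden activation, and observe how often the prediction changes. The toy two-layer network, the noise scale, and the flip-rate proxy below are illustrative assumptions, not the authors' implementation or their exact regularity metric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network with fixed (randomly drawn) parameters.
W1 = rng.standard_normal((4, 8))
W2 = rng.standard_normal((8, 3))

def predict(x, sigma=0.0, rng=None):
    """Forward pass; optionally perturb the hidden activation.

    Inputs and parameters stay fixed -- only the intermediate
    representation h receives Gaussian noise of scale sigma,
    mirroring the activation-perturbation idea described above.
    """
    h = np.maximum(x @ W1, 0.0)                 # hidden activation (ReLU)
    if sigma > 0.0:
        h = h + sigma * rng.standard_normal(h.shape)
    return (h @ W2).argmax(axis=-1)

x = rng.standard_normal((1, 4))                 # a single fixed input
clean = predict(x)

# Small-noise probe: the fraction of perturbations that flip the
# prediction serves here as a rough sample-regularity proxy.
flips = float(np.mean([predict(x, sigma=0.5, rng=rng)[0] != clean[0]
                       for _ in range(200)]))
```

Sweeping `sigma` from small to large values would trace the sample-dependent to model-dependent transition the paper describes: at large noise, predictions reflect the model's biases rather than the particular input.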
Related papers
- Learning a Generative Meta-Model of LLM Activations [75.30161960337892]
We create "meta-models" that learn the distribution of a network's internal states. Applying the meta-model's learned prior to steering interventions improves fluency, with larger gains as loss decreases. These results suggest generative meta-models offer a scalable path toward interpretability without restrictive structural assumptions.
arXiv Detail & Related papers (2026-02-06T18:59:56Z) - Deep Neural Networks as Iterated Function Systems and a Generalization Bound [2.7920304852537536]
We show that two important deep architectures can be viewed as, or canonically associated with, place-dependent IFS. We derive a Wasserstein bound for generative modeling that controls the collage-type approximation error between the data distribution and its image.
arXiv Detail & Related papers (2026-01-27T07:32:49Z) - Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation [48.806000388608005]
We propose X-Agent, an innovative OVSS framework employing a latent "semantic-aware agent" to orchestrate cross-modal attention mechanisms. X-Agent achieves state-of-the-art performance while effectively enhancing the latent semantic saliency.
arXiv Detail & Related papers (2025-09-01T09:01:58Z) - Neural Bridge Processes [21.702709965353804]
We propose a novel method for modeling functions where inputs x act as dynamic anchors for the entire diffusion trajectory. We validate NBPs on synthetic data, EEG signal regression, and image regression tasks, achieving substantial improvements over baselines.
arXiv Detail & Related papers (2025-08-10T07:44:52Z) - A Simple Approximate Bayesian Inference Neural Surrogate for Stochastic Petri Net Models [0.0]
We introduce a neural-network-based framework for approximating the posterior distribution. Our model employs a lightweight 1D Convolutional Residual Network trained end-to-end on Gillespie-simulated SPN realizations. On synthetic SPNs with 20% missing events, our surrogate recovers rate-function coefficients with an RMSE of 0.108 and runs substantially faster than traditional Bayesian approaches.
arXiv Detail & Related papers (2025-07-14T18:31:19Z) - Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [57.19302613163439]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z) - CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.