Related papers: ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior

ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior

URL: http://arxiv.org/abs/2505.20076v3
Date: Wed, 01 Oct 2025 04:19:26 GMT
Title: ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior
Authors: Florian Eichin, Yupei Du, Philipp Mondorf, Maria Matveev, Barbara Plank, Michael A. Hedderich,
Abstract summary: Post-hoc interpretability methods typically attribute a model's behavior to its components, data, or training trajectory in isolation.<n>We present ExPLAIND, a unified framework that integrates all these perspectives.
Score: 39.590138981646696
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Post-hoc interpretability methods typically attribute a model's behavior to its components, data, or training trajectory in isolation. This leads to explanations that lack a unified view and may miss key interactions. While combining existing methods or applying them at different training stages offers broader insights, such approaches usually lack theoretical support. In this work, we present ExPLAIND, a unified framework that integrates all these perspectives. First, we generalize recent work on gradient path kernels, which reformulate models trained by gradient descent as a kernel machine, to realistic settings like AdamW. We empirically validate that a CNN and a Transformer are accurately replicated by this reformulation. Second, we derive novel parameter- and step-wise influence scores from the kernel feature maps. Their effectiveness for parameter pruning is comparable to existing methods, demonstrating their value for model component attribution. Finally, jointly interpreting model components and data over the training process, we leverage ExPLAIND to analyze a Transformer that exhibits Grokking. Our findings support previously proposed stages of Grokking, while refining the final phase as one of alignment of input embeddings and final layers around a representation pipeline learned after the memorization phase. Overall, ExPLAIND provides a theoretically grounded, unified framework to interpret model behavior and training dynamics.

Related papers

Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs.<n>We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation.<n>We present a taxonomy that categorizes such information manipulation approaches across four key dimensions.<n>We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
Invariance Pair-Guided Learning: Enhancing Robustness in Neural Networks [0.0]
We propose a technique to guide the neural network through the training phase.<n>We form a corrective gradient complementing the traditional gradient descent approach.<n>Experiments on ColoredMNIST, Waterbird-100, and CelebANIST datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2025-02-26T09:36:00Z)
State-space models can learn in-context by gradient descent [1.3087858009942543]
We show that state-space models can perform gradient-based learning and use it for in-context learning in much the same way as transformers.<n>Specifically, we prove that a single structured state-space model layer, augmented with multiplicative input and output gating, can reproduce the outputs of an implicit linear model.<n>We also provide novel insights into the relationship between state-space models and linear self-attention, and their ability to learn in-context.
arXiv Detail & Related papers (2024-10-15T15:22:38Z)
Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters. We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model. Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis [27.310894780313618]
This paper undertakes a comprehensive comparison of model capabilities at various pretraining intermediate checkpoints. We confirm that specific downstream metrics exhibit similar training dynamics across models of different sizes. In addition to our core findings, we've reproduced Amber and OpenLLaMA, releasing their intermediate checkpoints.
arXiv Detail & Related papers (2024-04-01T16:00:01Z)
Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a paradigm should be done in imitation learning. We consider a setting where the pretraining corpus consists of multitask demonstrations. We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems. Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored. We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
Analyzing Transformers in Embedding Space [59.434807802802105]
We present a theoretical analysis where all parameters of a trained Transformer are interpreted by projecting them into the embedding space. We show that parameters of both pretrained and fine-tuned models can be interpreted in embedding space. Our findings open the door to interpretation methods that, at least in part, abstract away from model specifics and operate in the embedding space only.
arXiv Detail & Related papers (2022-09-06T14:36:57Z)
Parameter Decoupling Strategy for Semi-supervised 3D Left Atrium Segmentation [0.0]
We present a novel semi-supervised segmentation model based on parameter decoupling strategy to encourage consistent predictions from diverse views. Our method has achieved a competitive result over the state-of-the-art semisupervised methods on the Atrial Challenge dataset.
arXiv Detail & Related papers (2021-09-20T14:51:42Z)
STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data. Experiments show that our model can achieve comparable performance while utilizing much less trainable parameters and achieve high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability [44.60486560836836]
Any prediction from a model is made by a combination of learning history and test stimuli. Existing methods to interpret a model's predictions are only able to capture a single aspect of either test stimuli or learning history. We propose an efficient and differentiable approach to make it feasible to interpret a model's prediction by jointly examining training history and test stimuli.
arXiv Detail & Related papers (2020-10-14T10:45:01Z)
Monocular Human Pose and Shape Reconstruction using Part Differentiable Rendering [53.16864661460889]
Recent works succeed in regression-based methods which estimate parametric models directly through a deep neural network supervised by 3D ground truth. In this paper, we introduce body segmentation as critical supervision. To improve the reconstruction with part segmentation, we propose a part-level differentiable part that enables part-based models to be supervised by part segmentation.
arXiv Detail & Related papers (2020-03-24T14:25:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.