Accountability Attribution: Tracing Model Behavior to Training Processes
- URL: http://arxiv.org/abs/2506.00175v1
- Date: Fri, 30 May 2025 19:27:39 GMT
- Title: Accountability Attribution: Tracing Model Behavior to Training Processes
- Authors: Shichang Zhang, Hongzhe Du, Karim Saraipour, Jiaqi W. Ma, Himabindu Lakkaraju
- Abstract summary: AI development pipelines often involve multiple stages (pretraining, fine-tuning rounds, and subsequent adaptation or alignment), with numerous model update steps within each stage. This raises a critical question of accountability: when a deployed model succeeds or fails, which stage is responsible, and to what extent? We pose the problem of accountability attribution, which aims to trace model behavior back to specific stages of the training process.
- Score: 20.261750156630463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern AI development pipelines often involve multiple stages (pretraining, fine-tuning rounds, and subsequent adaptation or alignment), with numerous model update steps within each stage. This raises a critical question of accountability: when a deployed model succeeds or fails, which stage is responsible, and to what extent? We pose the problem of accountability attribution, which aims to trace model behavior back to specific stages of the training process. To address this, we propose a general framework that answers counterfactual questions about stage effects: how would the model behavior have changed if the updates from a training stage had not been executed? Within this framework, we introduce estimators based on first-order approximations that efficiently quantify the stage effects without retraining. Our estimators account for both the training data and key aspects of optimization dynamics, including learning rate schedules, momentum, and weight decay. Empirically, we demonstrate that our approach identifies training stages accountable for specific behaviors, offering a practical tool for model analysis and a step toward more accountable AI development.
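To make the first-order idea concrete: if a stage contributed an accumulated parameter change Δθ_stage, a Taylor expansion of a behavior metric f around the final parameters gives f(θ_final − Δθ_stage) − f(θ_final) ≈ −∇f(θ_final)·Δθ_stage. The sketch below illustrates this estimate on a toy SGD-with-momentum run; it is not the authors' implementation, and the function names (replay_updates, stage_effect), the optimizer replay, and the handling of weight decay are assumptions made for illustration.

```python
# Minimal sketch (assumed, not the paper's code): first-order estimate of a
# training stage's effect on a behavior metric f, given logged gradients and
# the learning-rate schedule of an SGD-with-momentum run.
import numpy as np

def replay_updates(grads, lrs, momentum=0.9, weight_decay=0.0, theta0=None):
    """Reconstruct per-step parameter updates from logged gradients, a
    learning-rate schedule, momentum, and (decoupled) weight decay."""
    theta = np.zeros_like(grads[0]) if theta0 is None else theta0.copy()
    v = np.zeros_like(grads[0])
    updates = []
    for g, lr in zip(grads, lrs):
        v = momentum * v + g                      # momentum buffer
        step = -lr * (v + weight_decay * theta)   # decoupled weight decay
        theta = theta + step
        updates.append(step)
    return updates

def stage_effect(behavior_grad_final, updates, stage_slice):
    """First-order estimate of the change in f if one stage's updates were
    removed: f(theta_final - delta_stage) - f(theta_final)
             ~= -grad_f(theta_final) . delta_stage"""
    delta_stage = sum(updates[stage_slice])
    return -float(np.dot(behavior_grad_final, delta_stage))

# Toy usage: two stages of three steps each on a 5-dimensional parameter vector.
rng = np.random.default_rng(0)
grads = [rng.normal(size=5) for _ in range(6)]
lrs = [0.1] * 3 + [0.01] * 3                      # stage-wise learning-rate schedule
updates = replay_updates(grads, lrs, momentum=0.9, weight_decay=1e-4)
grad_f = rng.normal(size=5)                       # gradient of the behavior metric at theta_final
print("stage 1 effect:", stage_effect(grad_f, updates, slice(0, 3)))
print("stage 2 effect:", stage_effect(grad_f, updates, slice(3, 6)))
```

Note that this toy version ignores how removing a stage would alter all subsequent updates; it only captures the leading first-order term around the final parameters.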
Related papers
- EvoLM: In Search of Lost Language Model Training Dynamics [97.69616550374579]
EvoLM is a model suite that enables systematic and transparent analysis of LMs' training dynamics across pre-training, continued pre-training, supervised fine-tuning, and reinforcement learning. By training over 100 LMs with 1B and 4B parameters from scratch, we rigorously evaluate both upstream (language modeling) and downstream (problem-solving) reasoning capabilities.
arXiv Detail & Related papers (2025-06-19T04:58:47Z) - What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z) - Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network). After training this network on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference. We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Model-Based Transfer Learning for Contextual Reinforcement Learning [5.5597941107270215]
We introduce Model-Based Transfer Learning to solve contextual RL problems. We show theoretically that the method exhibits sublinear regret in the number of training tasks. We experimentally validate our method using urban traffic and standard continuous control benchmarks.
arXiv Detail & Related papers (2024-08-08T14:46:01Z) - Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis [27.310894780313618]
This paper undertakes a comprehensive comparison of model capabilities at various pretraining intermediate checkpoints.
We confirm that specific downstream metrics exhibit similar training dynamics across models of different sizes.
In addition to our core findings, we've reproduced Amber and OpenLLaMA, releasing their intermediate checkpoints.
arXiv Detail & Related papers (2024-04-01T16:00:01Z) - Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks [37.278707106871295]
We study how fine-tuning alters the underlying capabilities learned by a model during pretraining.
We show that fine-tuning rarely alters the underlying model capabilities.
We also show that fine-tuning can unintentionally remove a model's safety wrapper.
arXiv Detail & Related papers (2023-11-21T18:51:04Z) - Learning to Modulate pre-trained Models in RL [22.812215561012874]
Fine-tuning a pre-trained model often suffers from catastrophic forgetting.
Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly.
We propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model.
arXiv Detail & Related papers (2023-06-26T17:53:05Z) - Architecture, Dataset and Model-Scale Agnostic Data-free Meta-Learning [117.48444197402858]
We propose ePisode cUrriculum inveRsion (ECI) during data-free meta training and invErsion calibRation following inner loop (ICFIL) during meta testing. ECI adaptively increases the difficulty level of pseudo episodes according to the real-time feedback of the meta model. We formulate the optimization process of meta training with ECI as an adversarial form in an end-to-end manner.
arXiv Detail & Related papers (2023-03-20T15:10:41Z) - Training Dynamics for Text Summarization Models [45.62439188988816]
We analyze the training dynamics for generation models, focusing on news summarization.
Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, we study what the model learns at different stages of its fine-tuning process.
We find that properties such as copy behavior are learnt earlier in the training process, and these observations are robust across domains.
On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, and this behavior is more varied across domains.
arXiv Detail & Related papers (2021-10-15T21:13:41Z) - Multi-Stage Influence Function [97.19210942277354]
We develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data.
We study two scenarios, in which the pretrained embeddings are either fixed or updated during the fine-tuning tasks.
arXiv Detail & Related papers (2020-07-17T16:03:11Z)
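The multi-stage score above extends the classic influence-function idea: the effect of upweighting a training example z on a test loss f is approximately −∇f(θ̂)ᵀ H⁻¹ ∇L(z, θ̂), where H is the Hessian of the training loss at the trained parameters θ̂. A minimal single-stage sketch of that building block, assumed rather than taken from the paper, is:

```python
# Minimal sketch (assumed): classic single-stage influence-function scores,
# the building block that multi-stage variants extend across pretraining and
# fine-tuning. Assumes the model is small enough to form the Hessian explicitly.
import numpy as np

def influence_scores(grad_test, per_example_grads, hessian, damping=1e-3):
    """Approximate change in the test loss from upweighting each training
    example; more negative means the example helped the prediction."""
    h = hessian + damping * np.eye(hessian.shape[0])  # damping keeps H well-conditioned
    ihvp = np.linalg.solve(h, grad_test)              # H^{-1} grad_f (inverse-Hessian-vector product)
    return np.array([-g @ ihvp for g in per_example_grads])

# Toy usage on a 3-parameter model with 4 training examples.
rng = np.random.default_rng(1)
H = rng.normal(size=(3, 3))
H = H @ H.T + np.eye(3)                               # symmetric positive-definite Hessian
print(influence_scores(rng.normal(size=3), rng.normal(size=(4, 3)), H))
```

At realistic scale, the explicit Hessian is replaced with implicit inverse-Hessian-vector products (e.g., conjugate gradients or stochastic approximations such as LiSSA).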
This list is automatically generated from the titles and abstracts of the papers in this site.