Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs
- URL: http://arxiv.org/abs/2303.08114v1
- Date: Tue, 14 Mar 2023 17:47:25 GMT
- Authors: Kelvin Guu, Albert Webson, Ellie Pavlick, Lucas Dixon, Ian Tenney,
Tolga Bolukbasi
- Abstract summary: Training data attribution (TDA) methods trace a model's prediction on any given example back to specific influential training examples.
We propose Simfluence, a new paradigm for TDA where the goal is not to produce a single influence score per example, but instead a training run simulator.
Simfluence captures non-additive interactions and is often able to predict the spiky trajectory of individual example losses with surprising fidelity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training data attribution (TDA) methods offer to trace a model's prediction
on any given example back to specific influential training examples. Existing
approaches do so by assigning a scalar influence score to each training
example, under a simplifying assumption that influence is additive. But in
reality, we observe that training examples interact in highly non-additive ways
due to factors such as inter-example redundancy, training order, and curriculum
learning effects.
To study such interactions, we propose Simfluence, a new paradigm for TDA
where the goal is not to produce a single influence score per example, but
instead a training run simulator: the user asks, ``If my model had trained on
example $z_1$, then $z_2$, ..., then $z_n$, how would it behave on
$z_{test}$?''; the simulator should then output a simulated training run, which
is a time series predicting the loss on $z_{test}$ at every step of the
simulated run. This enables users to answer counterfactual questions about what
their model would have learned under different training curricula, and to
directly see where in training that learning would occur.
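To make this paradigm concrete, the sketch below shows one way the simulator interface implied by this question could look in Python. The `TrainingRunSimulator` protocol, its `simulate` method, and the `initial_loss` argument are illustrative names assumed here for exposition; they are not taken from the paper or its released code.

```python
from typing import List, Protocol, Sequence


class TrainingRunSimulator(Protocol):
    """Hypothetical interface for a Simfluence-style training run simulator.

    Given an ordered training curriculum (example IDs z_1, ..., z_n) and the
    test example's loss before training, simulate() returns the predicted
    loss on z_test after each step of the simulated run.
    """

    def simulate(self, curriculum: Sequence[int], initial_loss: float) -> List[float]:
        ...


def counterfactual_loss_trajectory(sim: TrainingRunSimulator,
                                   curriculum: Sequence[int],
                                   initial_loss: float) -> List[float]:
    # "If my model had trained on z_1, then z_2, ..., then z_n, how would it
    # behave on z_test?" -> one predicted loss value per simulated step.
    return list(sim.simulate(curriculum, initial_loss))
```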
We present a simulator, Simfluence-Linear, that captures non-additive
interactions and is often able to predict the spiky trajectory of individual
example losses with surprising fidelity. Furthermore, we show that existing TDA
methods such as TracIn and influence functions can be viewed as special cases
of Simfluence-Linear. This enables us to directly compare methods in terms of
their simulation accuracy, subsuming several prior TDA approaches to
evaluation. In experiments on large language model (LLM) fine-tuning, we show
that our method predicts loss trajectories with much higher accuracy than
existing TDA methods (doubling Spearman's correlation and reducing mean-squared
error by 75%) across several tasks, models, and training methods.
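As an illustration of what such a simulator and its evaluation might look like, the following sketch rolls out a simple linear per-example simulator and scores it with the two metrics quoted above (Spearman's correlation and mean-squared error). The parameterization, in which each consumed training example c applies a multiplicative factor a[c] and an additive offset b[c] to the previous step's loss, is an assumption made here for illustration and is not stated in the abstract; see the paper for Simfluence-Linear's exact form.

```python
import numpy as np
from scipy.stats import spearmanr


def simulate_linear(curriculum, a, b, initial_loss):
    """Roll out an illustrative linear per-example simulator (assumed form).

    At step t, consuming training example c updates the predicted test loss as
        L_t = a[c] * L_{t-1} + b[c],
    so an example's effect depends on the current loss, and hence on which
    examples preceded it -- a non-additive interaction.
    """
    losses, loss = [], initial_loss
    for c in curriculum:
        loss = a[c] * loss + b[c]
        losses.append(loss)
    return np.array(losses)


def simulation_quality(true_losses, simulated_losses):
    """Score a simulated loss trajectory against an observed one using the
    two metrics cited in the abstract: Spearman's correlation and MSE."""
    rho, _ = spearmanr(true_losses, simulated_losses)
    mse = float(np.mean((np.asarray(true_losses) - np.asarray(simulated_losses)) ** 2))
    return {"spearman": rho, "mse": mse}


# Example usage with three training examples and a five-step curriculum
# (all parameter values below are made up for illustration):
a = {0: 0.95, 1: 0.90, 2: 1.01}    # multiplicative factors per example
b = {0: -0.02, 1: -0.05, 2: 0.03}  # additive offsets per example
pred = simulate_linear([0, 1, 1, 2, 0], a, b, initial_loss=2.0)
```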
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? (2024-11-12)
  We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy. By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
- Distilled Datamodel with Reverse Gradient Matching (2024-04-22)
  We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages. Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
- Learning with Noisy Foundation Models (2024-03-11)
  This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
- The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes (2024-02-14)
  We introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse, problem. We introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point.
- Unlearning Traces the Influential Training Data of Language Models (2024-01-26)
  This paper presents UnTrac, which traces the influence of a training dataset on the model's performance by unlearning it. We also propose a more scalable approach, UnTrac-Inv, which unlearns a test dataset and evaluates the unlearned model on training datasets.
- An Emulator for Fine-Tuning Large Language Models using Small Language Models (2023-10-19)
  We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales. We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training. Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks (2023-09-29)
  This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks. We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
- An Empirical Comparison of Instance Attribution Methods for NLP (2021-04-09)
  We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples. We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
- Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability (2020-10-14)
  Any prediction from a model is made by a combination of learning history and test stimuli. Existing methods to interpret a model's predictions are only able to capture a single aspect of either test stimuli or learning history. We propose an efficient and differentiable approach to make it feasible to interpret a model's prediction by jointly examining training history and test stimuli.