First is Better Than Last for Language Data Influence
- URL: http://arxiv.org/abs/2202.11844v3
- Date: Thu, 27 Oct 2022 16:22:15 GMT
- Title: First is Better Than Last for Language Data Influence
- Authors: Chih-Kuan Yeh, Ankur Taly, Mukund Sundararajan, Frederick Liu, Pradeep
Ravikumar
- Abstract summary: We show that TracIn-WE significantly outperforms other data influence methods applied on the last layer.
We also show that TracIn-WE can produce scores not just at the level of the overall training input, but also at the level of words within the training input.
- Score: 44.907420330002815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to identify influential training examples enables us to debug
training data and explain model behavior. Existing techniques to do so are
based on the flow of training data influence through the model parameters. For
large models in NLP applications, it is often computationally infeasible to
study this flow through all model parameters, therefore techniques usually pick
the last layer of weights. However, we observe that since the activation
connected to the last layer of weights contains "shared logic", the data
influence calculated via the last-layer weights is prone to a "cancellation
effect", where the data influences of different examples have large magnitudes
that contradict each other. The cancellation effect lowers the discriminative
power of the influence score, and deleting influential examples according to
this measure often does not change the model's behavior by much. To mitigate
this, we propose a technique called TracIn-WE that modifies a method called
TracIn to operate on the word embedding layer instead of the last layer, where
the cancellation effect is less severe. One potential concern is that influence
based on the word embedding layer may not encode sufficient high-level
information. However, we find that gradients (unlike embeddings) do not suffer
from this, possibly because they chain through higher layers. We show that
TracIn-WE significantly outperforms other data influence methods applied on the
last layer on the case-deletion evaluation across three language
classification tasks and different models. In addition, TracIn-WE can produce
scores not just at the level of the overall training input, but also at the
level of words within the training input, a further aid in debugging.
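The abstract's core idea, TracIn-style influence scores computed from gradients with respect to the word-embedding layer only, can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the bag-of-embeddings logistic model and all names here are illustrative assumptions.

```python
import math

def embedding_grads(E, w, tokens, label):
    """Gradient of the log-loss w.r.t. the embedding matrix E for a toy
    bag-of-embeddings logistic model: p = sigmoid(w . mean(E[t] for t in tokens)).
    Only the rows of tokens that occur in the example are nonzero."""
    d = len(w)
    avg = [sum(E[t][j] for t in tokens) / len(tokens) for j in range(d)]
    z = sum(w[j] * avg[j] for j in range(d))
    p = 1.0 / (1.0 + math.exp(-z))
    grads = {}
    for t in tokens:
        row = [(p - label) * w[j] / len(tokens) for j in range(d)]
        if t in grads:
            grads[t] = [a + b for a, b in zip(grads[t], row)]
        else:
            grads[t] = row
    return grads

def tracin_we(checkpoints, train_example, test_example):
    """TracIn-WE sketch: sum over checkpoints of lr * <grad_train, grad_test>,
    with gradients taken w.r.t. the word-embedding weights only. Because
    embedding gradients are sparse, only words shared by the two examples
    contribute, which is what enables word-level scores."""
    score = 0.0
    for E, w, lr in checkpoints:
        g_train = embedding_grads(E, w, *train_example)
        g_test = embedding_grads(E, w, *test_example)
        for t in g_train.keys() & g_test.keys():
            score += lr * sum(a * b for a, b in zip(g_train[t], g_test[t]))
    return score
```

Note that an example's embedding-layer gradient is nonzero only on the rows of the words it contains, so example pairs with no word overlap get zero score, and the per-word terms in the inner sum give the word-level attributions the abstract mentions.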
Related papers
- First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation [8.788531432978802]
Understanding how training samples influence Large Language Model (LLM) decision-making is essential for effectively interpreting model decisions. Current training sample influence estimation methods (also known as influence functions) pursue this goal by utilizing information flow through the model. However, because today's models consist of billions of parameters, these influence computations are often restricted to a subset of model layers.
arXiv Detail & Related papers (2025-11-06T00:47:07Z)
- Small-to-Large Generalization: Data Influences Models Consistently Across Scale [76.87199303408161]
We find that small- and large-scale language model predictions (generally) correlate highly across choices of training data. We also characterize how proxy scale affects effectiveness in two downstream proxy-model applications: data attribution and dataset selection.
arXiv Detail & Related papers (2025-05-22T05:50:19Z)
- Detecting Instruction Fine-tuning Attack on Language Models with Influence Function [6.760293300577228]
Instruction fine-tuning attacks undermine model alignment and pose security risks in real-world deployment.
We present a simple and effective approach to detect and mitigate such attacks using influence functions.
We are the first to apply influence functions for detecting language model instruction fine-tuning attacks on large-scale datasets.
arXiv Detail & Related papers (2025-04-12T00:50:28Z)
- Scalable Influence and Fact Tracing for Large Language Model Pretraining [14.598556308631018]
Training data attribution (TDA) methods aim to attribute model outputs back to specific training examples.
This paper refines existing gradient-based methods to work effectively at scale.
arXiv Detail & Related papers (2024-10-22T20:39:21Z)
- Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Unlearning Traces the Influential Training Data of Language Models [31.33791825286853]
This paper presents UnTrac: unlearning traces the influence of a training dataset on the model's performance.
We propose a more scalable approach, UnTrac-Inv, which unlearns a test dataset and evaluates the unlearned model on training datasets.
arXiv Detail & Related papers (2024-01-26T23:17:31Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Inf-CP: A Reliable Channel Pruning based on Channel Influence [4.692400531340393]
One of the most effective methods of channel pruning is to trim on the basis of the importance of each neuron.
Previous works have proposed to trim by considering the statistics of a single layer or of multiple successive layers of neurons.
We propose to use ensemble learning to train a model for different batches of data.
arXiv Detail & Related papers (2021-12-05T09:30:43Z)
- Hard-label Manifolds: Unexpected Advantages of Query Efficiency for Finding On-manifold Adversarial Examples [67.23103682776049]
Recent zeroth order hard-label attacks on image classification models have shown comparable performance to their first-order, gradient-level alternatives.
It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors.
We propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate.
arXiv Detail & Related papers (2021-03-04T20:53:06Z)
- FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging [112.19994766375231]
Influence functions approximate the 'influences' of training data-points for test predictions.
We present FastIF, a set of simple modifications to influence functions that significantly improves their run-time.
Our experiments demonstrate the potential of influence functions in model interpretation and correcting model errors.
arXiv Detail & Related papers (2020-12-31T18:02:34Z)
- Explaining Neural Matrix Factorization with Gradient Rollback [22.33402175974514]
Gradient rollback is a general approach for influence estimation.
We show that gradient rollback is highly efficient at both training and test time.
Gradient rollback provides faithful explanations for knowledge base completion and recommender datasets.
arXiv Detail & Related papers (2020-10-12T08:15:54Z)
- Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning [135.89676456312247]
We show how to use a different weight for every unlabeled example.
We adjust those weights via an algorithm based on the influence function.
We demonstrate that this technique outperforms state-of-the-art methods on semi-supervised image and language classification tasks.
arXiv Detail & Related papers (2020-07-02T17:59:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.