Related papers: Scaling Up Influence Functions

Scaling Up Influence Functions

URL: http://arxiv.org/abs/2112.03052v1
Date: Mon, 6 Dec 2021 13:54:08 GMT
Title: Scaling Up Influence Functions
Authors: Andrea Schioppa, Polina Zablotskaia, David Vilar, Artem Sokolov
Abstract summary: We address efficient calculation of influence functions for tracking predictions back to the training data. We achieve, to the best of our knowledge, the first successful implementation of influence functions that scales to full-size Transformer models.
Score: 6.310723785587086
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We address efficient calculation of influence functions for tracking predictions back to the training data. We propose and analyze a new approach to speeding up the inverse Hessian calculation based on Arnoldi iteration. With this improvement, we achieve, to the best of our knowledge, the first successful implementation of influence functions that scales to full-size (language and vision) Transformer models with several hundreds of millions of parameters. We evaluate our approach on image classification and sequence-to-sequence tasks with tens to a hundred of millions of training examples. Our code will be available at https://github.com/google-research/jax-influence.

Related papers

Enhancing Training Data Attribution with Representational Optimization [57.61977909113113]
Training data attribution methods aim to measure how training data impacts a model's predictions.<n>We propose AirRep, a representation-based approach that closes this gap by learning task-specific and model-aligned representations explicitly for TDA.<n>AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence.
arXiv Detail & Related papers (2025-05-24T05:17:53Z)
Studying Large Language Model Generalization with Influence Functions [29.577692176892135]
Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a sequence were added to the training set? We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to large language models (LLMs) with up to 52 billion parameters. We investigate generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior.
arXiv Detail & Related papers (2023-08-07T04:47:42Z)
ProFormer: Learning Data-efficient Representations of Body Movement with Prototype-based Feature Augmentation and Visual Transformers [31.908276711898548]
Methods for data-efficient recognition from body poses increasingly leverage skeleton sequences structured as image-like arrays. We look at this paradigm from the perspective of transformer networks, for the first time exploring visual transformers as data-efficient encoders of skeleton movement. In our pipeline, body pose sequences cast as image-like representations are converted into patch embeddings and then passed to a visual transformer backbone optimized with deep metric learning.
arXiv Detail & Related papers (2022-02-23T11:11:54Z)
MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning [19.5917119072985]
We model contrastive learning into a binary classification problem to predict if a pair is positive or not. The proposed method outperforms the state-of-the-art algorithms on benchmark datasets like STL-10, CIFAR-10, CIFAR-100.
arXiv Detail & Related papers (2021-11-24T17:51:29Z)
Rectification-based Knowledge Retention for Continual Learning [49.1447478254131]
Deep learning models suffer from catastrophic forgetting when trained in an incremental learning setting. We propose a novel approach to address the task incremental learning problem, which involves training a model on new tasks that arrive in an incremental manner. Our approach can be used in both the zero-shot and non zero-shot task incremental learning settings.
arXiv Detail & Related papers (2021-03-30T18:11:30Z)
FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging [112.19994766375231]
Influence functions approximate the 'influences' of training data-points for test predictions. We present FastIF, a set of simple modifications to influence functions that significantly improves their run-time. Our experiments demonstrate the potential of influence functions in model interpretation and correcting model errors.
arXiv Detail & Related papers (2020-12-31T18:02:34Z)
Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens. We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z)
Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves [53.37905268850274]
We introduce a new, hierarchical, neural network parameterized, hierarchical with access to additional features such as validation loss to enable automatic regularization. Most learneds have been trained on only a single task, or a small number of tasks. We train ours on thousands of tasks, making use of orders of magnitude more compute, resulting in generalizes that perform better to unseen tasks.
arXiv Detail & Related papers (2020-09-23T16:35:09Z)
Multi-Stage Influence Function [97.19210942277354]
We develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data. We study two different scenarios with the pretrained embeddings fixed or updated in the finetuning tasks.
arXiv Detail & Related papers (2020-07-17T16:03:11Z)
On the Generalization Effects of Linear Transformations in Data Augmentation [32.01435459892255]
Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. We study a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. We propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data.
arXiv Detail & Related papers (2020-05-02T04:10:21Z)
Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection. Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling. It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.