On Influence Functions, Classification Influence, Relative Influence,
Memorization and Generalization
- URL: http://arxiv.org/abs/2305.16094v1
- Date: Thu, 25 May 2023 14:26:36 GMT
- Title: On Influence Functions, Classification Influence, Relative Influence,
Memorization and Generalization
- Authors: Michael Kounavis, Ousmane Dia, Ilqar Ramazanli
- Abstract summary: We study influence functions from the perspective of simplifying the computations they involve.
We demonstrate that the sign of the influence value can indicate whether a training point is memorized rather than generalized upon.
We conclude that influence functions can be made practical, even for large scale machine learning systems.
- Score: 0.4297070083645048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning systems such as large scale recommendation systems or
natural language processing systems are usually trained on billions of training
points and are associated with hundreds of billions or trillions of parameters.
Improving the learning process in such a way that both the training load is
reduced and the model accuracy improved is highly desired. In this paper we
take a first step toward solving this problem, studying influence functions
from the perspective of simplifying the computations they involve. We discuss
assumptions, under which influence computations can be performed on
significantly fewer parameters. We also demonstrate that the sign of the
influence value can indicate whether a training point is memorized, as
opposed to generalized upon. For this purpose we formally define what
memorization means for a training point, as opposed to generalization. We
conclude that influence functions can be made practical, even for large scale
machine learning systems, and that influence values can be taken into account
by algorithms that selectively remove training points, as part of the learning
process.
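The influence computations the abstract refers to follow the standard first-order formulation, where the influence of a training point z on a test point z_test is -∇L(z_test)ᵀ H⁻¹ ∇L(z), with H the Hessian of the mean training loss. A minimal sketch on a least-squares model, where the Hessian is exact and tiny, might look as follows; the toy data and variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny least-squares model: loss_i = 0.5 * (x_i @ theta - y_i)^2
n, d = 50, 3
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

theta = np.linalg.lstsq(X, y, rcond=None)[0]   # empirical risk minimizer

# Hessian of the mean loss and per-point gradients at the optimum.
H = X.T @ X / n
grads = (X @ theta - y)[:, None] * X           # row i = grad of loss_i

# Influence of training point z_i on the loss at a test point z_test:
#   I(z_i, z_test) = -grad L(z_i)^T  H^{-1}  grad L(z_test)
x_test, y_test = rng.normal(size=d), 0.0
g_test = (x_test @ theta - y_test) * x_test
influences = -grads @ np.linalg.solve(H, g_test)

print(influences[:5])
```

Because the loss is quadratic here, these values track the actual leave-one-out change in test loss closely; for deep models the same formula is applied with approximate Hessians, which is where the paper's simplifications enter.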
Related papers
- Empirical influence functions to understand the logic of fine-tuning [1.9116784879310031]
We use empirical influence measured using fine-tuning to demonstrate how individual training samples affect outputs.
We show that these desiderata are violated both for simple convolutional networks and for a modern LLM.
Our results suggest that popular models cannot generalize or perform logic in the way they appear to.
arXiv Detail & Related papers (2024-06-01T17:31:06Z)
- Studying Large Language Model Generalization with Influence Functions [29.577692176892135]
Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a sequence were added to the training set?
We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to large language models (LLMs) with up to 52 billion parameters.
We investigate generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior.
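What makes EK-FAC tractable at LLM scale is the Kronecker-factored structure of the curvature approximation: if a layer's Hessian block is approximated as A ⊗ G (with A from input activations and G from output gradients), its inverse never has to be materialized, since (A ⊗ G)⁻¹ vec(V) = vec(G⁻¹ V A⁻¹) for symmetric factors. A small numerical sketch of this identity (toy factors, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(1)

def spd(k):
    # Random symmetric positive-definite factor.
    M = rng.normal(size=(k, k))
    return M @ M.T + k * np.eye(k)

# Layer-wise curvature approximated as H ~ A (x) G in K-FAC style.
a_dim, g_dim = 4, 3
A, G = spd(a_dim), spd(g_dim)

def vec(X):
    return X.reshape(-1, order="F")        # column-major vectorization

V = rng.normal(size=(g_dim, a_dim))        # a "gradient" for one layer

# Naive: materialize the (g*a) x (g*a) Kronecker Hessian and solve.
naive = np.linalg.solve(np.kron(A, G), vec(V))

# Kronecker trick: invert only the small factors.
#   (A (x) G)^{-1} vec(V) = vec(G^{-1} V A^{-1})  for symmetric A, G
cheap = vec(np.linalg.solve(G, V) @ np.linalg.inv(A))

print(np.max(np.abs(naive - cheap)))       # agreement up to float error
```

The cost drops from inverting one (a·g)×(a·g) matrix to inverting one a×a and one g×g matrix per layer, which is the lever that lets influence functions reach tens of billions of parameters.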
arXiv Detail & Related papers (2023-08-07T04:47:42Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Scaling Up Influence Functions [6.310723785587086]
We address efficient calculation of influence functions for tracking predictions back to the training data.
We achieve, to the best of our knowledge, the first successful implementation of influence functions that scales to full-size Transformer models.
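Scaling influence functions to full-size Transformers hinges on computing inverse-Hessian-vector products without ever forming the Hessian. One common recipe (a LiSSA-style Neumann iteration, shown here on a toy quadratic as an illustration, not this paper's exact implementation) needs only Hessian-vector products, which autodiff provides cheaply:

```python
import numpy as np

rng = np.random.default_rng(2)

# An SPD "Hessian" accessed only through Hessian-vector products,
# as one would via autodiff (Pearlmutter's trick) in a real model.
d = 6
M = rng.normal(size=(d, d))
H = M @ M.T / d + 0.5 * np.eye(d)

def hvp(v):
    return H @ v

def inverse_hvp(v, scale, steps=500):
    # Neumann-series recursion:
    #   h_{t+1} = v + (I - H/scale) h_t   ->   h_t -> scale * H^{-1} v
    # 'scale' must exceed the largest Hessian eigenvalue to converge.
    h = v.copy()
    for _ in range(steps):
        h = v + h - hvp(h) / scale
    return h / scale

v = rng.normal(size=d)
scale = np.linalg.eigvalsh(H).max() * 1.1
approx = inverse_hvp(v, scale)
exact = np.linalg.solve(H, v)
print(np.max(np.abs(approx - exact)))
```

In practice the HVPs are stochastic (mini-batch) and the recursion is averaged over several runs, but the fixed point is the same H⁻¹v that every influence score requires.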
arXiv Detail & Related papers (2021-12-06T13:54:08Z)
- A Bayesian Approach to (Online) Transfer Learning: Theory and Algorithms [6.193838300896449]
We study transfer learning from a Bayesian perspective, where a parametric statistical model is used.
Specifically, we study three variants of transfer learning problems, instantaneous, online, and time-variant transfer learning.
For each problem, we define an appropriate objective function, and provide either exact expressions or upper bounds on the learning performance.
Examples show that the derived bounds are accurate even for small sample sizes.
arXiv Detail & Related papers (2021-09-03T08:43:29Z)
- Learning by Ignoring, with Application to Domain Adaptation [10.426533624387305]
We propose a novel machine learning framework referred to as learning by ignoring (LBI).
Our framework automatically identifies pretraining data examples that have large domain shift from the target distribution by learning an ignoring variable for each example and excludes them from the pretraining process.
A gradient-based algorithm is developed to efficiently solve the three-level optimization problem in LBI.
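The core ingredient of such a framework is a learnable per-example weight that gates each pretraining example's contribution, updated by the gradient of a target-domain validation loss. The paper's three-level optimization is beyond a snippet, but the mechanism can be sketched on a toy problem where the inner model is a weighted mean, so the hypergradient is exact; every name, constant, and the toy setup below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy pretraining set: half near the target domain, half strongly shifted.
n, d = 200, 2
X = np.vstack([rng.normal(0.0, 1.0, size=(n // 2, d)),
               rng.normal(5.0, 1.0, size=(n // 2, d))])
X_val = rng.normal(0.0, 1.0, size=(50, d))      # small target-domain set
m = X_val.mean(axis=0)

a = np.zeros(n)                                  # ignoring logits, w = sigmoid(a)

for _ in range(300):
    w = 1.0 / (1.0 + np.exp(-a))
    theta = (w[:, None] * X).sum(0) / w.sum()    # inner model: weighted mean
    # Exact hypergradient of the validation loss 0.5*||theta - m||^2
    # with respect to each example weight w_i.
    dL_dw = (X - theta) @ (theta - m) / w.sum()
    a -= 5.0 * dL_dw * w * (1 - w)               # chain rule through sigmoid

w = 1.0 / (1.0 + np.exp(-a))
print(w[:n // 2].mean(), w[n // 2:].mean())      # shifted half down-weighted
```

The shifted half ends up with weights near zero while the in-domain half keeps high weight, which is the "ignore" behavior the framework automates; in the real setting the inner solve is itself gradient descent, hence the three-level structure.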
arXiv Detail & Related papers (2020-12-28T15:33:41Z)
- Efficient Estimation of Influence of a Training Instance [56.29080605123304]
We propose an efficient method for estimating the influence of a training instance on a neural network model.
Our method is inspired by dropout, which zero-masks a sub-network and prevents the sub-network from learning each training instance.
We demonstrate that the proposed method can capture training influences, enhance the interpretability of error predictions, and cleanse the training dataset for improving generalization.
arXiv Detail & Related papers (2020-12-08T04:31:38Z)
- Loss Bounds for Approximate Influence-Based Abstraction [81.13024471616417]
Influence-based abstraction aims to gain leverage by modeling local subproblems together with the 'influence' that the rest of the system exerts on them.
This paper investigates the performance of such approaches from a theoretical perspective.
We show that neural networks trained with cross entropy are well suited to learn approximate influence representations.
arXiv Detail & Related papers (2020-11-03T15:33:10Z)
- Multi-Stage Influence Function [97.19210942277354]
We develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data.
We study two different scenarios with the pretrained embeddings fixed or updated in the finetuning tasks.
arXiv Detail & Related papers (2020-07-17T16:03:11Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates.
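The fragility and the fix are easy to see numerically: deep-network Hessians have many near-zero curvature directions, so the undamped inverse amplifies whatever gradient mass falls along them, while a damping term λI keeps the solve well conditioned. A toy illustration (synthetic spectrum and gradients, chosen for the example):

```python
import numpy as np

rng = np.random.default_rng(3)

# Ill-conditioned "deep net" Hessian: several near-zero curvature directions.
d = 8
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
eigs = np.array([1e-8, 1e-6, 1e-4, 1e-2, 0.1, 0.5, 1.0, 2.0])
H = Q @ np.diag(eigs) @ Q.T

g_train = rng.normal(size=d)
g_test = rng.normal(size=d)

def influence(damping):
    # I = -g_test^T (H + damping*I)^{-1} g_train
    return -g_test @ np.linalg.solve(H + damping * np.eye(d), g_train)

# The undamped estimate is dominated by the near-null directions;
# damping trades a little bias for a dramatically better-conditioned solve.
print(influence(0.0), influence(1e-2))
```

This is the sense in which "Hessian regularization is important": without it, influence rankings can be driven almost entirely by directions the loss barely constrains.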
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
arXiv Detail & Related papers (2020-05-14T00:45:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.