On Influence Functions, Classification Influence, Relative Influence,
Memorization and Generalization
- URL: http://arxiv.org/abs/2305.16094v1
- Date: Thu, 25 May 2023 14:26:36 GMT
- Title: On Influence Functions, Classification Influence, Relative Influence,
Memorization and Generalization
- Authors: Michael Kounavis, Ousmane Dia, Ilqar Ramazanli
- Abstract summary: We study influence functions from the perspective of simplifying the computations they involve.
We demonstrate that the sign of the influence value can indicate whether a training point is to be memorized, as opposed to generalized upon.
We conclude that influence functions can be made practical, even for large scale machine learning systems.
- Score: 0.4297070083645048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning systems such as large scale recommendation systems or
natural language processing systems are usually trained on billions of training
points and are associated with hundreds of billions or trillions of parameters.
Improving the learning process in such a way that both the training load is
reduced and the model accuracy improved is highly desired. In this paper we
take a first step toward solving this problem, studying influence functions
from the perspective of simplifying the computations they involve. We discuss
assumptions, under which influence computations can be performed on
significantly fewer parameters. We also demonstrate that the sign of the
influence value can indicate whether a training point is to be memorized, as
opposed to generalized upon. For this purpose we formally define what
memorization means for a training point, as opposed to generalization. We
conclude that influence functions can be made practical, even for large scale
machine learning systems, and that influence values can be taken into account
by algorithms that selectively remove training points, as part of the learning
process.
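As background for the computations the abstract refers to, here is a minimal numpy sketch of the standard influence-function formulation (Koh & Liang, 2017) that this line of work simplifies; the toy model, data, and the sign reading below are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression stand-in for a large model (illustrative only).
n, d, lam = 200, 5, 1e-2
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_point(w, x, y_i):
    # Gradient of one example's log-loss with respect to the weights.
    return (sigmoid(x @ w) - y_i) * x

# Fit by gradient descent on the L2-regularized mean log-loss.
w = np.zeros(d)
for _ in range(2000):
    w -= 0.5 * (X.T @ (sigmoid(X @ w) - y) / n + lam * w)

# Hessian of the regularized mean loss at the optimum.
p = sigmoid(X @ w)
H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)
H_inv = np.linalg.inv(H)

# Influence of training point z_i on a test point's loss (Koh & Liang, 2017):
#   I(z_i, z_test) = -grad(z_test)^T H^{-1} grad(z_i)
x_test, y_test = rng.normal(size=d), 1.0
g_test = grad_point(w, x_test, y_test)
influences = np.array([-g_test @ H_inv @ grad_point(w, X[i], y[i])
                       for i in range(n)])

# I(z_i, z_test) approximates d(test loss)/d(weight of z_i): negative values
# mean upweighting z_i lowers the test loss, positive values mean it raises it.
# The paper's memorize-vs-generalize reading of the sign is analogous in
# spirit, though its formal definitions differ.
print("most helpful:", influences.argmin(), "most harmful:", influences.argmax())
```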
Related papers
- Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimating the impact of individual training samples on model parameters without retraining.
This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning).
We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
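The abstract does not spell the algorithm out; for orientation, here is a hedged sketch of the generic influence-based (Newton-step) unlearning update this family of methods starts from:

```python
import numpy as np

def unlearn_newton_step(w, H, grad_z, n, damping=1e-3):
    """Approximate one-shot removal of a single training point z: for a model
    trained on n points with empirical-risk Hessian H at the optimum, removing
    z shifts the parameters by roughly H^{-1} grad_L(z) / n, with no
    retraining. (Generic baseline only; the paper's Influence Approximation
    Unlearning algorithm is not reproduced here.)"""
    H_damped = H + damping * np.eye(H.shape[0])  # damping for numerical stability
    return w + np.linalg.solve(H_damped, grad_z) / n
```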
arXiv Detail & Related papers (2025-07-31T05:34:27Z)
- Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training.
We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO.
As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
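The estimator itself is not given in the summary; a simpler trajectory-aware relative (a TracIn-style accumulation over saved checkpoints) conveys the idea that a point's influence depends on when it is seen during training:

```python
def trajectory_influence(checkpoints, grad_fn, z_train, z_test):
    """Sum, over saved checkpoints, of the learning-rate-weighted dot product
    between the training-point and test-point gradients. `checkpoints` is a
    list of (params, learning_rate) pairs saved during training and
    `grad_fn(params, z)` returns a per-example gradient vector (numpy array);
    both are assumptions of this sketch, not the paper's API."""
    total = 0.0
    for params, lr in checkpoints:
        g_train = grad_fn(params, z_train)
        g_test = grad_fn(params, z_test)
        total += lr * float(g_train @ g_test)  # this step's contribution
    return total
```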
arXiv Detail & Related papers (2024-12-12T18:28:55Z)
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- Scalability of memorization-based machine unlearning [2.5782420501870296]
Machine unlearning (MUL) focuses on removing the influence of specific subsets of data from pretrained models.
Memorization-based unlearning methods have been developed, demonstrating exceptional performance with respect to unlearning quality.
We tackle the scalability challenges of state-of-the-art memorization-based MUL algorithms using a series of memorization-score proxies.
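The abstract does not state which proxies are used; one hypothetical cheap proxy, shown only to illustrate what a memorization-score proxy can look like, is the per-example loss drop between an early and a final checkpoint:

```python
import numpy as np

def memorization_proxy_scores(losses_early, losses_final):
    """Hypothetical proxy: points that are fit late (high early loss, near-zero
    final loss) behave like memorized, atypical examples. Inputs are arrays of
    per-example training losses at an early and a final checkpoint."""
    return losses_early - losses_final  # large drop => candidate memorization

scores = memorization_proxy_scores(np.array([2.1, 0.3, 1.8]),
                                   np.array([0.05, 0.20, 0.04]))
print(np.argsort(-scores))  # examples ranked most-"memorized" first
```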
arXiv Detail & Related papers (2024-10-21T21:18:39Z)
- Empirical influence functions to understand the logic of fine-tuning [1.9116784879310031]
We use empirical influence, measured via fine-tuning, to demonstrate how individual training samples affect outputs.
We show that these desiderata are violated both for simple convolutional networks and for a modern LLM.
Our results suggest that popular models cannot generalize or perform logic in the way they appear to.
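In this spirit, a minimal PyTorch sketch of measuring empirical influence by fine-tuning on a single sample; the step count and learning rate are illustrative choices, not the paper's settings:

```python
import copy
import torch

def empirical_influence(model, loss_fn, z_train, probe_inputs, lr=1e-4, steps=5):
    """Fine-tune a copy of the model on one training sample and report how the
    outputs on a batch of probe inputs move."""
    before = model(probe_inputs).detach()
    tuned = copy.deepcopy(model)
    opt = torch.optim.SGD(tuned.parameters(), lr=lr)
    x, y = z_train
    for _ in range(steps):              # a few gradient steps on the one sample
        opt.zero_grad()
        loss_fn(tuned(x), y).backward()
        opt.step()
    after = tuned(probe_inputs).detach()
    return after - before               # per-probe output shift
```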
arXiv Detail & Related papers (2024-06-01T17:31:06Z)
- Studying Large Language Model Generalization with Influence Functions [29.577692176892135]
Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a sequence were added to the training set?
We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to large language models (LLMs) with up to 52 billion parameters.
We investigate generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior.
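For intuition, a single-layer numpy sketch of the EK-FAC-style inverse-Hessian-vector product behind these influence computations; real implementations cover many layers, fit the corrected eigenvalues from training gradients, and tune the damping:

```python
import numpy as np

def ekfac_ihvp(V, A_cov, G_cov, lambdas, damping=1e-3):
    """EK-FAC-style iHVP for one linear layer.
    V:       gradient of the query loss w.r.t. the layer weight (out x in)
    A_cov:   covariance of layer inputs,      E[a a^T]  (in x in)
    G_cov:   covariance of output gradients,  E[g g^T]  (out x out)
    lambdas: eigenvalue-corrected diagonal (out x in), the "EK" refinement
    """
    _, Q_A = np.linalg.eigh(A_cov)   # eigenbasis of the input covariance
    _, Q_G = np.linalg.eigh(G_cov)   # eigenbasis of the gradient covariance
    V_eig = Q_G.T @ V @ Q_A          # rotate into the Kronecker eigenbasis
    V_eig = V_eig / (lambdas + damping)
    return Q_G @ V_eig @ Q_A.T       # rotate back

# The influence of z_train on z_test is then -<ekfac_ihvp(grad_test, ...),
# grad_train>, accumulated layer by layer.
```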
arXiv Detail & Related papers (2023-08-07T04:47:42Z)
- A Bayesian Approach to (Online) Transfer Learning: Theory and Algorithms [6.193838300896449]
We study transfer learning from a Bayesian perspective, where a parametric statistical model is used.
Specifically, we study three variants of transfer learning problems, instantaneous, online, and time-variant transfer learning.
For each problem, we define an appropriate objective function, and provide either exact expressions or upper bounds on the learning performance.
Examples show that the derived bounds are accurate even for small sample sizes.
arXiv Detail & Related papers (2021-09-03T08:43:29Z)
- Learning by Ignoring, with Application to Domain Adaptation [10.426533624387305]
We propose a novel machine learning framework referred to as learning by ignoring (LBI).
Our framework automatically identifies pretraining data examples that have large domain shift from the target distribution by learning an ignoring variable for each example and excludes them from the pretraining process.
A gradient-based algorithm is developed to efficiently solve the three-level optimization problem in LBI.
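A deliberately simplified sketch of where the ignoring variables enter the loss; real LBI solves a three-level optimization in which the weights are driven by validation performance, whereas here everything is trained jointly purely for illustration:

```python
import torch

def weighted_pretrain_step(model, opt, a_logits, a_opt, xb, yb, idx, loss_fn):
    """One pretraining step with learnable per-example ignoring variables.
    a_logits: one learnable logit per pretraining example; sigmoid(a_logits[i])
    in [0, 1] down-weights example i. loss_fn must use reduction='none'.
    Note: without LBI's outer validation-level objective these weights have no
    incentive to stay informative; this only shows the mechanism."""
    opt.zero_grad(); a_opt.zero_grad()
    per_example = loss_fn(model(xb), yb)     # shape: (batch,)
    weights = torch.sigmoid(a_logits[idx])   # ignoring variables for this batch
    (weights * per_example).mean().backward()
    opt.step(); a_opt.step()
```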
arXiv Detail & Related papers (2020-12-28T15:33:41Z)
- Efficient Estimation of Influence of a Training Instance [56.29080605123304]
We propose an efficient method for estimating the influence of a training instance on a neural network model.
Our method is inspired by dropout: zero-masking a sub-network prevents that sub-network from learning a given training instance.
We demonstrate that the proposed method can capture training influences, enhance the interpretability of error predictions, and cleanse the training dataset for improving generalization.
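A small sketch of the per-instance masking idea (details such as mask scaling and architecture are simplified assumptions): each training instance gets a fixed mask, so one sub-network learns it and the complementary sub-network never does, and their test-loss gap serves as the influence estimate.

```python
import hashlib
import numpy as np

def instance_mask(instance_id, width, keep=0.5):
    """Deterministic dropout mask derived from the instance id, so the same
    units are masked every time this instance is seen during training."""
    seed = int(hashlib.md5(str(instance_id).encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return (rng.random(width) < keep).astype(np.float32)

# Training: multiply the hidden layer by instance_mask(i, width) when learning
# instance i, so the complementary sub-network (1 - mask) never sees it.
# Influence estimate at test time:
#   influence(z_i, z_test) ~ loss(sub-net with 1 - mask_i, z_test)
#                          - loss(sub-net with mask_i,     z_test)
```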
arXiv Detail & Related papers (2020-12-08T04:31:38Z)
- Loss Bounds for Approximate Influence-Based Abstraction [81.13024471616417]
Influence-based abstraction aims to gain leverage by modeling local subproblems together with the 'influence' that the rest of the system exerts on them.
This paper investigates the performance of such approaches from a theoretical perspective.
We show that neural networks trained with cross entropy are well suited to learn approximate influence representations.
arXiv Detail & Related papers (2020-11-03T15:33:10Z)
- Multi-Stage Influence Function [97.19210942277354]
We develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data.
We study two different scenarios with the pretrained embeddings fixed or updated in the finetuning tasks.
arXiv Detail & Related papers (2020-07-17T16:03:11Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates.
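The regularization in question is a damping term added before inverting the Hessian; a one-function numpy sketch:

```python
import numpy as np

def damped_ihvp(H, v, damping=1e-2):
    """Damped inverse-Hessian-vector product, (H + damping*I)^{-1} v.
    Deep-network Hessians are ill-conditioned, with near-zero (or negative)
    eigenvalues; an undamped solve amplifies those directions, which is one
    source of the fragility this paper documents."""
    return np.linalg.solve(H + damping * np.eye(H.shape[0]), v)
```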
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
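The paper's measure is defined on its own terms; as a loose illustration only, one way influence aggregates can surface artifact candidates is to flag training examples that sway many test predictions at once:

```python
import numpy as np

def flag_artifact_candidates(influence_matrix, top_k=10):
    """influence_matrix: (n_train, n_test) influence values. Training examples
    that are strongly influential across many test points at once are
    candidates for carrying an artifact (e.g. a spurious token) that the model
    exploits. This heuristic is an assumption of the sketch, not the paper's
    measure."""
    mean_abs = np.abs(influence_matrix).mean(axis=1)  # average sway per example
    return np.argsort(-mean_abs)[:top_k]
```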
arXiv Detail & Related papers (2020-05-14T00:45:23Z)