Studying Large Language Model Generalization with Influence Functions
- URL: http://arxiv.org/abs/2308.03296v1
- Date: Mon, 7 Aug 2023 04:47:42 GMT
- Title: Studying Large Language Model Generalization with Influence Functions
- Authors: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin,
Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez,
Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph,
Sam McCandlish, Jared Kaplan, Samuel R. Bowman
- Abstract summary: Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a sequence were added to the training set?
We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to large language models (LLMs) with up to 52 billion parameters.
We investigate generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior.
- Score: 29.577692176892135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When trying to gain better visibility into a machine learning model in order
to understand and mitigate the associated risks, a potentially valuable source
of evidence is: which training examples most contribute to a given behavior?
Influence functions aim to answer a counterfactual: how would the model's
parameters (and hence its outputs) change if a given sequence were added to the
training set? While influence functions have produced insights for small
models, they are difficult to scale to large language models (LLMs) due to the
difficulty of computing an inverse-Hessian-vector product (IHVP). We use the
Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC)
approximation to scale influence functions up to LLMs with up to 52 billion
parameters. In our experiments, EK-FAC achieves similar accuracy to traditional
influence function estimators despite the IHVP computation being orders of
magnitude faster. We investigate two algorithmic techniques to reduce the cost
of computing gradients of candidate training sequences: TF-IDF filtering and
query batching. We use influence functions to investigate the generalization
patterns of LLMs, including the sparsity of the influence patterns, increasing
abstraction with scale, math and programming abilities, cross-lingual
generalization, and role-playing behavior. Despite many apparently
sophisticated forms of generalization, we identify a surprising limitation:
influences decay to near-zero when the order of key phrases is flipped.
Overall, influence functions give us a powerful new tool for studying the
generalization properties of LLMs.
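To make the counterfactual concrete: influence functions linearize how the optimum responds to upweighting a single training sequence. In the standard first-order formulation (following Koh and Liang, 2017; the notation below is ours and may differ from the paper's),

\[
\mathcal{I}(z_m, z_q) \;=\; \left.\frac{d\,L\bigl(z_q, \theta^\star(\epsilon)\bigr)}{d\epsilon}\right|_{\epsilon=0} \;=\; -\,\nabla_\theta L(z_q, \theta^\star)^{\top}\, H^{-1}\, \nabla_\theta L(z_m, \theta^\star),
\]

where θ*(ε) minimizes the training loss with candidate sequence z_m upweighted by ε, z_q is the query, and H is the curvature (the Hessian, or a positive-definite surrogate such as the Gauss-Newton/Fisher matrix) of the training loss at θ*. The bottleneck named in the abstract is the inverse-Hessian-vector product H⁻¹∇L(z_q, θ*): it is computed once per query and then reused as a fixed vector dotted against the gradient of every candidate training sequence.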
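EK-FAC makes that IHVP tractable by treating the curvature as block-diagonal across layers and approximating each layer's block in a Kronecker eigenbasis with a corrected diagonal. Below is a minimal single-layer numpy sketch of the idea; the function names, shapes, and damping value are illustrative assumptions on our part, not the paper's implementation (which applies this machinery, built on the Fisher/Gauss-Newton curvature, to billion-parameter transformers).

```python
import numpy as np

def fit_ekfac(acts, grads, dws):
    """Fit EK-FAC factors for one linear layer with weight W of shape (d_out, d_in).

    acts:  (N, d_in)  inputs to the layer
    grads: (N, d_out) gradients of the loss w.r.t. the layer's outputs
    dws:   (N, d_out, d_in) per-example weight gradients
    """
    A = acts.T @ acts / len(acts)        # second moment of inputs
    G = grads.T @ grads / len(grads)     # second moment of output gradients
    _, qa = np.linalg.eigh(A)            # Kronecker eigenbasis (input side)
    _, qg = np.linalg.eigh(G)            # Kronecker eigenbasis (output side)
    # Eigenvalue correction: per-coordinate second moments of the
    # per-example gradients rotated into the Kronecker eigenbasis.
    rotated = np.einsum('op,noi,iq->npq', qg, dws, qa)   # Q_G^T dW Q_A
    lam = (rotated ** 2).mean(axis=0)                    # (d_out, d_in)
    return qa, qg, lam

def ekfac_ihvp(qa, qg, lam, v, damping=1e-3):
    """Approximate (F + damping*I)^{-1} v for v shaped like W: (d_out, d_in)."""
    v_rot = qg.T @ v @ qa            # rotate into the eigenbasis
    v_rot = v_rot / (lam + damping)  # divide by corrected eigenvalues
    return qg @ v_rot @ qa.T         # rotate back

# Toy usage with random stand-ins for one layer's statistics.
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 32))
grads = rng.normal(size=(256, 16))
dws = np.einsum('no,ni->noi', grads, acts)   # per-example dW = g a^T
qa, qg, lam = fit_ekfac(acts, grads, dws)
query_grad = rng.normal(size=(16, 32))
ihvp = ekfac_ihvp(qa, qg, lam, query_grad)   # reused across all candidates
```

After the one-time eigendecompositions, applying the inverse costs only two small matrix multiplications and an elementwise division per layer, which is what lets the IHVP run orders of magnitude faster than iterative estimators.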
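Even with fast IHVPs, every candidate training sequence still needs its own gradient before it can be scored, which is why the paper pairs two cost reducers: TF-IDF filtering (only score candidates that share salient terms with the query) and query batching (amortize each candidate's gradient across many queries at once). Here is a rough sketch of the filtering step using scikit-learn; the function name, cosine ranking, and default top_k are our assumptions rather than the paper's exact criterion.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_prefilter(query, candidates, top_k=10_000):
    """Return the top_k candidate training sequences ranked by TF-IDF
    cosine similarity to the query; per-candidate gradients (the expensive
    step) are then computed only for these survivors."""
    vectorizer = TfidfVectorizer()
    cand_mat = vectorizer.fit_transform(candidates)  # (num_candidates, vocab)
    query_vec = vectorizer.transform([query])        # (1, vocab)
    sims = cosine_similarity(query_vec, cand_mat).ravel()
    keep = sims.argsort()[::-1][:top_k]
    return [candidates[i] for i in keep]
```

The trade-off is recall: a sequence that influences the query for non-lexical reasons is discarded before its gradient is ever computed, so the filter bounds what kinds of generalization the scan can surface.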
Related papers
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling.
Yet their widespread adoption poses challenges regarding data attribution and interpretability.
In this paper, we aim to help address such challenges by developing an influence functions framework.
arXiv Detail & Related papers (2024-10-17T17:59:02Z)
- Do Influence Functions Work on Large Language Models? [10.463762448166714]
Influence functions aim to quantify the impact of individual training data points on a model's predictions.
We evaluate influence functions across multiple tasks and find that they consistently perform poorly in most settings.
arXiv Detail & Related papers (2024-09-30T06:50:18Z)
- Revisit, Extend, and Enhance Hessian-Free Influence Functions [26.105554752277648]
Influence functions serve as crucial tools for assessing sample influence in model interpretation, subset training set selection, and more.
In this paper, we revisit a specific, albeit effective, approximation method known as TracIn.
This method substitutes the inverse of the Hessian matrix with an identity matrix.
arXiv Detail & Related papers (2024-05-25T03:43:36Z)
- Large Language Models are Biased Reinforcement Learners [0.0]
We show that large language models (LLMs) exhibit behavioral signatures of a relative value bias.
Computational cognitive modeling reveals that LLM behavior is well-described by a simple RL algorithm.
arXiv Detail & Related papers (2024-05-19T01:43:52Z)
- NPEFF: Non-Negative Per-Example Fisher Factorization [52.44573961263344]
We introduce a novel interpretability method called NPEFF that is readily applicable to any end-to-end differentiable model.
We demonstrate that NPEFF has interpretable tunings through experiments on language and vision models.
arXiv Detail & Related papers (2023-10-07T02:02:45Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- If Influence Functions are the Answer, Then What is the Question? [7.873458431535409]
Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters.
While influence estimates align well with leave-one-out retraining for linear models, recent works have shown this alignment is often poor in neural networks.
arXiv Detail & Related papers (2022-09-12T16:17:43Z)
- FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging [112.19994766375231]
Influence functions approximate the 'influences' of training data points on test predictions.
We present FastIF, a set of simple modifications to influence functions that significantly improves their run-time.
Our experiments demonstrate the potential of influence functions in model interpretation and correcting model errors.
arXiv Detail & Related papers (2020-12-31T18:02:34Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates.
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
arXiv Detail & Related papers (2020-05-14T00:45:23Z)