Empirical influence functions to understand the logic of fine-tuning
- URL: http://arxiv.org/abs/2406.00509v1
- Date: Sat, 1 Jun 2024 17:31:06 GMT
- Title: Empirical influence functions to understand the logic of fine-tuning
- Authors: Jordan K. Matelsky, Lyle Ungar, Konrad P. Kording
- Abstract summary: We use empirical influence measured using fine-tuning to demonstrate how individual training samples affect outputs.
We show that these desiderata are violated both for simple convolutional networks and for a modern LLM.
Our results suggest that popular models cannot generalize or perform logic in the way they appear to.
- Score: 1.9116784879310031
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the process of learning in neural networks is crucial for improving their performance and interpreting their behavior. This can be approximately understood by asking how a model's output is influenced when we fine-tune on a new training sample. There are desiderata for such influences, such as decreasing influence with semantic distance, sparseness, noise invariance, transitive causality, and logical consistency. Here we use the empirical influence measured using fine-tuning to demonstrate how individual training samples affect outputs. We show that these desiderata are violated both for simple convolutional networks and for a modern LLM. We also illustrate how prompting can partially rescue this failure. Our paper presents an efficient and practical way of quantifying how well neural networks learn from fine-tuning stimuli. Our results suggest that popular models cannot generalize or perform logic in the way they appear to.
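As a concrete illustration of the measurement described in the abstract (a minimal sketch, not the authors' released code; the model, optimizer settings, number of steps, and probe set are illustrative assumptions), one way to estimate the empirical influence of a single training sample is to briefly fine-tune a copy of the model on that sample and compare its predictions on probe inputs before and after:

```python
# Minimal sketch of empirical influence via fine-tuning (PyTorch).
import copy

import torch
import torch.nn.functional as F


def empirical_influence(model, sample_x, sample_y, probe_x, lr=1e-4, steps=1):
    """Estimate how fine-tuning on one (batched) sample shifts model outputs.

    Returns a per-probe KL divergence between the original and fine-tuned
    predictive distributions; larger values indicate stronger influence.
    """
    model.eval()
    with torch.no_grad():
        base_log_probs = F.log_softmax(model(probe_x), dim=-1)

    # Fine-tune a copy of the model on the single new sample.
    tuned = copy.deepcopy(model)
    tuned.train()
    opt = torch.optim.SGD(tuned.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(tuned(sample_x), sample_y).backward()
        opt.step()

    tuned.eval()
    with torch.no_grad():
        new_log_probs = F.log_softmax(tuned(probe_x), dim=-1)

    # KL(original || fine-tuned) for each probe input.
    return F.kl_div(new_log_probs, base_log_probs,
                    log_target=True, reduction="none").sum(dim=-1)
```

Checking the desiderata from the abstract then amounts to comparing these scores across probes, for example verifying that influence decays with semantic distance from the fine-tuning sample and stays near zero for unrelated inputs.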
Related papers
- How Do Training Methods Influence the Utilization of Vision Models? [23.41975772383921]
Not all learnable parameters contribute equally to a neural network's decision function.
We revisit earlier studies that examined how architecture and task complexity influence this phenomenon.
Our findings reveal that the training method strongly influences which layers become critical to the decision function for a given task.
arXiv Detail & Related papers (2024-10-18T13:54:46Z)
- Benchmark data to study the influence of pre-training on explanation performance in MR image classification [0.6927055673104934]
CNNs are frequently and successfully used in medical prediction tasks.
They are often used in combination with transfer learning, leading to improved performance when training data for the task are scarce.
Previous studies have rarely quantitatively evaluated the 'explanation performance' of XAI methods against ground-truth data.
arXiv Detail & Related papers (2023-06-21T09:53:37Z)
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and transfer to new tasks in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Contrastive Reasoning in Neural Networks [26.65337569468343]
Inference built on features that identify causal class dependencies is termed feed-forward inference.
In this paper, we formalize the structure of contrastive reasoning and propose a methodology to extract a neural network's notion of contrast.
We demonstrate the value of contrastively recognizing images under distortions by reporting an improvement of 3.47%, 2.56%, and 5.48% in average accuracy.
arXiv Detail & Related papers (2021-03-23T05:54:36Z)
- Recoding latent sentence representations -- Dynamic gradient-based activation modification in RNNs [0.0]
In RNNs, encoding information in a suboptimal way can impact the quality of representations based on later elements in the sequence.
I propose an augmentation to standard RNNs in the form of a gradient-based correction mechanism.
I conduct different experiments in the context of language modeling, where the impact of using such a mechanism is examined in detail.
arXiv Detail & Related papers (2021-01-03T17:54:17Z)
- Efficient Estimation of Influence of a Training Instance [56.29080605123304]
We propose an efficient method for estimating the influence of a training instance on a neural network model.
Our method is inspired by dropout, which zero-masks a sub-network and prevents the sub-network from learning each training instance.
We demonstrate that the proposed method can capture training influences, enhance the interpretability of error predictions, and cleanse the training dataset for improving generalization.
arXiv Detail & Related papers (2020-12-08T04:31:38Z)
- Loss Bounds for Approximate Influence-Based Abstraction [81.13024471616417]
Influence-based abstraction aims to gain leverage by modeling local subproblems together with the 'influence' that the rest of the system exerts on them.
This paper investigates the performance of such approaches from a theoretical perspective.
We show that neural networks trained with cross entropy are well suited to learn approximate influence representations.
arXiv Detail & Related papers (2020-11-03T15:33:10Z)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples (a standard formulation is sketched after this list for reference).
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
arXiv Detail & Related papers (2020-05-14T00:45:23Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
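For reference (this formulation is standard background, not taken from any of the papers listed above), the classical influence-function approximation of Koh & Liang estimates how upweighting a training example z changes the loss at a test point z_test; the fine-tuning-based measure in the main paper can be read as an empirical counterpart that avoids this computation:

```latex
% Classical influence-function approximation (Koh & Liang, 2017).
% \hat{\theta}: trained parameters, L: loss function,
% H_{\hat{\theta}}: Hessian of the training loss at \hat{\theta}.
\mathcal{I}(z, z_{\text{test}})
  = -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}
     H_{\hat{\theta}}^{-1}
     \nabla_{\theta} L(z, \hat{\theta})
```

Evaluating this expression requires an (approximate) inverse-Hessian-vector product, which is part of what makes the fine-tuning-based empirical measure attractive in practice.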
This list is automatically generated from the titles and abstracts of the papers on this site.