Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity
Tracking
- URL: http://arxiv.org/abs/2402.14811v1
- Date: Thu, 22 Feb 2024 18:59:24 GMT
- Title: Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity
Tracking
- Authors: Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David
Bau
- Abstract summary: We study how fine-tuning affects the internal mechanisms implemented in language models.
Fine-tuning enhances, rather than alters, the mechanistic operation of the model.
- Score: 53.66999416757543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning on generalized tasks such as instruction following, code
generation, and mathematics has been shown to enhance language models'
performance on a range of tasks. Nevertheless, explanations of how such
fine-tuning influences the internal computations in these models remain
elusive. We study how fine-tuning affects the internal mechanisms implemented
in language models. As a case study, we explore the property of entity
tracking, a crucial facet of language comprehension, where models fine-tuned on
mathematics have substantial performance gains. We identify the mechanism that
enables entity tracking and show that (i) in both the original model and its
fine-tuned versions primarily the same circuit implements entity tracking. In
fact, the entity tracking circuit of the original model on the fine-tuned
versions performs better than the full original model. (ii) The circuits of all
the models implement roughly the same functionality: Entity tracking is
performed by tracking the position of the correct entity in both the original
model and its fine-tuned versions. (iii) Performance boost in the fine-tuned
models is primarily attributed to its improved ability to handle the augmented
positional information. To uncover these findings, we employ: Patch Patching,
DCM, which automatically detects model components responsible for specific
semantics, and CMAP, a new approach for patching activations across models to
reveal improved mechanisms. Our findings suggest that fine-tuning enhances,
rather than fundamentally alters, the mechanistic operation of the model.
Related papers
- Chain and Causal Attention for Efficient Entity Tracking [46.577761606415805]
We propose an efficient and frugal enhancement to the standard attention mechanism.
By considering attention as an adjacency matrix, our model can track entity states with a single layer.
Our contributions include theoretical insights, an improved attention mechanism, and empirical validation.
arXiv Detail & Related papers (2024-10-07T23:54:10Z) - Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z) - Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z) - Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publically available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z) - Beyond Self-learned Attention: Mitigating Attention Bias in
Transformer-based Models Using Attention Guidance [9.486558126032639]
We introduce SyntaGuid, a novel approach to guide Transformer-based models towards critical source code tokens.
We show that SyntaGuid can improve overall performance up to 3.25% and fix up to 28.3% wrong predictions.
arXiv Detail & Related papers (2024-02-26T18:03:50Z) - Has Your Pretrained Model Improved? A Multi-head Posterior Based
Approach [25.927323251675386]
We leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models.
We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models.
Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
arXiv Detail & Related papers (2024-01-02T17:08:26Z) - Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks [37.278707106871295]
We study how fine-tuning alters the underlying capabilities learned by a model during pretraining.
We show that fine-tuning rarely alters the underlying model capabilities.
We also show that fine-tuning can unintentionally remove a model's safety wrapper.
arXiv Detail & Related papers (2023-11-21T18:51:04Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP)
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL)
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Scaling Local Self-Attention For Parameter Efficient Visual Backbones [29.396052798583234]
Self-attention has the promise of improving computer vision systems due to parameter-independent scaling of receptive fields and content-dependent interactions.
We develop a new self-attention model family, emphHaloNets, which reach state-of-the-art accuracies on the parameter-limited setting of the ImageNet classification benchmark.
arXiv Detail & Related papers (2021-03-23T17:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.