Related papers: Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking


URL: http://arxiv.org/abs/2402.14811v1
Date: Thu, 22 Feb 2024 18:59:24 GMT
Title: Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Authors: Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau
Abstract summary: We study how fine-tuning affects the internal mechanisms implemented in language models. Fine-tuning enhances, rather than alters, the mechanistic operation of the model.
Score: 53.66999416757543
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fine-tuning on generalized tasks such as instruction following, code generation, and mathematics has been shown to enhance language models' performance on a range of tasks. Nevertheless, explanations of how such fine-tuning influences the internal computations in these models remain elusive. We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the property of entity tracking, a crucial facet of language comprehension, where models fine-tuned on mathematics have substantial performance gains. We identify the mechanism that enables entity tracking and show that (i) in both the original model and its fine-tuned versions, primarily the same circuit implements entity tracking. In fact, the original model's entity tracking circuit, when applied to the fine-tuned versions, performs better than the full original model. (ii) The circuits of all the models implement roughly the same functionality: entity tracking is performed by tracking the position of the correct entity in both the original model and its fine-tuned versions. (iii) The performance boost in the fine-tuned models is primarily attributed to their improved ability to handle the augmented positional information. To uncover these findings, we employ Path Patching; DCM, which automatically detects model components responsible for specific semantics; and CMAP, a new approach for patching activations across models to reveal improved mechanisms. Our findings suggest that fine-tuning enhances, rather than fundamentally alters, the mechanistic operation of the model.
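The abstract names CMAP as a way to patch activations across models. The following is not the authors' implementation; it is a minimal pure-Python sketch of the general idea of cross-model activation patching, assuming two toy "models" with identical layer structure whose hidden states can be swapped layer by layer. All names (`forward`, `scale`, `shift`, `base`, `tuned`) are hypothetical stand-ins: the hidden state the fine-tuned model produces at one layer replaces the base model's hidden state at the same layer, and the rest of the base model's forward pass runs unchanged.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


def forward(layers: List[Layer], x: Vector,
            patch_layer: Optional[int] = None,
            patch_value: Optional[Vector] = None) -> List[Vector]:
    """Run the layers in order and return all hidden states (index 0 is the input).

    If patch_layer is given, the output of that layer is replaced by
    patch_value before it is passed on to the next layer.
    """
    states = [list(x)]
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == patch_layer:
            h = list(patch_value)  # swap in the donor model's activation
        states.append(h)
    return states


def scale(c: float) -> Layer:
    """Hypothetical toy layer: multiply every coordinate by c."""
    return lambda v: [c * a for a in v]


def shift(b: float) -> Layer:
    """Hypothetical toy layer: add b to every coordinate."""
    return lambda v: [a + b for a in v]


# Toy "base" and "fine-tuned" models: same structure, only layer 1 differs.
base: List[Layer] = [scale(2.0), shift(1.0), scale(0.5)]
tuned: List[Layer] = [scale(2.0), shift(3.0), scale(0.5)]

x = [1.0, 2.0]
clean_out = forward(base, x)[-1]   # base model output
donor_states = forward(tuned, x)   # cache every hidden state of the fine-tuned model
tuned_out = donor_states[-1]       # fine-tuned model output

# Patch the fine-tuned model's layer-k output into the base model's run.
patched: Dict[int, Vector] = {
    k: forward(base, x, patch_layer=k, patch_value=donor_states[k + 1])[-1]
    for k in range(len(base))
}
for k, out in patched.items():
    print(f"patch layer {k}: {out}")
# patch layer 0: [1.5, 2.5]
# patch layer 1: [2.5, 3.5]
# patch layer 2: [2.5, 3.5]
```

In this toy setup the two models differ only at layer 1, so patching layer 0 leaves the base output unchanged while patching layer 1 reproduces the fine-tuned output; that is the kind of localization signal cross-model patching is meant to surface. Patching the final layer recovers the output trivially and therefore carries no information.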

Related papers

Forgetting: A New Mechanism Towards Better Large Language Model Fine-tuning [53.398270878295754]
Supervised fine-tuning (SFT) plays a critical role for pretrained large language models (LLMs). We suggest categorizing tokens within each corpus into two parts, positive and negative tokens, based on whether they are useful to improve model performance. We conduct experiments on well-established benchmarks, finding that this forgetting mechanism not only improves overall model performance but also facilitates more diverse model responses.
arXiv Detail & Related papers (2025-08-06T11:22:23Z)
Chain and Causal Attention for Efficient Entity Tracking [46.577761606415805]
We propose an efficient and frugal enhancement to the standard attention mechanism. By considering attention as an adjacency matrix, our model can track entity states with a single layer. Our contributions include theoretical insights, an improved attention mechanism, and empirical validation.
arXiv Detail & Related papers (2024-10-07T23:54:10Z)
Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts. We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains. We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models. We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models. We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
Beyond Self-learned Attention: Mitigating Attention Bias in Transformer-based Models Using Attention Guidance [9.486558126032639]
We introduce SyntaGuid, a novel approach to guide Transformer-based models towards critical source code tokens. We show that SyntaGuid can improve overall performance by up to 3.25% and fix up to 28.3% of wrong predictions.
arXiv Detail & Related papers (2024-02-26T18:03:50Z)
Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach [25.927323251675386]
We leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models. We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models. Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
arXiv Detail & Related papers (2024-01-02T17:08:26Z)
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks [37.278707106871295]
We study how fine-tuning alters the underlying capabilities learned by a model during pretraining. We show that fine-tuning rarely alters the underlying model capabilities. We also show that fine-tuning can unintentionally remove a model's safety wrapper.
arXiv Detail & Related papers (2023-11-21T18:51:04Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP). What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL). Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
Scaling Local Self-Attention For Parameter Efficient Visual Backbones [29.396052798583234]
Self-attention has the promise of improving computer vision systems due to parameter-independent scaling of receptive fields and content-dependent interactions. We develop a new self-attention model family, HaloNets, which reach state-of-the-art accuracies on the parameter-limited setting of the ImageNet classification benchmark.
arXiv Detail & Related papers (2021-03-23T17:56:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
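The Chain and Causal Attention entry above describes treating attention as an adjacency matrix. As a generic illustration only (not the method of that paper, whose details are not given here), the sketch below reads a causally masked attention matrix as a weighted graph in which row i holds how strongly token i reads from each earlier token. Multiplying the matrix by itself then scores two-hop paths, such as token 2 reading token 1, which in turn read token 0. The score values and the helper names `causal_attention` and `matmul` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_attention(scores: Matrix) -> Matrix:
    """Row-wise softmax under a causal mask: token i attends only to tokens j <= i."""
    n = len(scores)
    weights: Matrix = []
    for i in range(n):
        visible = scores[i][: i + 1]
        m = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        z = sum(exps)
        weights.append([e / z for e in exps] + [0.0] * (n - i - 1))
    return weights


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; entry (i, j) of A @ A sums over 2-step paths i -> k -> j."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical scores: token 1 strongly attends to token 0, token 2 to token 1.
scores = [
    [0.0, 0.0, 0.0],
    [20.0, 0.0, 0.0],
    [0.0, 20.0, 0.0],
]
A = causal_attention(scores)
A2 = matmul(A, A)

# One hop: token 2 barely attends to token 0 directly ...
print(round(A[2][0], 6))   # 0.0
# ... but the two-hop path 2 -> 1 -> 0 carries almost all of its weight.
print(round(A2[2][0], 6))  # 1.0
```

Because the product of two row-stochastic matrices is itself row-stochastic, each row of `A2` is still a distribution over tokens.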