Mechanistic Fine-tuning for In-context Learning
- URL: http://arxiv.org/abs/2505.14233v1
- Date: Tue, 20 May 2025 11:41:21 GMT
- Title: Mechanistic Fine-tuning for In-context Learning
- Authors: Hakaze Cho, Peng Luo, Mariko Kato, Rin Kaenbyou, Naoya Inoue
- Abstract summary: In-context learning (ICL) induces few-shot learning on Language Models (LMs) not originally pre-trained on ICL-style data. To bridge the gap between ICL and pre-training, some approaches fine-tune LMs on large ICL-style datasets in an end-to-end paradigm with massive computational costs. We propose Attention Behavior Fine-Tuning (ABFT), which builds training objectives on the attention scores instead of the final outputs, forcing attention onto the correct label tokens and away from the wrong ones.
- Score: 3.8645776186425755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context Learning (ICL) uses structured demonstration-query inputs to induce few-shot learning in Language Models (LMs) that were not originally pre-trained on ICL-style data. To bridge the gap between ICL and pre-training, some approaches fine-tune LMs on large ICL-style datasets in an end-to-end paradigm with massive computational costs. To reduce such costs, we propose Attention Behavior Fine-Tuning (ABFT), which draws on previous findings about the inner mechanism of ICL and builds training objectives on the attention scores instead of the final outputs, forcing attention to focus on the correct label tokens presented in the context while suppressing attention to the wrong label tokens. Our experiments on 9 modern LMs and 8 datasets find that ABFT achieves superior performance, robustness, unbiasedness, and efficiency, with only around 0.01% of the data cost of previous methods. Moreover, our subsequent analysis finds that the end-to-end training objective contains the ABFT objective, suggesting an implicit bias of ICL-style data toward the emergence of induction heads. Our work demonstrates the possibility of controlling specific module sequences within LMs to improve their behavior, opening up future applications of mechanistic interpretability.
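The abstract does not include code, but the core idea of an attention-level objective can be illustrated with a minimal sketch. The loss form, head selection, and position bookkeeping below are assumptions for illustration, not the authors' implementation: the loss rewards attention mass from the final query token onto correct in-context label tokens and penalizes mass onto wrong ones.

```python
import torch

def abft_attention_loss(attn_maps, correct_pos, wrong_pos):
    """Sketch of an ABFT-style objective (hypothetical form).

    attn_maps: (batch, heads, seq, seq) post-softmax attention maps
               from candidate induction heads.
    correct_pos / wrong_pos: per-example lists of context positions of
               correct / wrong label tokens.
    """
    # Attention distribution of the final query token over the context.
    query_row = attn_maps[:, :, -1, :]                       # (batch, heads, seq)
    loss = 0.0
    for b in range(query_row.size(0)):
        pos_mass = query_row[b, :, correct_pos[b]].sum(dim=-1)  # (heads,)
        neg_mass = query_row[b, :, wrong_pos[b]].sum(dim=-1)
        # Reward attention on correct labels, penalize wrong labels.
        loss = loss + (neg_mass - pos_mass).mean()
    return loss / query_row.size(0)
```

In a training loop, a term like this would be minimized directly on the attention scores of selected heads over a small set of ICL-style examples, rather than backpropagating an end-to-end loss from the final outputs.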
Related papers
- Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning [48.67380502157004]
Large-scale Transformer language models (LMs) trained solely on next-token prediction with web-scale data can solve a wide range of tasks. The mechanism behind this capability, known as in-context learning (ICL), remains both controversial and poorly understood.
arXiv Detail & Related papers (2025-05-16T08:50:42Z)
- LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks [23.5632914682956]
Large language model unlearning has become a critical challenge in ensuring safety and controlled model behavior. We show that LLM unlearning can be effectively maintained using a significantly smaller subset (functioning as a "coreset"). This suggests that LLM unlearning in these benchmarks can be performed surprisingly easily, even in an extremely low-data regime.
arXiv Detail & Related papers (2025-04-14T12:38:37Z)
- Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models [32.71672086718058]
Few-shot Chain-of-Thought (CoT) prompting significantly enhances the reasoning capabilities of large language models (LLMs). We observe that isolated segments, words, or tokens within CoT demonstrations can unexpectedly disrupt the generation process of LLMs. We propose a Few-shot Attention Intervention method (FAI) that dynamically analyzes the attention patterns of demonstrations to accurately identify these tokens.
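As a rough illustration of what such an intervention could look like (the identification step and the exact masking rule here are assumptions, not the paper's method), one can suppress attention to flagged tokens before the softmax:

```python
import torch

def suppress_attention(attn_logits, flagged_positions):
    """Hypothetical attention intervention: mask pre-softmax attention
    logits at key positions flagged as disruptive, so those tokens
    receive zero attention mass after the softmax.

    attn_logits: (batch, heads, query_len, key_len) pre-softmax scores.
    flagged_positions: list of key positions to suppress.
    """
    attn_logits = attn_logits.clone()
    attn_logits[..., flagged_positions] = float("-inf")
    return attn_logits
```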
arXiv Detail & Related papers (2025-03-14T07:46:33Z)
- The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency. UPFT removes the need for labeled data or exhaustive sampling. Experiments show that UPFT matches the performance of supervised methods.
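A minimal sketch of a prefix-only fine-tuning objective, assuming the method trains on just the first few tokens of the model's own sampled responses (the `prefix_len` value and the exact slicing are illustrative assumptions, not the paper's recipe):

```python
import torch.nn.functional as F

def upft_loss(model, input_ids, prompt_len, prefix_len=8):
    """Hypothetical UPFT-style objective: standard next-token loss,
    restricted to the first `prefix_len` tokens of the model's own
    sampled response (no labels, no full-trajectory sampling)."""
    logits = model(input_ids).logits
    # Predict only tokens prompt_len .. prompt_len + prefix_len - 1.
    start, end = prompt_len, prompt_len + prefix_len
    pred = logits[:, start - 1 : end - 1, :]
    target = input_ids[:, start:end]
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))
```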
arXiv Detail & Related papers (2025-03-04T18:56:03Z)
- Fine-Tuning Language Models on Multiple Datasets for Citation Intention Classification [17.03832781104098]
Citation Intention Classification (CIC) tools classify citations by their intention.
Prior research has shown that pretrained language models (PLMs) can achieve state-of-the-art performance on CIC benchmarks.
We propose a multi-task learning framework that jointly fine-tunes PLMs on a dataset of primary interest together with multiple auxiliary CIC datasets.
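A sketch of what such joint fine-tuning could look like, assuming a shared PLM encoder with one classification head per dataset and a down-weighted auxiliary loss (the weighting scheme and head design are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def multitask_step(encoder, heads, batches, primary="primary", aux_weight=0.5):
    """Hypothetical multi-task CIC step: a shared PLM encoder with one
    classification head per dataset; auxiliary losses are down-weighted.

    batches: dict mapping dataset name -> (input_ids, attn_mask, labels).
    """
    total = 0.0
    for name, (input_ids, attn_mask, labels) in batches.items():
        hidden = encoder(input_ids, attention_mask=attn_mask).last_hidden_state
        logits = heads[name](hidden[:, 0])       # [CLS] representation
        loss = F.cross_entropy(logits, labels)
        total = total + (loss if name == primary else aux_weight * loss)
    return total
```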
arXiv Detail & Related papers (2024-10-17T08:45:02Z)
- What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights [67.72413262980272]
Severe data imbalance naturally exists among web-scale vision-language datasets.
We find that CLIP pre-trained on such data exhibits notable robustness to the data imbalance compared to supervised learning.
The robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts.
arXiv Detail & Related papers (2024-05-31T17:57:24Z)
- Feature-Adaptive and Data-Scalable In-Context Learning [36.01997148676005]
FADS-ICL is a feature-adaptive and data-scalable in-context learning framework.
It can leverage task-adaptive features to promote inference on the downstream task.
FADS-ICL consistently outperforms previous state-of-the-art methods.
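One plausible reading, sketched below under stated assumptions (frozen LM features taken from the last token's final hidden state, plus a logistic-regression head; none of these specifics come from the abstract), is to turn ICL into feature extraction plus a lightweight, data-scalable classifier:

```python
import torch
from sklearn.linear_model import LogisticRegression

def fads_style_predict(lm, tokenize, train_texts, train_labels, query_text):
    """Hypothetical FADS-ICL-style pipeline: use frozen LM hidden states
    as task-adaptive features and fit a lightweight classifier, so the
    number of usable examples scales beyond the context window."""
    def feats(text):
        ids = tokenize(text)
        with torch.no_grad():
            h = lm(ids, output_hidden_states=True).hidden_states[-1]
        return h[0, -1].float().numpy()          # last-token feature vector
    X = [feats(t) for t in train_texts]
    clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
    return clf.predict([feats(query_text)])[0]
```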
arXiv Detail & Related papers (2024-05-17T12:32:53Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
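A toy sketch of contrastive demonstration construction, assuming the incorrect samples are shown alongside a brief reason they are wrong (the template strings and fields below are invented for illustration, not the paper's prompt):

```python
def build_contrastive_prompt(correct_demos, incorrect_demos, query):
    """Hypothetical c-ICL-style prompt: pair positive demonstrations
    with labeled negative ones so the model sees what to avoid."""
    parts = []
    for text, extraction in correct_demos:
        parts.append(f"Input: {text}\nCorrect extraction: {extraction}")
    for text, extraction, reason in incorrect_demos:
        parts.append(
            f"Input: {text}\nIncorrect extraction: {extraction}\n"
            f"Why it is wrong: {reason}"
        )
    parts.append(f"Input: {query}\nCorrect extraction:")
    return "\n\n".join(parts)
```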
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method called Repeated Demonstration with Sliding Causal Attention (RdSca).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
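The abstract does not spell out the mechanism, but a sliding causal attention mask can be sketched as follows (the window semantics are an assumption; the paper presumably customizes the mask so replayed demonstrations are re-encoded without full causal context):

```python
import torch

def sliding_causal_mask(seq_len, window):
    """Hypothetical sliding causal attention mask: each token attends
    only to itself and the previous `window - 1` tokens, instead of the
    entire causal prefix."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    return (j <= i) & (j > i - window)       # True where attention is allowed
```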
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
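A sketch of how such a difficulty metric might be computed, assuming IFD is the ratio of the answer's average next-token loss conditioned on the instruction to its loss without the instruction (this exact form is an assumption; the precise definition is in the paper):

```python
import torch
import torch.nn.functional as F

def ifd_score(model, instr_ids, answer_ids):
    """Hypothetical IFD computation: conditioned answer loss divided by
    unconditioned answer loss. Higher IFD means the instruction helps
    less, i.e. a harder (more informative) training sample."""
    def answer_loss(ids, answer_start):
        logits = model(ids.unsqueeze(0)).logits[0]
        pred = logits[answer_start - 1 : -1]     # predicts ids[answer_start:]
        target = ids[answer_start:]
        return F.cross_entropy(pred, target)
    conditioned = answer_loss(torch.cat([instr_ids, answer_ids]), len(instr_ids))
    direct = answer_loss(answer_ids, 1)          # answer alone, no context
    return (conditioned / direct).item()
```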
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- Understanding In-Context Learning via Supportive Pretraining Data [55.648777340129364]
In-context learning (ICL) improves language models' performance on a variety of NLP tasks by simply demonstrating a handful of examples at inference time.
It is not well understood why ICL ability emerges, as the model has never been specifically trained on such demonstrations.
Our work takes a first step towards understanding ICL by analyzing instance-level pretraining data.
arXiv Detail & Related papers (2023-06-26T22:14:04Z)