Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels
- URL: http://arxiv.org/abs/2509.16596v1
- Date: Sat, 20 Sep 2025 09:40:32 GMT
- Title: Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels
- Authors: Junjie Ye, Yuming Yang, Yang Nan, Shuo Li, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan
- Abstract summary: Large language models (LLMs) acquire substantial world knowledge during pre-training. Post-training techniques such as supervised fine-tuning (SFT) further shape this knowledge. We evaluate closed-book question answering (CBQA) performance across five LLMs from the LLaMA-2 and LLaMA-3 families.
- Score: 72.3670919950349
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) acquire substantial world knowledge during pre-training, which is further shaped by post-training techniques such as supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge remains underexplored, limiting our ability to control knowledge change behavior in fine-tuned models. To address this gap, we evaluate closed-book question answering (CBQA) performance across five LLMs from the LLaMA-2 and LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying the level of knowledge mastery in the fine-tuning data leads to performance fluctuations of over 12%. To investigate these effects, we analyze model behavior at both the token and parameter levels. Our analysis reveals that up to 90% of parameter updates during SFT do not contribute to knowledge enhancement. Restoring these updates can improve performance on the CBQA task, depending on the characteristics of the fine-tuning data. These insights offer practical guidance for developing fine-tuning strategies that more effectively strengthen model knowledge.
Related papers
- Exploring Fine-Tuning for Tabular Foundation Models [3.884856136722027]
This work presents the first comprehensive study of fine-tuning in Tabular Foundation Models (TFMs).
We compare Zero-Shot, Meta-Learning, Supervised (SFT), and parameter-efficient (PEFT) approaches, analyzing how dataset factors such as imbalance, size, and dimensionality affect outcomes.
Our findings cover performance, calibration, and fairness, offering practical guidelines on when fine-tuning is most beneficial and its limitations.
arXiv Detail & Related papers (2026-01-14T17:40:46Z) - Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality [10.74213785908381]
Supervised fine-tuning (SFT) is a critical step in aligning large language models with human instructions and values.
We trained a wide range of base models on a variety of datasets including code generation, mathematical reasoning, and general-domain tasks.
We will release these 1,000+ SFT models and benchmark results to accelerate further research.
arXiv Detail & Related papers (2025-06-17T16:13:15Z) - Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models.
We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z) - CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation [57.91828170220308]
We propose a knowledge distillation approach, CustomKD, that effectively leverages large vision foundation models (LVFMs) to enhance the performance of edge models.
Our simple yet effective CustomKD customizes the well-generalized features inherent in LVFMs to a given student model in order to reduce model discrepancies.
arXiv Detail & Related papers (2025-03-23T23:53:08Z) - Dissecting Fine-Tuning Unlearning in Large Language Models [12.749301272512222]
Fine-tuning-based unlearning methods prevail for preventing harmful, sensitive, or copyrighted information within large language models.
However, the true effectiveness of these methods is unclear.
In this work, we delve into the limitations of fine-tuning-based unlearning through activation patching and restoration experiments.
arXiv Detail & Related papers (2024-10-09T06:58:09Z) - Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z) - Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an averaged AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z) - Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z) - Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning [13.371405067535814]
This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in Large Language Models (LLMs).
We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information.
Our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge.
arXiv Detail & Related papers (2024-03-30T01:56:07Z) - Overcoming Generic Knowledge Loss with Selective Parameter Update [48.240683797965005]
We propose a novel approach to continuously update foundation models.
Instead of updating all parameters equally, we localize the updates to a sparse set of parameters relevant to the task being learned.
Our method improves accuracy on the newly learned tasks by up to 7% while preserving the pretraining knowledge, with a negligible decrease of 0.9% in accuracy on a representative control set.
arXiv Detail & Related papers (2023-08-23T22:55:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.