Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
- URL: http://arxiv.org/abs/2405.05904v2
- Date: Mon, 13 May 2024 07:29:58 GMT
- Title: Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
- Authors: Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig
- Abstract summary: We study the impact of new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge.
We demonstrate that large language models struggle to acquire new factual knowledge through fine-tuning.
As the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate.
- Score: 33.702498916775426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When large language models are aligned via supervised fine-tuning, they may encounter new factual information that was not acquired through pre-training. It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge. In this work, we study the impact of such exposure to new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge. To this end, we design a controlled setup, focused on closed-book QA, where we vary the proportion of the fine-tuning examples that introduce new knowledge. We demonstrate that large language models struggle to acquire new factual knowledge through fine-tuning, as fine-tuning examples that introduce new knowledge are learned significantly slower than those consistent with the model's knowledge. However, we also find that as the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate. Taken together, our results highlight the risk in introducing new factual knowledge through fine-tuning, and support the view that large language models mostly acquire factual knowledge through pre-training, whereas fine-tuning teaches them to use it more efficiently.
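The controlled setup described in the abstract can be illustrated with a minimal sketch (not the authors' released code): closed-book QA examples are labeled Known or Unknown by checking whether the pre-trained model already answers them correctly, and fine-tuning mixes are then built with a chosen proportion of Unknown examples. The helper `sample_base_model_answers` and the exact answer-matching rule are hypothetical placeholders.

```python
import random
from dataclasses import dataclass


@dataclass
class QAExample:
    question: str
    answer: str


def sample_base_model_answers(question: str, n_samples: int = 4) -> list[str]:
    """Hypothetical placeholder: sample n_samples answers from the *pre-trained*
    (not yet fine-tuned) model via few-shot prompting. Replace with a real
    inference call; returning [] here treats every example as Unknown."""
    return []


def split_known_unknown(examples: list[QAExample]):
    """Known = the base model already produces the gold answer; Unknown = the
    example would introduce new factual knowledge during fine-tuning."""
    known, unknown = [], []
    for ex in examples:
        answers = sample_base_model_answers(ex.question)
        hit = any(a.strip().lower() == ex.answer.strip().lower() for a in answers)
        (known if hit else unknown).append(ex)
    return known, unknown


def build_finetuning_mix(known, unknown, unknown_fraction: float, size: int, seed: int = 0):
    """Compose a fine-tuning set of `size` examples with a fixed share of
    Unknown (new-knowledge) examples, so that share can be varied across runs."""
    rng = random.Random(seed)
    n_unknown = round(size * unknown_fraction)
    mix = rng.sample(unknown, n_unknown) + rng.sample(known, size - n_unknown)
    rng.shuffle(mix)
    return mix
```

During fine-tuning one would then track per-category training accuracy (to observe that Unknown examples are fitted more slowly) and held-out hallucination rate as the Unknown fraction grows, mirroring the trends reported in the abstract.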
Related papers
- Large Scale Knowledge Washing [24.533316191149677]
Large language models show impressive abilities in memorizing world knowledge.
We introduce the problem of Large Scale Knowledge Washing, focusing on unlearning an extensive amount of factual knowledge.
arXiv Detail & Related papers (2024-05-26T23:29:49Z)
- Unfamiliar Finetuning Examples Control How Language Models Hallucinate [75.03210107477157]
Large language models are known to hallucinate when faced with unfamiliar queries.
We find that unfamiliar examples in the models' finetuning data are crucial in shaping these errors.
Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations.
arXiv Detail & Related papers (2024-03-08T18:28:13Z)
- Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models [53.52344131257681]
We propose a new paradigm for fine-tuning called F-Learning, which employs parametric arithmetic to facilitate the forgetting of old knowledge and learning of new knowledge.
Experimental results on two publicly available datasets demonstrate that the proposed F-Learning clearly improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2023-11-14T09:12:40Z)
- The Effect of Masking Strategies on Knowledge Retention by Language Models [9.130890741447422]
This paper aims to understand the effect of pre-training tasks on the amount of knowledge captured and forgotten by language models.
We test the model's knowledge retention by measuring its ability to answer factual questions.
Our findings demonstrate that, like the ability to perform a task, the knowledge acquired from being trained on that task is forgotten when a model is trained to perform another task.
arXiv Detail & Related papers (2023-06-12T15:35:23Z)
- Mitigating Temporal Misalignment by Discarding Outdated Facts [58.620269228776294]
Large language models are often used under temporal misalignment, tasked with answering questions about the present.
We propose fact duration prediction: the task of predicting how long a given fact will remain true.
Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
arXiv Detail & Related papers (2023-05-24T07:30:08Z)
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or to propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- Adaptively Integrated Knowledge Distillation and Prediction Uncertainty for Continual Learning [71.43841235954453]
Current deep learning models often suffer from catastrophic forgetting of old knowledge when continually learning new knowledge.
Existing strategies to alleviate this issue often fix the trade-off between keeping old knowledge (stability) and learning new knowledge (plasticity).
arXiv Detail & Related papers (2023-01-18T05:36:06Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Eliciting Knowledge from Large Pre-Trained Models for Unsupervised Knowledge-Grounded Conversation [45.95864432188745]
Recent advances in large-scale pre-training provide large models with the potential to learn knowledge from the raw text.
We propose various methods that best elicit knowledge from large models.
Our human study indicates that, though hallucinations exist, large models possess the unique advantage of being able to output common sense.
arXiv Detail & Related papers (2022-11-03T04:48:38Z)
- Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge [38.48518306055536]
We develop a neural language model that includes an explicit interface between symbolically interpretable factual information and subsymbolic neural knowledge.
We show that this model dramatically improves performance on two knowledge-intensive question-answering tasks.
arXiv Detail & Related papers (2020-07-02T03:05:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.