Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
- URL: http://arxiv.org/abs/2405.05904v2
- Date: Mon, 13 May 2024 07:29:58 GMT
- Title: Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
- Authors: Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig
- Abstract summary: We study the impact of new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge.
We demonstrate that large language models struggle to acquire new factual knowledge through fine-tuning.
As the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate.
- Score: 33.702498916775426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When large language models are aligned via supervised fine-tuning, they may encounter new factual information that was not acquired through pre-training. It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge. In this work, we study the impact of such exposure to new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge. To this end, we design a controlled setup, focused on closed-book QA, where we vary the proportion of the fine-tuning examples that introduce new knowledge. We demonstrate that large language models struggle to acquire new factual knowledge through fine-tuning, as fine-tuning examples that introduce new knowledge are learned significantly slower than those consistent with the model's knowledge. However, we also find that as the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate. Taken together, our results highlight the risk in introducing new factual knowledge through fine-tuning, and support the view that large language models mostly acquire factual knowledge through pre-training, whereas fine-tuning teaches them to use it more efficiently.
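The controlled setup described in the abstract can be illustrated with a minimal sketch (not the authors' released code): closed-book QA examples are labeled Known or Unknown by checking whether the pre-trained model already answers them correctly, and fine-tuning mixes are then built with a chosen proportion of Unknown examples. The helper `sample_base_model_answers` and the exact answer-matching rule are hypothetical placeholders.

```python
import random
from dataclasses import dataclass


@dataclass
class QAExample:
    question: str
    answer: str


def sample_base_model_answers(question: str, n_samples: int = 4) -> list[str]:
    """Hypothetical placeholder: sample n_samples answers from the *pre-trained*
    (not yet fine-tuned) model via few-shot prompting. Replace with a real
    inference call; returning [] here treats every example as Unknown."""
    return []


def split_known_unknown(examples: list[QAExample]):
    """Known = the base model already produces the gold answer; Unknown = the
    example would introduce new factual knowledge during fine-tuning."""
    known, unknown = [], []
    for ex in examples:
        answers = sample_base_model_answers(ex.question)
        hit = any(a.strip().lower() == ex.answer.strip().lower() for a in answers)
        (known if hit else unknown).append(ex)
    return known, unknown


def build_finetuning_mix(known, unknown, unknown_fraction: float, size: int, seed: int = 0):
    """Compose a fine-tuning set of `size` examples with a fixed share of
    Unknown (new-knowledge) examples, so that share can be varied across runs."""
    rng = random.Random(seed)
    n_unknown = round(size * unknown_fraction)
    mix = rng.sample(unknown, n_unknown) + rng.sample(known, size - n_unknown)
    rng.shuffle(mix)
    return mix
```

During fine-tuning one would then track per-category training accuracy (to observe that Unknown examples are fitted more slowly) and held-out hallucination rate as the Unknown fraction grows, mirroring the trends reported in the abstract.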
Related papers
- Large Scale Knowledge Washing [24.533316191149677]
Large language models show impressive abilities in memorizing world knowledge.
We introduce the problem of Large Scale Knowledge Washing, focusing on unlearning an extensive amount of factual knowledge.
arXiv Detail & Related papers (2024-05-26T23:29:49Z)
- Unfamiliar Finetuning Examples Control How Language Models Hallucinate [75.03210107477157]
Large language models are known to hallucinate when faced with unfamiliar queries.
We find that unfamiliar examples in the models' finetuning data are crucial in shaping these errors.
Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations.
arXiv Detail & Related papers (2024-03-08T18:28:13Z)
- Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models [53.52344131257681]
We propose a new paradigm for fine-tuning called F-Learning, which employs parametric arithmetic to facilitate the forgetting of old knowledge and learning of new knowledge.
Experimental results on two publicly available datasets demonstrate that the proposed F-Learning clearly improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2023-11-14T09:12:40Z)
- The Effect of Masking Strategies on Knowledge Retention by Language Models [9.130890741447422]
This paper aims to understand the effect of pre-training tasks on the amount of knowledge captured and forgotten by language models.
We test the model's knowledge retention by measuring its ability to answer factual questions.
Our findings demonstrate that, like the ability to perform a task, the knowledge acquired from being trained on that task is forgotten when a model is trained to perform another task.
arXiv Detail & Related papers (2023-06-12T15:35:23Z)
- Mitigating Temporal Misalignment by Discarding Outdated Facts [58.620269228776294]
Large language models are often used under temporal misalignment, tasked with answering questions about the present.
We propose fact duration prediction: the task of predicting how long a given fact will remain true.
Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
arXiv Detail & Related papers (2023-05-24T07:30:08Z)
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or to propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- Adaptively Integrated Knowledge Distillation and Prediction Uncertainty for Continual Learning [71.43841235954453]
Current deep learning models often suffer from catastrophic forgetting of old knowledge when continually learning new knowledge.
Existing strategies to alleviate this issue often fix the trade-off between keeping old knowledge (stability) and learning new knowledge (plasticity).
arXiv Detail & Related papers (2023-01-18T05:36:06Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Eliciting Knowledge from Large Pre-Trained Models for Unsupervised Knowledge-Grounded Conversation [45.95864432188745]
Recent advances in large-scale pre-training provide large models with the potential to learn knowledge from the raw text.
We propose various methods that best elicit knowledge from large models.
Our human study indicates that, though hallucinations exist, large models possess the unique advantage of being able to output common sense.
arXiv Detail & Related papers (2022-11-03T04:48:38Z)
- Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge [38.48518306055536]
We develop a neural language model that includes an explicit interface between symbolically interpretable factual information and subsymbolic neural knowledge.
We show that this model dramatically improves performance on two knowledge-intensive question-answering tasks.
arXiv Detail & Related papers (2020-07-02T03:05:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.