Exploring the Knowledge Mismatch Hypothesis: Hallucination Propensity in Small Models Fine-tuned on Data from Larger Models
- URL: http://arxiv.org/abs/2411.00878v1
- Date: Thu, 31 Oct 2024 13:01:46 GMT
- Title: Exploring the Knowledge Mismatch Hypothesis: Hallucination Propensity in Small Models Fine-tuned on Data from Larger Models
- Authors: Phil Wee, Riyadh Baghdadi
- Abstract summary: Language models fine-tuned on data from larger models can produce outputs that appear qualitatively similar, but they hallucinate more often than larger models.
One hypothesis is that fine-tuning a model on data produced by a larger model leads to a knowledge mismatch which contributes to hallucination.
We show that, on an unseen test set, a smaller model fine-tuned on data generated by a larger model produced more wrong answers than models fine-tuned on data created by the small model itself.
- Score: 0.1227734309612871
- License:
- Abstract: Recently, there has been an explosion of large language models created through fine-tuning with data from larger models. These small models are able to produce outputs that appear qualitatively similar to those of significantly larger models. However, one of the key limitations observed in these models is their propensity to hallucinate significantly more often than larger models. In particular, they have been observed to generate coherent outputs that contain factually incorrect information and spread misinformation, toxicity, and stereotypes. Hallucination has many potential causes; one hypothesis is that fine-tuning a model on data produced by a larger model leads to a knowledge mismatch that contributes to hallucination. Specifically, it is hypothesized that there is a mismatch between the knowledge fed to the model during fine-tuning and the knowledge already present in the model. Fine-tuning the model on data with such a mismatch could contribute to an increased propensity to hallucinate. We show that, on an unseen test set, a smaller model fine-tuned on data generated by a larger model produced more wrong answers than models fine-tuned on data created by the small model itself, which supports the hypothesis.
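To make the comparison described in the abstract concrete, the sketch below outlines one way the knowledge-mismatch experiment could be scored, assuming a question-answering format with exact-match evaluation; the data format, helper names, and the commented fine_tune step are illustrative assumptions, not details taken from the paper.
```python
# Hypothetical sketch of the knowledge-mismatch comparison: the same small model
# is fine-tuned once on answers written by a larger "teacher" model and once on
# answers it generated itself, then both variants are scored on a held-out set.
# The QA format and exact-match metric are assumptions for illustration only.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str


def build_finetune_set(questions: List[str],
                       answer_fn: Callable[[str], str]) -> List[QAPair]:
    """Label the same questions with answers from a given model.

    `answer_fn` is whichever model supplies the supervision: the larger model
    (knowledge-mismatch condition) or the small model itself (matched condition).
    """
    return [QAPair(q, answer_fn(q)) for q in questions]


def wrong_answer_rate(predict_fn: Callable[[str], str],
                      test_set: List[QAPair]) -> float:
    """Fraction of held-out questions answered incorrectly (exact match)."""
    wrong = sum(
        1 for ex in test_set
        if predict_fn(ex.question).strip().lower() != ex.answer.strip().lower()
    )
    return wrong / max(len(test_set), 1)


# Usage outline (fine_tune would be ordinary causal-LM fine-tuning, e.g. with a
# standard training loop; it is omitted here to keep the sketch short):
#   mismatch_data = build_finetune_set(train_questions, large_model_answer)
#   matched_data  = build_finetune_set(train_questions, small_model_answer)
#   model_a = fine_tune(small_model, mismatch_data)
#   model_b = fine_tune(small_model, matched_data)
#   wrong_answer_rate(model_a, test_set)  # hypothesis: higher error rate
#   wrong_answer_rate(model_b, test_set)  # hypothesis: lower error rate
```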
Related papers
- Strong Model Collapse [16.071600606637908]
We consider a supervised regression setting and establish the existence of a strong form of the model collapse phenomenon.
Our results show that even the smallest fraction of synthetic data can lead to model collapse.
We investigate whether increasing model size, an approach aligned with current trends in training large language models, exacerbates or mitigates model collapse.
arXiv Detail & Related papers (2024-10-07T08:54:23Z)
- Graphical Modelling without Independence Assumptions for Uncentered Data [0.30723404270319693]
We show how the zero-mean assumption can cause egregious errors in modelling.
We propose a relaxation of the zero-mean assumption that avoids such errors.
arXiv Detail & Related papers (2024-08-05T11:40:23Z)
- Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models [65.32990889402927]
We coin this phenomenon "knowledge overshadowing".
We show that the hallucination rate grows with both the imbalance ratio and the length of dominant condition description.
We propose to utilize overshadowing conditions as a signal to catch hallucination before it is produced.
arXiv Detail & Related papers (2024-07-10T20:37:42Z)
- Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data [49.73114504515852]
We show that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse.
We demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse.
arXiv Detail & Related papers (2024-04-01T18:31:24Z)
- Unfamiliar Finetuning Examples Control How Language Models Hallucinate [75.03210107477157]
Large language models are known to hallucinate when faced with unfamiliar queries.
We find that unfamiliar examples in the models' finetuning data are crucial in shaping these errors.
Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations.
arXiv Detail & Related papers (2024-03-08T18:28:13Z)
- Analyzing Bias in Diffusion-based Face Generation Models [75.80072686374564]
Diffusion models are increasingly popular in synthetic data generation and image editing applications.
We investigate the presence of bias in diffusion-based face generation models with respect to attributes such as gender, race, and age.
We examine how dataset size affects the attribute composition and perceptual quality of both diffusion and Generative Adversarial Network (GAN) based face generation models.
arXiv Detail & Related papers (2023-05-10T18:22:31Z)
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression.
Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
- Causal discovery for observational sciences using supervised machine learning [1.6631602844999722]
Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models.
Several asymptotically correct methods already exist, but they generally struggle on smaller samples.
Most methods focus on very sparse causal models, which may not always be a realistic representation of real-life data generating mechanisms.
We propose a new causal discovery method that addresses these three shortcomings: Supervised learning discovery (SLdisco).
arXiv Detail & Related papers (2022-02-25T16:44:00Z)
- Predicting on the Edge: Identifying Where a Larger Model Does Better [61.793778186198864]
We show that large models have the largest improvement on examples where the small model is most uncertain.
We show that a switcher model, which defers examples to a larger model when a small model is uncertain, can achieve striking improvements in performance and resource usage (see the sketch below).
arXiv Detail & Related papers (2022-02-15T18:53:14Z)
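As a concrete illustration of the switcher idea in the last entry above, the following sketch defers a query to a larger model when the small model's top-class confidence falls below a threshold; the confidence measure (max softmax probability) and the threshold value are illustrative assumptions, not the paper's exact criterion.
```python
# Toy sketch of a confidence-based switcher (assumed criterion: the small model's
# max softmax probability; the paper's actual uncertainty measure may differ).
import math
from typing import Callable, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def switcher_predict(
    x: str,
    small_model: Callable[[str], Tuple[str, List[float]]],  # returns (prediction, logits)
    large_model: Callable[[str], str],
    threshold: float = 0.7,
) -> str:
    """Answer with the small model unless it is uncertain, then defer to the large one."""
    prediction, logits = small_model(x)
    confidence = max(softmax(logits))
    return prediction if confidence >= threshold else large_model(x)
```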