The Curious Case of Factuality Finetuning: Models' Internal Beliefs Can Improve Factuality
- URL: http://arxiv.org/abs/2507.08371v1
- Date: Fri, 11 Jul 2025 07:34:34 GMT
- Title: The Curious Case of Factuality Finetuning: Models' Internal Beliefs Can Improve Factuality
- Authors: Benjamin Newman, Abhilasha Ravichander, Jaehun Jung, Rui Xin, Hamish Ivison, Yegor Kuznetsov, Pang Wei Koh, Yejin Choi,
- Abstract summary: We study the relationship between the factuality of finetuning data and the prevalence of hallucinations in long-form generation tasks. We find that finetuning on factual gold data is not as helpful as finetuning on model-generated data that models believe to be factual.
- Score: 47.61600392927893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models are prone to hallucination: generating text that is factually incorrect. Finetuning models on high-quality factual information can potentially reduce hallucination, but concerns remain: obtaining factual gold data can be expensive, and training on correct but unfamiliar data may lead to even more downstream hallucination. What data should practitioners finetune on to mitigate hallucinations in language models? In this work, we study the relationship between the factuality of finetuning data and the prevalence of hallucinations in long-form generation tasks. Counterintuitively, we find that finetuning on factual gold data is not as helpful as finetuning on model-generated data that models believe to be factual. Next, we evaluate filtering strategies applied to both factual gold data and model-generated data, and find that finetuning on model-generated data filtered by models' own internal judgments often leads to better overall factuality than the other configurations: training on gold data filtered by models' judgments, training on gold data alone, or training on model-generated data that is supported by gold data. These factuality improvements transfer across the three domains we study, suggesting that a model's own beliefs can provide a powerful signal for factuality.
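To make the filtering recipe above concrete, here is a minimal sketch, assuming a HuggingFace causal LM and a True/False self-verification prompt as a stand-in for the paper's internal-belief signal; the model name, prompt wording, and 0.5 threshold are illustrative placeholders rather than the authors' actual setup.

```python
# Minimal sketch (not the paper's implementation): filter model-generated
# finetuning examples by the model's own belief that they are factual,
# using a True/False self-verification prompt as the belief signal.
# Model name, prompt wording, and threshold are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any instruction-tuned causal LM
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
lm = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
lm.eval()

def belief_score(claim: str) -> float:
    """Probability mass the model puts on ' True' vs. ' False' when asked to verify a claim."""
    prompt = (
        "Is the following statement factually correct? Answer True or False.\n"
        f"Statement: {claim}\nAnswer:"
    )
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = lm(**inputs).logits[0, -1]
    true_id = tok(" True", add_special_tokens=False).input_ids[0]
    false_id = tok(" False", add_special_tokens=False).input_ids[0]
    p_true, _ = torch.softmax(next_token_logits[[true_id, false_id]], dim=-1)
    return p_true.item()

# Keep only generations the model itself judges factual; finetune on `kept`.
generated = [
    "Marie Curie won Nobel Prizes in both Physics and Chemistry.",
    "The Eiffel Tower is located in Berlin.",
]
kept = [claim for claim in generated if belief_score(claim) > 0.5]
```

In this sketch, `kept` would serve as the supervised finetuning set; the abstract's comparison conditions swap in gold data, gold data filtered the same way, or model generations filtered by support in gold data.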
Related papers
- The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains [50.66245575710432]
We show that paired preference data consisting of individually weak data points can enable gains beyond the strength of each individual point; models can learn surprisingly well from paired data that might typically be considered weak.
arXiv Detail & Related papers (2025-07-08T17:14:44Z) - Exploring the Knowledge Mismatch Hypothesis: Hallucination Propensity in Small Models Fine-tuned on Data from Larger Models [0.1227734309612871]
Small language models fine-tuned on data from larger models can appear similar to those larger models, but they hallucinate more often.
One hypothesis is that fine-tuning a model on data produced by a larger model leads to a knowledge mismatch which contributes to hallucination.
We show that on an unseen test set, a smaller model fine-tuned on data generated by a larger model produced more wrong answers than models fine-tuned on data created by the smaller model itself.
arXiv Detail & Related papers (2024-10-31T13:01:46Z) - Transcendence: Generative Models Can Outperform The Experts That Train Them [55.885802048647655]
We study the phenomenon of transcendence: when a generative model achieves capabilities that surpass the abilities of the experts generating its data.
We demonstrate transcendence by training an autoregressive transformer to play chess from game transcripts, and show that the trained model can sometimes achieve better performance than all players in the dataset.
arXiv Detail & Related papers (2024-06-17T17:00:52Z) - Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification [11.6055501181235]
We investigate the use of verification on synthesized data to prevent model collapse.
We show that verifiers, even imperfect ones, can indeed be harnessed to prevent model collapse.
arXiv Detail & Related papers (2024-06-11T17:46:16Z) - Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z) - Mind the Gap Between Synthetic and Real: Utilizing Transfer Learning to Probe the Boundaries of Stable Diffusion Generated Data [2.6016285265085526]
Student models trained on synthetic data show a significant drop in accuracy compared to models trained on real data.
By training individual layers with either real or synthetic data, we reveal that the drop mainly stems from the model's final layers.
Our results suggest an improved trade-off between the amount of real training data used and the model's accuracy.
arXiv Detail & Related papers (2024-05-06T07:51:13Z) - Unfamiliar Finetuning Examples Control How Language Models Hallucinate [75.03210107477157]
Large language models are known to hallucinate when faced with unfamiliar queries.
We find that unfamiliar examples in the models' finetuning data are crucial in shaping these errors.
Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations.
arXiv Detail & Related papers (2024-03-08T18:28:13Z) - Calibrated Language Models Must Hallucinate [11.891340760198798]
Recent language models generate false but plausible-sounding text with surprising frequency.
This work shows that there is an inherent statistical lower-bound on the rate that pretrained language models hallucinate certain types of facts.
For "arbitrary" facts whose veracity cannot be determined from the training data, we show that hallucinations must occur at a certain rate for language models.
arXiv Detail & Related papers (2023-11-24T18:29:50Z) - Orthogonal Uncertainty Representation of Data Manifold for Robust Long-Tailed Learning [52.021899899683675]
In scenarios with long-tailed distributions, the model's ability to identify tail classes is limited due to the under-representation of tail samples.
We propose an Orthogonal Uncertainty Representation (OUR) of feature embedding and an end-to-end training strategy to improve model robustness under long-tailed distributions.
arXiv Detail & Related papers (2023-10-16T05:50:34Z) - On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets of real data and the models' own generations.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z) - Overwriting Pretrained Bias with Finetuning Data [36.050345384273655]
We investigate bias conceptualized both as spurious correlations between the target task and a sensitive attribute, and as underrepresentation of a particular group in the dataset.
We find that (1) models finetuned on top of pretrained models can indeed inherit their biases, but (2) this bias can be corrected through relatively minor interventions to the finetuning dataset.
Our findings imply that careful curation of the finetuning dataset is important for reducing biases on a downstream task, and doing so can even compensate for bias in the pretrained model.
arXiv Detail & Related papers (2023-03-10T19:10:58Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [47.432215933099016]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models, but fusing knowledge across individually fine-tuned models to yield a better single model remains difficult. We propose a dataless knowledge fusion method that merges models in their parameter space (a simple weight-averaging sketch follows below).
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
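For intuition about the parameter-space merging mentioned in the last entry, below is a minimal sketch that uniformly averages the weights of two finetuned checkpoints sharing an architecture; the checkpoint paths are placeholders, and plain averaging stands in for, but is not, the paper's proposed fusion method.

```python
# Minimal sketch (an assumption, not the paper's fusion method): merge two
# finetuned checkpoints of the same architecture by uniform weight averaging.
# The checkpoint paths are placeholders.
from transformers import AutoModelForSequenceClassification

model_a = AutoModelForSequenceClassification.from_pretrained("path/to/checkpoint_a")
model_b = AutoModelForSequenceClassification.from_pretrained("path/to/checkpoint_b")

state_a = model_a.state_dict()
state_b = model_b.state_dict()

# Average every floating-point parameter; copy non-float buffers unchanged.
merged_state = {
    name: (tensor + state_b[name]) / 2 if tensor.is_floating_point() else tensor
    for name, tensor in state_a.items()
}

model_a.load_state_dict(merged_state)  # reuse model_a's architecture for the merged model
model_a.save_pretrained("path/to/merged_checkpoint")
```

Uniform averaging is only meaningful when both checkpoints were finetuned from the same pretrained initialization; otherwise their parameters are not aligned and the merged model degrades.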