Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization
- URL: http://arxiv.org/abs/2502.19127v2
- Date: Mon, 26 May 2025 07:35:49 GMT
- Title: Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization
- Authors: Siyuan Zhang, Yichi Zhang, Yinpeng Dong, Hang Su
- Abstract summary: PKUE fine-tunes the model on self-generated responses to precise and simple factual questions. Extensive experiments demonstrate that PKUE significantly improves LLM overall performance.
- Score: 37.59724553583446
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large Language Models (LLMs) often struggle to align their responses with objective facts, resulting in the issue of factual hallucinations, which can be difficult to detect and mislead users without relevant knowledge. Although post-training techniques have been employed to mitigate the issue, existing methods usually suffer from poor generalization and trade-offs in different capabilities. In this paper, we propose to address it by directly augmenting LLM's fundamental ability to precisely leverage its knowledge and introduce PKUE, which fine-tunes the model on self-generated responses to precise and simple factual questions through preference optimization. Furthermore, we construct FactualBench, a comprehensive and precise factual QA dataset containing 181k Chinese data spanning 21 domains, to facilitate both evaluation and training. Extensive experiments demonstrate that PKUE significantly improves LLM overall performance, with consistent enhancement across factual tasks of various forms, general tasks beyond factuality, and tasks in a different language.
Related papers
- Teaching Language Models To Gather Information Proactively [53.85419549904644]
Large language models (LLMs) are increasingly expected to function as collaborative partners.
In this work, we introduce a new task paradigm: proactive information gathering.
We design a scalable framework that generates partially specified, real-world tasks, masking key information.
Within this setup, our core innovation is a reinforcement finetuning strategy that rewards questions that elicit genuinely new, implicit user information.
arXiv Detail & Related papers (2025-07-28T23:50:09Z) - Enhancing LLM Knowledge Learning through Generalization [73.16975077770765]
We show that an LLM's ability to continually predict the same factual knowledge tokens given diverse paraphrased contexts is positively correlated with its capacity to extract that knowledge via question-answering.
We propose two strategies to enhance LLMs' ability to predict the same knowledge tokens given varied contexts, thereby enhancing knowledge acquisition.
arXiv Detail & Related papers (2025-03-05T17:56:20Z) - AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification [25.27444694706659]
We present AskToAct, which exploits structural mapping between queries and their tool invocation solutions.
Our key insight is that tool parameters naturally represent explicit user intents.
By systematically removing key parameters from queries while retaining them as ground truth, we enable automated construction of high-quality training data.
arXiv Detail & Related papers (2025-03-03T12:55:49Z) - Information Anxiety in Large Language Models [21.574677910096735]
Large Language Models (LLMs) have demonstrated strong performance as knowledge repositories.
We take the investigation further by conducting a comprehensive analysis of the internal reasoning and retrieval mechanisms of LLMs.
Our work focuses on three critical dimensions - the impact of entity popularity, the models' sensitivity to lexical variations in query formulation, and the progression of hidden state representations.
arXiv Detail & Related papers (2024-11-16T14:28:33Z) - What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z) - LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models [11.453585039783901]
LEAF (Learning and Evaluation Augmented by Fact-Checking) is a novel approach designed to enhance the factual reliability of large language models (LLMs).
The first strategy, Fact-Check-Then-RAG, improves Retrieval-Augmented Generation (RAG) by incorporating fact-checking results to guide the retrieval process without updating model parameters.
The second strategy, Learning from Fact-Checks via Self-Training, involves supervised fine-tuning (SFT) on fact-checked responses or applying Simple Preference Optimization (SimPO) with fact-checking as a ranking mechanism.
arXiv Detail & Related papers (2024-10-31T00:18:05Z) - Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency.
We show that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty Expression [19.69104070561701]
Large language models (LLMs) can generate long-form and coherent text, yet they often hallucinate facts.
We propose LITO, a Learnable Intervention method for Truthfulness Optimization.
Experiments on multiple LLMs and question-answering datasets demonstrate that LITO improves truthfulness while preserving task accuracy.
arXiv Detail & Related papers (2024-05-01T03:50:09Z) - LLM In-Context Recall is Prompt Dependent [0.0]
A model's ability to recall information contained in its prompt significantly influences its practical efficacy and dependability in real-world applications.
This study demonstrates that an LLM's recall capability is not only contingent upon the prompt's content but also may be compromised by biases in its training data.
arXiv Detail & Related papers (2024-04-13T01:13:59Z) - KnowTuning: Knowledge-aware Fine-tuning for Large Language Models [83.5849717262019]
We propose a knowledge-aware fine-tuning (KnowTuning) method to improve fine-grained and coarse-grained knowledge awareness of LLMs.
KnowTuning generates more facts with a lower factual error rate under fine-grained facts evaluation.
arXiv Detail & Related papers (2024-02-17T02:54:32Z) - Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation [71.91287418249688]
Large language models (LLMs) often struggle with factual inaccuracies, even when they hold relevant knowledge.
We leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality.
We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks.
arXiv Detail & Related papers (2024-02-14T15:52:42Z) - From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z) - Context Matters: Data-Efficient Augmentation of Large Language Models for Scientific Applications [15.893290942177112]
We explore the challenges inherent to Large Language Models (LLMs) like GPT-4.
The capacity of LLMs to present erroneous answers in a coherent and semantically rigorous manner complicates the detection of factual inaccuracies.
Our work aims to enhance the understanding and mitigation of such errors, thereby contributing to the improvement of LLM accuracy and reliability.
arXiv Detail & Related papers (2023-12-12T08:43:20Z) - Self-Knowledge Guided Retrieval Augmentation for Large Language Models [59.771098292611846]
Large language models (LLMs) have shown superior performance without task-specific fine-tuning.
Retrieval-based methods can offer non-parametric world knowledge and improve the performance on tasks such as question answering.
Self-Knowledge guided Retrieval augmentation (SKR) is a simple yet effective method which can let LLMs refer to the questions they have previously encountered.
arXiv Detail & Related papers (2023-10-08T04:22:33Z) - When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories [58.3421305091187]
This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge.
We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the long tail.
We devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary.
arXiv Detail & Related papers (2022-12-20T18:30:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.