Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for
Hallucination Mitigation
- URL: http://arxiv.org/abs/2401.15449v1
- Date: Sat, 27 Jan 2024 16:19:30 GMT
- Title: Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for
Hallucination Mitigation
- Authors: Yuxin Liang, Zhuoyang Song, Hao Wang, Jiaxing Zhang
- Abstract summary: We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state.
We propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs.
- Score: 9.730412606588335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We evaluate the ability of Large Language Models (LLMs) to discern and
express their internal knowledge state, a key factor in countering factual
hallucination and ensuring reliable application of LLMs. We observe a robust
self-awareness of internal knowledge state in LLMs, evidenced by over 85%
accuracy in knowledge probing. However, LLMs often fail to express their
internal knowledge during generation, leading to factual hallucinations. We
develop an automated hallucination annotation tool, Dreamcatcher, which merges
knowledge probing and consistency checking methods to rank factual preference
data. Using knowledge preference as reward, We propose a Reinforcement Learning
from Knowledge Feedback (RLKF) training framework, leveraging reinforcement
learning to enhance the factuality and honesty of LLMs. Our experiments across
multiple models show that RLKF training effectively enhances the ability of
models to utilize their internal knowledge state, boosting performance in a
variety of knowledge-based and honesty-related tasks.
Related papers
- Refine Knowledge of Large Language Models via Adaptive Contrastive Learning [54.61213933999464]
A mainstream category of methods is to reduce hallucinations by optimizing the knowledge representation of Large Language Models.
We believe that the process of models refining knowledge can greatly benefit from the way humans learn.
In our work, by imitating the human learning process, we design an Adaptive Contrastive Learning strategy.
arXiv Detail & Related papers (2025-02-11T02:19:13Z) - Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension [14.039653386385519]
Large language models (LLMs) acquire, retain, and apply knowledge.
This paper introduces a novel framework, K-(CSA)2, which categorizes LLM knowledge along two dimensions: correctness and confidence.
arXiv Detail & Related papers (2025-01-02T16:34:10Z) - KnowTuning: Knowledge-aware Fine-tuning for Large Language Models [83.5849717262019]
We propose a knowledge-aware fine-tuning (KnowTuning) method to improve fine-grained and coarse-grained knowledge awareness of LLMs.
KnowTuning generates more facts with less factual error rate under fine-grained facts evaluation.
arXiv Detail & Related papers (2024-02-17T02:54:32Z) - Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation [71.91287418249688]
Large language models (LLMs) often struggle with factual inaccuracies, even when they hold relevant knowledge.
We leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality.
We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks.
arXiv Detail & Related papers (2024-02-14T15:52:42Z) - Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z) - RECALL: A Benchmark for LLMs Robustness against External Counterfactual
Knowledge [69.79676144482792]
This study aims to evaluate the ability of LLMs to distinguish reliable information from external knowledge.
Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information.
arXiv Detail & Related papers (2023-11-14T13:24:19Z) - User-Controlled Knowledge Fusion in Large Language Models: Balancing
Creativity and Hallucination [5.046007553593371]
Large Language Models (LLMs) generate diverse, relevant, and creative responses.
Striking a balance between the LLM's imaginative capabilities and its adherence to factual information is a key challenge.
This paper presents an innovative user-controllable mechanism that modulates the balance between an LLM's imaginative capabilities and its adherence to factual information.
arXiv Detail & Related papers (2023-07-30T06:06:35Z) - Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.