Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for
Hallucination Mitigation
- URL: http://arxiv.org/abs/2401.15449v1
- Date: Sat, 27 Jan 2024 16:19:30 GMT
- Title: Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for
Hallucination Mitigation
- Authors: Yuxin Liang, Zhuoyang Song, Hao Wang, Jiaxing Zhang
- Abstract summary: We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state.
We propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs.
- Score: 9.730412606588335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We evaluate the ability of Large Language Models (LLMs) to discern and
express their internal knowledge state, a key factor in countering factual
hallucination and ensuring reliable application of LLMs. We observe a robust
self-awareness of internal knowledge state in LLMs, evidenced by over 85%
accuracy in knowledge probing. However, LLMs often fail to express their
internal knowledge during generation, leading to factual hallucinations. We
develop an automated hallucination annotation tool, Dreamcatcher, which merges
knowledge probing and consistency checking methods to rank factual preference
data. Using knowledge preference as the reward, we propose a Reinforcement Learning
from Knowledge Feedback (RLKF) training framework, leveraging reinforcement
learning to enhance the factuality and honesty of LLMs. Our experiments across
multiple models show that RLKF training effectively enhances the ability of
models to utilize their internal knowledge state, boosting performance in a
variety of knowledge-based and honesty-related tasks.
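To make the pipeline concrete, here is a minimal sketch of a Dreamcatcher-style annotation step: probe the model's knowledge by sampling several answers, check their consistency, and rank a factual preference record. Everything below is a hypothetical illustration, not the authors' implementation: `generate`, `probe_knowledge`, `consistency_score`, `rank_preference`, the exact-match consistency test, and the refusal string are stand-ins chosen for brevity.
```python
"""Minimal sketch of a Dreamcatcher-style factual preference annotation step.

Assumptions (not from the paper): `generate` is any callable mapping a prompt
to a model completion, and exact match against a reference answer stands in
for the paper's richer knowledge-probing and consistency-checking methods.
"""
from collections import Counter
from typing import Callable, Dict, List, Tuple


def probe_knowledge(generate: Callable[[str], str], question: str,
                    n_samples: int = 8) -> List[str]:
    """Knowledge probing: sample several answers to the same question."""
    return [generate(f"Answer concisely: {question}") for _ in range(n_samples)]


def consistency_score(samples: List[str]) -> Tuple[str, float]:
    """Consistency checking: return the majority answer and its agreement rate."""
    counts = Counter(s.strip().lower() for s in samples)
    answer, freq = counts.most_common(1)[0]
    return answer, freq / len(samples)


def rank_preference(generate: Callable[[str], str], question: str,
                    reference: str, threshold: float = 0.7) -> Dict[str, str]:
    """Build one preference record: prefer the correct answer when the model
    consistently knows it, otherwise prefer an explicit refusal; treat a
    disagreeing sample as the rejected (hallucinated) response."""
    samples = probe_knowledge(generate, question)
    majority, agreement = consistency_score(samples)
    knows = agreement >= threshold and majority == reference.strip().lower()
    chosen = reference if knows else "I am not sure about this."
    rejected = next((s for s in samples if s.strip().lower() != reference.strip().lower()),
                    samples[0])
    return {"prompt": question, "chosen": chosen, "rejected": rejected}
```
Preference records of this shape could then train a reward model whose score drives reinforcement-learning fine-tuning, which is the RLKF step the abstract describes; the paper's actual probing prompts, consistency checks, and reward design differ in detail.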
Related papers
- Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching [67.11497198002165]
Large language models (LLMs) often struggle to provide up-to-date information due to their one-time training.
Motivated by the remarkable success of the Feynman Technique in efficient human learning, we introduce Self-Tuning.
arXiv Detail & Related papers (2024-06-10T14:42:20Z)
- Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback [14.120154004011084]
Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations.
We present a novel alignment framework called Reinforcement Learning from Knowledge Feedback (RLKF)
arXiv Detail & Related papers (2024-03-27T08:39:56Z)
- KnowTuning: Knowledge-aware Fine-tuning for Large Language Models [83.5849717262019]
We propose a knowledge-aware fine-tuning (KnowTuning) method to improve fine-grained and coarse-grained knowledge awareness of LLMs.
KnowTuning generates more facts with less factual error rate under fine-grained facts evaluation.
arXiv Detail & Related papers (2024-02-17T02:54:32Z)
- Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation [71.91287418249688]
Large language models (LLMs) often struggle with factual inaccuracies, even when they hold relevant knowledge.
We leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality.
We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks.
arXiv Detail & Related papers (2024-02-14T15:52:42Z)
- The Calibration Gap between Model and Human Confidence in Large Language Models [14.539888672603743]
Large language models (LLMs) need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct.
Recent work has focused on the quality of internal LLM confidence assessments.
This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model.
arXiv Detail & Related papers (2024-01-24T22:21:04Z)
- Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z)
- RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge [69.79676144482792]
This study aims to evaluate the ability of LLMs to distinguish reliable information from external knowledge.
Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information.
arXiv Detail & Related papers (2023-11-14T13:24:19Z)
- User-Controlled Knowledge Fusion in Large Language Models: Balancing Creativity and Hallucination [5.046007553593371]
Large Language Models (LLMs) generate diverse, relevant, and creative responses.
Striking a balance between the LLM's imaginative capabilities and its adherence to factual information is a key challenge.
This paper presents an innovative user-controllable mechanism that modulates the balance between an LLM's imaginative capabilities and its adherence to factual information.
arXiv Detail & Related papers (2023-07-30T06:06:35Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.