Related papers: Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

URL: http://arxiv.org/abs/2401.15449v1
Date: Sat, 27 Jan 2024 16:19:30 GMT
Title: Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation
Authors: Yuxin Liang, Zhuoyang Song, Hao Wang, Jiaxing Zhang
Abstract summary: We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state. We propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs.
Score: 9.730412606588335
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state, a key factor in countering factual hallucination and ensuring reliable application of LLMs. We observe a robust self-awareness of internal knowledge state in LLMs, evidenced by over 85% accuracy in knowledge probing. However, LLMs often fail to express their internal knowledge during generation, leading to factual hallucinations. We develop an automated hallucination annotation tool, Dreamcatcher, which merges knowledge probing and consistency checking methods to rank factual preference data. Using knowledge preference as reward, We propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs. Our experiments across multiple models show that RLKF training effectively enhances the ability of models to utilize their internal knowledge state, boosting performance in a variety of knowledge-based and honesty-related tasks.

Related papers

Line of Duty: Evaluating LLM Self-Knowledge via Consistency in Feasibility Boundaries [0.0]
This study aims to obtain intrinsic insights into different types of LLM self-knowledge with a novel methodology. We find that even frontier models like GPT-4o and Mistral Large are not sure of their own capabilities more than 80% of the time.
arXiv Detail & Related papers (2025-03-14T10:07:07Z)
Effective LLM Knowledge Learning via Model Generalization [73.16975077770765]
Large language models (LLMs) are trained on enormous documents that contain extensive world knowledge. It is still not well-understood how knowledge is acquired via autoregressive pre-training. In this paper, we focus on understanding and improving LLM knowledge learning.
arXiv Detail & Related papers (2025-03-05T17:56:20Z)
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning [54.61213933999464]
A mainstream category of methods is to reduce hallucinations by optimizing the knowledge representation of Large Language Models. We believe that the process of models refining knowledge can greatly benefit from the way humans learn. In our work, by imitating the human learning process, we design an Adaptive Contrastive Learning strategy.
arXiv Detail & Related papers (2025-02-11T02:19:13Z)
Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension [14.039653386385519]
Large language models (LLMs) acquire, retain, and apply knowledge. This paper introduces a novel framework, K-(CSA)2, which categorizes LLM knowledge along two dimensions: correctness and confidence.
arXiv Detail & Related papers (2025-01-02T16:34:10Z)
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching [67.11497198002165]
Large language models (LLMs) often struggle to provide up-to-date information due to their one-time training. Motivated by the remarkable success of the Feynman Technique in efficient human learning, we introduce Self-Tuning.
arXiv Detail & Related papers (2024-06-10T14:42:20Z)
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback [14.120154004011084]
Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations. We present a novel alignment framework called Reinforcement Learning from Knowledge Feedback (RLKF)
arXiv Detail & Related papers (2024-03-27T08:39:56Z)
KnowTuning: Knowledge-aware Fine-tuning for Large Language Models [83.5849717262019]
We propose a knowledge-aware fine-tuning (KnowTuning) method to improve fine-grained and coarse-grained knowledge awareness of LLMs. KnowTuning generates more facts with less factual error rate under fine-grained facts evaluation.
arXiv Detail & Related papers (2024-02-17T02:54:32Z)
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation [71.91287418249688]
Large language models (LLMs) often struggle with factual inaccuracies, even when they hold relevant knowledge. We leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks.
arXiv Detail & Related papers (2024-02-14T15:52:42Z)
The Calibration Gap between Model and Human Confidence in Large Language Models [14.539888672603743]
Large language models (LLMs) need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct. Recent work has focused on the quality of internal LLM confidence assessments. This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model.
arXiv Detail & Related papers (2024-01-24T22:21:04Z)
Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs. We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge. We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z)
RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge [69.79676144482792]
This study aims to evaluate the ability of LLMs to distinguish reliable information from external knowledge. Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information.
arXiv Detail & Related papers (2023-11-14T13:24:19Z)
User-Controlled Knowledge Fusion in Large Language Models: Balancing Creativity and Hallucination [5.046007553593371]
Large Language Models (LLMs) generate diverse, relevant, and creative responses. Striking a balance between the LLM's imaginative capabilities and its adherence to factual information is a key challenge. This paper presents an innovative user-controllable mechanism that modulates the balance between an LLM's imaginative capabilities and its adherence to factual information.
arXiv Detail & Related papers (2023-07-30T06:06:35Z)
Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus. We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.