Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
- URL: http://arxiv.org/abs/2403.18349v3
- Date: Thu, 8 Aug 2024 08:57:23 GMT
- Title: Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
- Authors: Hongshen Xu, Zichen Zhu, Situo Zhang, Da Ma, Shuai Fan, Lu Chen, Kai Yu
- Abstract summary: Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations.
We present a novel alignment framework called Reinforcement Learning from Knowledge Feedback (RLKF).
- Score: 14.120154004011084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations, due to their limitations in discerning questions beyond their knowledge scope. While addressing hallucination has been a focal point in research, previous efforts primarily concentrate on enhancing correctness without giving due consideration to the significance of rejection mechanisms. In this paper, we conduct a comprehensive examination of the role of rejection, introducing the notion of model reliability along with corresponding metrics. These metrics measure the model's ability to provide accurate responses while adeptly rejecting questions exceeding its knowledge boundaries, thereby minimizing hallucinations. To improve the inherent reliability of LLMs, we present a novel alignment framework called Reinforcement Learning from Knowledge Feedback (RLKF). RLKF leverages knowledge feedback to dynamically determine the model's knowledge boundary and trains a reliable reward model to encourage the refusal of out-of-knowledge questions. Experimental results on mathematical questions affirm the substantial efficacy of RLKF in significantly enhancing LLM reliability.
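As a rough illustration of the idea in the abstract (rewarding correct answers and refusals while penalizing hallucinated answers), the sketch below tallies per-question outcomes into a few summary rates. The function name `reliability_summary`, the three outcome labels, and the specific rates are assumptions made for this example; they are not the metric definitions used in the paper.

```python
from collections import Counter

# Hypothetical outcome labels for one model response per question.
VALID_OUTCOMES = {"correct", "wrong", "refused"}


def reliability_summary(outcomes):
    """Summarize how often a model answers correctly, hallucinates, or refuses.

    `outcomes` is an iterable of strings drawn from VALID_OUTCOMES.
    The returned rates are illustrative and are NOT the paper's metrics.
    """
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")

    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one outcome is required")

    correct = counts["correct"]
    wrong = counts["wrong"]
    refused = counts["refused"]
    answered = correct + wrong  # questions the model attempted

    return {
        # Share of all questions answered correctly.
        "accuracy": correct / total,
        # Share of all questions answered incorrectly (hallucinated).
        "hallucination_rate": wrong / total,
        # Share of all questions the model declined to answer.
        "refusal_rate": refused / total,
        # Correctness among attempted questions only.
        "precision_when_answering": correct / answered if answered else 0.0,
    }


if __name__ == "__main__":
    sample = ["correct"] * 5 + ["wrong"] * 2 + ["refused"] * 3
    print(reliability_summary(sample))
```

Under this toy accounting, a model that turns a wrong answer into a refusal lowers its hallucination rate without changing its accuracy, which is the trade-off the abstract argues should be measured.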
Related papers
- Hindsight Distillation Reasoning with Knowledge Encouragement Preference for Knowledge-based Visual Question Answering [55.368681418311894]
Existing Knowledge-based Visual Question Answering (KBVQA) methods either utilize implicit knowledge in multimodal large language models (MLLMs) via in-context learning or explicit knowledge via retrieval augmented generation.
We provide a Hindsight Distilled Reasoning (HinD) framework with Knowledge Encouragement Preference Optimization (KEPO).
Experiments on OK-VQA and A-OKVQA validate the effectiveness of HinD, showing that HinD with elicited reasoning from 7B-size MLLM achieves superior performance without commercial model APIs or outside knowledge.
arXiv Detail & Related papers (2025-11-14T10:03:23Z) - Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality [27.687276551678583]
Hallucination and factuality deficits remain key obstacles to the reliability of large language models.
We propose a novel framework that focuses on the knowledge consistency between the policy model's expressed knowledge and the base model's parametric knowledge.
arXiv Detail & Related papers (2025-09-28T09:23:06Z) - Rethinking LLM Parametric Knowledge as Post-retrieval Confidence for Dynamic Retrieval and Reranking [23.1400319714807]
Large Language Models (LLMs) often generate inaccurate responses (hallucinations) when faced with questions beyond their knowledge scope.
Retrieval-Augmented Generation (RAG) addresses this by leveraging external knowledge, but a critical challenge remains: determining whether retrieved contexts effectively enhance the model's ability to answer specific queries.
This challenge underscores the importance of knowledge boundary awareness, which current methods, relying on discrete labels or limited signals, fail to address adequately.
arXiv Detail & Related papers (2025-09-08T09:37:20Z) - Accommodate Knowledge Conflicts in Retrieval-augmented LLMs: Towards Reliable Response Generation in the Wild [11.058848731627233]
Large language models (LLMs) have advanced information retrieval systems.
LLMs often face knowledge conflicts between internal memory and retrieved external information.
We propose Swin-VIB, a novel framework that integrates a pipeline of variational information bottleneck models into adaptive augmentation of retrieved information.
arXiv Detail & Related papers (2025-04-17T14:40:31Z) - Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling [48.15636223774418]
Large language models (LLMs) frequently hallucinate due to misaligned self-awareness.
Existing approaches mitigate hallucinations via uncertainty estimation or query rejection.
We propose the Explicit Knowledge Boundary Modeling framework to integrate fast and slow reasoning systems.
arXiv Detail & Related papers (2025-03-04T03:16:02Z) - Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies [66.30619782227173]
Large language models (LLMs) can produce erroneous responses that sound fluent and convincing.
We identify several features of LLM responses that shape users' reliance.
We find that explanations increase reliance on both correct and incorrect responses.
We observe less reliance on incorrect responses when sources are provided or when explanations exhibit inconsistencies.
arXiv Detail & Related papers (2025-02-12T16:35:41Z) - UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models [41.67393607081513]
Large Language Models (LLMs) often struggle to accurately express the factual knowledge they possess.
We propose the UAlign framework, which leverages Uncertainty estimations to represent knowledge boundaries.
We show that the proposed UAlign can significantly enhance the LLMs' capacities to confidently answer known questions.
arXiv Detail & Related papers (2024-12-16T14:14:27Z) - Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal [21.342265570934995]
Existing methods have largely overlooked the importance of refusal responses as a means of enhancing MLLMs' reliability.
We present the Information Boundary-aware Learning Framework (InBoL), a novel approach that empowers MLLMs to refuse to answer user queries when encountering insufficient information.
This framework introduces a comprehensive data generation pipeline and tailored training strategies to improve the model's ability to deliver appropriate refusal responses.
arXiv Detail & Related papers (2024-12-15T14:17:14Z) - A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions [9.045698110081686]
Large language models (LLMs) generate plausible, factually-incorrect responses, which are expressed with striking confidence.
Previous work has shown that hallucinations and other non-factual responses generated by LLMs can be detected by examining the uncertainty of the LLM in its response to the pertinent prompt.
This survey seeks to provide an extensive review of existing uncertainty quantification methods for LLMs, identifying their salient features, along with their strengths and weaknesses.
arXiv Detail & Related papers (2024-12-07T06:56:01Z) - Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications.
Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs.
By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
arXiv Detail & Related papers (2024-11-09T15:12:28Z) - Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons [13.266817091775042]
We investigate whether Large Language Models (LLMs) actively recall or retrieve their internal repositories of factual knowledge when faced with reasoning tasks.
We reveal that LLMs fail to harness the critical factual associations under certain circumstances.
We assess the effect of Chain-of-Thought (CoT) prompting, a powerful technique for addressing complex reasoning tasks.
arXiv Detail & Related papers (2024-08-06T15:07:08Z) - Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z) - How Reliable are LLMs as Knowledge Bases? Re-thinking Facutality and Consistency [60.25969380388974]
Large Language Models (LLMs) are increasingly explored as knowledge bases (KBs).
Current evaluation methods focus too narrowly on knowledge retention, overlooking other crucial criteria for reliable performance.
We propose new criteria and metrics to quantify factuality and consistency, leading to a final reliability score.
arXiv Detail & Related papers (2024-07-18T15:20:18Z) - Rowen: Adaptive Retrieval-Augmented Generation for Hallucination Mitigation in LLMs [88.75700174889538]
Hallucinations present a significant challenge for large language models (LLMs).
The utilization of parametric knowledge in generating factual content is constrained by the limited knowledge of LLMs.
We present Rowen, a novel framework that enhances LLMs with an adaptive retrieval augmentation process tailored to address hallucinated outputs.
arXiv Detail & Related papers (2024-02-16T11:55:40Z) - Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation [9.730412606588335]
We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state.
We propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs.
arXiv Detail & Related papers (2024-01-27T16:19:30Z) - Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z) - Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-based Retrofitting [51.7049140329611]
This paper proposes Knowledge Graph-based Retrofitting (KGR) to mitigate factual hallucination during the reasoning process.
Experiments show that KGR can significantly improve the performance of LLMs on factual QA benchmarks.
arXiv Detail & Related papers (2023-11-22T11:08:38Z) - Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)