ConfRAG: Confidence-Guided Retrieval-Augmenting Generation
- URL: http://arxiv.org/abs/2506.07309v2
- Date: Tue, 30 Sep 2025 04:41:32 GMT
- Title: ConfRAG: Confidence-Guided Retrieval-Augmenting Generation
- Authors: Yin Huang, Yifan Ethan Xu, Kai Sun, Vera Yan, Alicia Sun, Haidar Khan, Jimmy Nguyen, Jingxiang Chen, Mohammad Kachuee, Zhaojiang Lin, Yue Liu, Aaron Colak, Anuj Kumar, Wen-tau Yih, Xin Luna Dong
- Abstract summary: We introduce ConfQA, a fine-tuning strategy that reduces hallucination rates from 20-40% to below 5% across multiple factuality benchmarks. We propose ConfRAG, a triggering strategy that invokes RAG only when the model responds with "I am unsure". This framework achieves accuracy above 95% in the ideal case while reducing unnecessary external retrievals by over 30%.
- Score: 41.78313747240249
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Can Large Language Models (LLMs) be trained to avoid hallucinating factual statements, and can Retrieval-Augmented Generation (RAG) be triggered only when necessary to reduce retrieval and computation costs? In this work, we address both challenges simultaneously. We introduce ConfQA, a fine-tuning strategy that reduces hallucination rates from 20-40% to below 5% across multiple factuality benchmarks. The approach is simple: when the model answers correctly, it is trained to output the answer; otherwise, it is trained to respond with "I am unsure". Two design choices make this training effective: (1) a dampening prompt ("answer only if you are confident") that explicitly discourages overconfident hallucinations, and (2) training data drawn from atomic factual statements (e.g., knowledge graph attribute values), which calibrates model confidence and yields robust generalization across domains and question types. Building on ConfQA, we propose ConfRAG, a triggering strategy that invokes RAG only when the model responds with "I am unsure". This framework achieves accuracy above 95% in the ideal case while reducing unnecessary external retrievals by over 30%.
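A minimal sketch of the ideas in the abstract, assuming placeholder `generate` and `retrieve` interfaces (not the authors' implementation); the dampening prompt and the "I am unsure" marker come from the abstract, everything else is illustrative:

```python
# Minimal sketch of the ConfQA labeling rule and the ConfRAG trigger;
# generate() and retrieve() are placeholder interfaces, not the authors' code.

DAMPENING_PROMPT = "Answer only if you are confident."
UNSURE = "I am unsure"

def confqa_label(gold_answer, model_answer):
    """ConfQA-style training target: keep the answer when the model got it
    right, otherwise train it to abstain with "I am unsure"."""
    return gold_answer if model_answer.strip() == gold_answer.strip() else UNSURE

def confrag_answer(question, generate, retrieve):
    """Try the ConfQA-tuned model first; fall back to RAG only on abstention."""
    draft = generate(f"{DAMPENING_PROMPT}\n{question}")
    if UNSURE.lower() not in draft.lower():
        return draft                          # confident parametric answer, no retrieval
    context = "\n".join(retrieve(question))   # retrieval triggered only when unsure
    return generate(f"Answer using the context below.\n{context}\n{question}")
```

Under this reading, retrieval is skipped whenever the ConfQA-tuned model is confident, which is where the claimed >30% reduction in external retrievals would come from.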
Related papers
- GRACE: Reinforcement Learning for Grounded Response and Abstention under Contextual Evidence [9.80421132842862]
Retrieval-Augmented Generation (RAG) integrates external knowledge to enhance Large Language Models (LLMs). RAG is susceptible to two critical flaws: providing correct answers without explicit grounded evidence and producing fabricated responses when the retrieved context is insufficient. We propose GRACE, a reinforcement-learning framework that simultaneously mitigates both types of flaws.
arXiv Detail & Related papers (2026-01-08T02:47:33Z) - QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation [14.312693191309101]
Dynamic Retrieval-Augmented Generation adaptively determines when to retrieve during generation to mitigate hallucinations in large language models. We propose QuCo-RAG, which shifts from subjective confidence to objective statistics computed from pre-training data. Our method quantifies uncertainty through two stages: (1) before generation, we identify low-frequency entities indicating long-tail knowledge gaps; (2) during generation, we verify entity co-occurrence in the pre-training corpus, where zero co-occurrence often signals hallucination risk.
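The two signals can be pictured with a short sketch; the corpus-statistics lookups (`entity_count`, `cooccurrence_count`) and the frequency threshold are hypothetical stand-ins for whatever index is built over the pre-training data:

```python
# Sketch of the two QuCo-RAG uncertainty signals; entity_count() and
# cooccurrence_count() are hypothetical lookups over a pre-training-corpus index.

FREQ_THRESHOLD = 100  # assumed cutoff for "long-tail" entities

def retrieve_before_generation(question_entities, entity_count):
    # Stage 1: a low-frequency entity in the question suggests a long-tail
    # knowledge gap, so retrieval is warranted up front.
    return any(entity_count(e) < FREQ_THRESHOLD for e in question_entities)

def retrieve_during_generation(new_entity, question_entities, cooccurrence_count):
    # Stage 2: an entity that never co-occurs with the question's entities in
    # the pre-training corpus is treated as a hallucination warning sign.
    return all(cooccurrence_count(new_entity, e) == 0 for e in question_entities)
```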
arXiv Detail & Related papers (2025-12-22T08:28:05Z) - Honesty over Accuracy: Trustworthy Language Models through Reinforced Hesitation [12.503662455234954]
We show that modern language models produce confident hallucinations even when wrong answers carry catastrophic consequences. We propose Reinforced Hesitation (RH): a modification to Reinforcement Learning from Verifiable Rewards (RLVR) that uses ternary rewards instead of binary ones.
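A hedged sketch of what such a ternary reward might look like; the specific reward values and the abstention marker are assumptions, not taken from the paper:

```python
# Sketch of a ternary reward for RLVR-style training; the exact values and the
# abstention marker are assumptions, not taken from the paper.

def ternary_reward(answer, gold, abstain_marker="I don't know"):
    if abstain_marker.lower() in answer.lower():
        return 0.0                                            # hesitation: neither rewarded nor punished
    return 1.0 if answer.strip() == gold.strip() else -1.0    # correct vs. confidently wrong
```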
arXiv Detail & Related papers (2025-11-14T17:20:45Z) - Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations [103.16279860448874]
We propose an online reinforcement learning method using a novel binary retrieval-augmented reward (RAR). For open-ended generation, binary RAR achieves a 39.3% reduction in hallucination rates. In short-form question answering, the model learns abstention, strategically outputting "I don't know" when faced with insufficient parametric knowledge.
arXiv Detail & Related papers (2025-10-20T16:45:43Z) - From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs [58.02809208460186]
We revisit this paradox using high-quality traces from DeepSeek-R1 as demonstrations. We find that adding more exemplars consistently degrades accuracy, even when demonstrations are optimal. We introduce Insight-to-solve (I2S), a sequential test-time procedure that turns demonstrations into explicit, reusable insights.
arXiv Detail & Related papers (2025-09-27T08:59:31Z) - When Two LLMs Debate, Both Think They'll Win [0.0]
We evaluate Large Language Models (LLMs) in a dynamic, adversarial debate setting. We organized 60 three-round policy debates among ten state-of-the-art LLMs. We observed five concerning patterns.
arXiv Detail & Related papers (2025-05-25T15:06:17Z) - Inside-Out: Hidden Factual Knowledge in LLMs [50.79758420289131]
This work presents a framework for assessing whether large language models (LLMs) encode more factual knowledge in their parameters than what they express in their outputs. We first propose a formal definition of knowledge, quantifying it for a given question as the fraction of correct-incorrect answer pairs where the correct one is ranked higher. We then present a case study, applying this framework to three popular open-weights LLMs in a closed-book QA setup.
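This pairwise definition is straightforward to sketch; the scoring function (for example, the model's log-likelihood of each answer given the question) is left as a placeholder:

```python
from itertools import product

# Sketch of the pairwise knowledge measure: the fraction of (correct, incorrect)
# answer pairs in which the correct answer is scored higher. score() is a
# placeholder, e.g. the model's log-likelihood of the answer given the question.

def knowledge_score(correct_answers, incorrect_answers, score):
    pairs = list(product(correct_answers, incorrect_answers))
    wins = sum(score(c) > score(i) for c, i in pairs)
    return wins / len(pairs) if pairs else 0.0
```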
arXiv Detail & Related papers (2025-03-19T15:21:48Z) - Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation [18.098228823748617]
We present Interrogation Attack (IA), a membership inference technique targeting documents in the RAG datastore. We demonstrate successful inference with just 30 queries while remaining stealthy. We observe a 2x improvement in TPR@1%FPR over prior inference attacks across diverse RAG configurations.
arXiv Detail & Related papers (2025-02-01T04:01:18Z) - Learning to Route LLMs with Confidence Tokens [43.63392143501436]
We study the extent to which large language models can reliably indicate confidence in their answers. We propose Self-REF, a lightweight training strategy to teach LLMs to express confidence in a reliable manner. Compared to conventional approaches such as verbalizing confidence and examining token probabilities, we demonstrate empirically that confidence tokens show significant improvements in downstream routing and rejection learning tasks.
arXiv Detail & Related papers (2024-10-17T07:28:18Z) - Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework [77.45983464131977]
We focus on how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications. Our research identifies two critical latent factors affecting RAG's confidence in its predictions. We develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers.
arXiv Detail & Related papers (2024-09-24T14:52:14Z) - Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence? [26.69630281310365]
Large language models (LLMs) have been found to produce hallucinations when the question exceeds their internal knowledge boundaries.
Existing research on LLMs' perception of their knowledge boundaries typically uses either the probability of the generated tokens or the verbalized confidence as the model's confidence in its response.
arXiv Detail & Related papers (2024-08-19T08:01:11Z) - LLM Internal States Reveal Hallucination Risk Faced With a Query [62.29558761326031]
Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries.
This paper investigates whether Large Language Models can estimate their own hallucination risk before response generation.
Using a probing estimator, we leverage LLM self-assessment and achieve an average hallucination estimation accuracy of 84.32% at run time.
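An illustrative probe in this spirit, assuming per-query hidden-state features have already been extracted; the linear probe below is an assumption, not the authors' exact estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative linear probe over per-query hidden states; the feature choice and
# probe architecture are assumptions, not the paper's exact estimator.

def train_hallucination_probe(hidden_states, hallucinated):
    X = np.stack(hidden_states)        # one hidden-state vector per query
    y = np.asarray(hallucinated)       # 1 = the model's answer was hallucinated
    return LogisticRegression(max_iter=1000).fit(X, y)

def hallucination_risk(probe, query_hidden_state):
    # Estimated probability that answering this query will produce a hallucination.
    return probe.predict_proba(query_hidden_state.reshape(1, -1))[0, 1]
```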
arXiv Detail & Related papers (2024-07-03T17:08:52Z) - LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models [69.68379406317682]
We introduce a listener-aware finetuning method (LACIE) to calibrate implicit and explicit confidence markers.
We show that LACIE models the listener, considering not only whether an answer is right, but also whether it will be accepted by a listener.
We find that training with LACIE results in 47% fewer incorrect answers being accepted while maintaining the same level of acceptance for correct answers.
arXiv Detail & Related papers (2024-05-31T17:16:38Z) - Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation [9.730412606588335]
We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state.
We propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs.
arXiv Detail & Related papers (2024-01-27T16:19:30Z) - Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z) - Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs [60.61002524947733]
Previous confidence elicitation methods rely on white-box access to internal model information or model fine-tuning.
This leads to a growing need to explore the untapped area of black-box approaches for uncertainty estimation.
We define a systematic framework with three components: prompting strategies for eliciting verbalized confidence, sampling methods for generating multiple responses, and aggregation techniques for computing consistency.
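A minimal sketch of the sampling-and-aggregation components, using agreement with the majority answer as a consistency-based confidence score; the `generate` interface and sampling temperature are placeholders:

```python
from collections import Counter

# Sketch of the sampling-and-aggregation idea: sample several answers and use
# agreement with the majority answer as a consistency-based confidence score.
# generate() and the temperature are placeholders.

def consistency_confidence(question, generate, n_samples=5):
    answers = [generate(question, temperature=0.7) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n_samples    # agreement rate in [1/n, 1]
```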
arXiv Detail & Related papers (2023-06-22T17:31:44Z) - Adversarial Unlearning: Reducing Confidence Along Adversarial Directions [88.46039795134993]
We propose a complementary regularization strategy that reduces confidence on self-generated examples.
The method, which we call RCAD, aims to reduce confidence on out-of-distribution examples lying along directions adversarially chosen to increase training loss.
Despite its simplicity, we find on many classification benchmarks that RCAD can be added to existing techniques to increase test accuracy by 1-3% in absolute value.
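A hedged PyTorch sketch of this idea: perturb an input along the loss-increasing direction, then penalize confident predictions on the resulting example by pulling its predictive distribution toward uniform; the step size, loss weight, and use of the gradient sign are assumptions rather than the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

# Hedged sketch: generate an example along the direction that increases training
# loss, then reduce confidence on it via KL divergence toward the uniform
# distribution. Step size and weighting are assumptions.

def rcad_loss(model, x, y, step_size=1.0, alpha=0.1):
    x = x.clone().requires_grad_(True)
    clean_loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(clean_loss, x, retain_graph=True)
    x_adv = (x + step_size * grad.sign()).detach()   # example along the loss-increasing direction
    log_probs = F.log_softmax(model(x_adv), dim=-1)
    uniform = torch.full_like(log_probs, 1.0 / log_probs.size(-1))
    confidence_penalty = F.kl_div(log_probs, uniform, reduction="batchmean")
    return clean_loss + alpha * confidence_penalty
```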
arXiv Detail & Related papers (2022-06-03T02:26:24Z) - CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models [12.654742638172307]
CORAL is a novel loss function based on a reinforcement learning view of the dialog generation task.
It estimates human preference for generated responses while considering both the context and the response.
To overcome challenges such as high sample complexity of RL training and a large action space, we propose a mix-policy training algorithm.
arXiv Detail & Related papers (2022-05-21T10:36:22Z) - PRover: Proof Generation for Interpretable Reasoning over Rules [81.40404921232192]
We propose a transformer-based model that answers binary questions over rule-bases and generates the corresponding proofs.
Our model learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm.
We conduct experiments on synthetic, hand-authored, and human-paraphrased rule-bases to show promising results for QA and proof generation.
arXiv Detail & Related papers (2020-10-06T15:47:53Z) - How Much Can We Really Trust You? Towards Simple, Interpretable Trust Quantification Metrics for Deep Neural Networks [94.65749466106664]
We conduct a thought experiment and explore two key questions about trust in relation to confidence.
We introduce a suite of metrics for assessing the overall trustworthiness of deep neural networks based on their behaviour when answering a set of questions.
The proposed metrics are by no means perfect, but the hope is to push the conversation towards better metrics.
arXiv Detail & Related papers (2020-09-12T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.