QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2512.19134v1
- Date: Mon, 22 Dec 2025 08:28:05 GMT
- Title: QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation
- Authors: Dehai Min, Kailin Zhang, Tongtong Wu, Lu Cheng,
- Abstract summary: Dynamic Retrieval-Augmented Generation adaptively determines when to retrieve during generation to hallucinations in large language models.<n>We propose QuCo-RAG, which shifts from subjective confidence to objective statistics computed from pre-training data.<n>Our method quantifies uncertainty through two stages: (1) before generation, we identify low-frequency entities indicating long-tail knowledge gaps; (2) during generation, we verify entity co-occurrence in the pre-training corpus, where zero co-occurrence often signals hallucination risk.
- Score: 14.312693191309101
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic Retrieval-Augmented Generation adaptively determines when to retrieve during generation to mitigate hallucinations in large language models (LLMs). However, existing methods rely on model-internal signals (e.g., logits, entropy), which are fundamentally unreliable because LLMs are typically ill-calibrated and often exhibit high confidence in erroneous outputs. We propose QuCo-RAG, which shifts from subjective confidence to objective statistics computed from pre-training data. Our method quantifies uncertainty through two stages: (1) before generation, we identify low-frequency entities indicating long-tail knowledge gaps; (2) during generation, we verify entity co-occurrence in the pre-training corpus, where zero co-occurrence often signals hallucination risk. Both stages leverage Infini-gram for millisecond-latency queries over 4 trillion tokens, triggering retrieval when uncertainty is high. Experiments on multi-hop QA benchmarks show QuCo-RAG achieves EM gains of 5--12 points over state-of-the-art baselines with OLMo-2 models, and transfers effectively to models with undisclosed pre-training data (Llama, Qwen, GPT), improving EM by up to 14 points. Domain generalization on biomedical QA further validates the robustness of our paradigm. These results establish corpus-grounded verification as a principled, practically model-agnostic paradigm for dynamic RAG. Our code is publicly available at https://github.com/ZhishanQ/QuCo-RAG.
Related papers
- Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning [32.32593439144886]
Behavior-calibrated reinforcement learning allows smaller models to surpass frontier models in uncertainty quantification.<n>Our model's log-scale Accuracy-to-Hallucination Ratio gain (0.806) exceeds GPT-5's (0.207) in a challenging in-domain evaluation.
arXiv Detail & Related papers (2025-12-22T22:51:48Z) - Open-World Deepfake Attribution via Confidence-Aware Asymmetric Learning [78.92934995292113]
We propose a Confidence-Aware Asymmetric Learning (CAL) framework, which balances confidence across known and novel forgery types.<n>CAL consistently outperforms previous methods, achieving new state-of-the-art performance on both known and novel forgery attribution.
arXiv Detail & Related papers (2025-12-14T12:31:28Z) - RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting [13.117430904377905]
Time series forecasting relies on predicting future values from historical data.<n>MSE has two fundamental weaknesses: its point-wise error fails to capture temporal relationships, and it does not account for inherent noise in the data.<n>We introduce the Residual-Informed Loss (RI-Loss), a novel objective function based on the Hilbert-Schmidt Independence Criterion (HSIC)
arXiv Detail & Related papers (2025-11-13T09:36:00Z) - Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG [35.96258615258145]
We introduce Entropy-Trend Constraint (ETC), a training-free method that determines optimal retrieval timing by modeling the dynamics of token-level uncertainty.<n>ETC consistently outperforms strong baselines while reducing retrieval frequency.<n>It is plug-and-play, model-agnostic, and readily integrable into existing decoding pipelines.
arXiv Detail & Related papers (2025-11-13T05:28:02Z) - Detecting Token-Level Hallucinations Using Variance Signals: A Reference-Free Approach [0.0]
Large Language Models (LLMs) have demonstrated impressive generative capabilities across diverse tasks but remain susceptible to hallucinations.<n>We introduce a reference-free, token-level hallucination detection framework that leverages the variance in token log-probabilities across multiple generations.<n>Our approach is model-agnostic, interpretable, and suited for real-time or post-hoc analysis.
arXiv Detail & Related papers (2025-07-05T19:20:59Z) - Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation [27.811765400370838]
We propose LoVeC (Long-form Verbalized Confidence), an on-the-fly verbalized confidence estimation method for long-form generation.<n>Specifically, we use reinforcement learning (RL) to train LLMs to append numerical confidence scores to each generated statement.<n>Our experiments show that our RL-trained models achieve better calibration and generalize robustly across domains.
arXiv Detail & Related papers (2025-05-29T18:05:20Z) - Spatial Reasoning with Denoising Models [49.83744014336816]
We introduce a framework to perform reasoning over sets of continuous variables via denoising generative models.<n>For the first time, that order of generation can successfully be predicted by the denoising network itself.<n>Using these findings, we can increase the accuracy of specific reasoning tasks from 1% to >50%.
arXiv Detail & Related papers (2025-02-28T14:08:30Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - Score-based Generative Modeling in Latent Space [93.8985523558869]
Score-based generative models (SGMs) have recently demonstrated impressive results in terms of both sample quality and distribution coverage.
Here, we propose the Latent Score-based Generative Model (LSGM), a novel approach that trains SGMs in a latent space.
Moving from data to latent space allows us to train more expressive generative models, apply SGMs to non-continuous data, and learn smoother SGMs in a smaller space.
arXiv Detail & Related papers (2021-06-10T17:26:35Z) - Continual Learning with Fully Probabilistic Models [70.3497683558609]
We present an approach for continual learning based on fully probabilistic (or generative) models of machine learning.
We propose a pseudo-rehearsal approach using a Gaussian Mixture Model (GMM) instance for both generator and classifier functionalities.
We show that GMR achieves state-of-the-art performance on common class-incremental learning problems at very competitive time and memory complexity.
arXiv Detail & Related papers (2021-04-19T12:26:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.