After Retrieval, Before Generation: Enhancing the Trustworthiness of Large Language Models in RAG
- URL: http://arxiv.org/abs/2505.17118v1
- Date: Wed, 21 May 2025 16:29:19 GMT
- Title: After Retrieval, Before Generation: Enhancing the Trustworthiness of Large Language Models in RAG
- Authors: Xinbang Dai, Huikang Hu, Yuncheng Hua, Jiaqi Li, Yongrui Chen, Rihui Jin, Nan Hu, Guilin Qi
- Abstract summary: RAG systems face challenges in balancing internal (parametric) and external (retrieved) knowledge. We propose the BRIDGE framework, which dynamically determines a comprehensive response strategy for large language models. Experiments show BRIDGE outperforms baselines by 5-15% in accuracy while maintaining balanced performance across all scenarios.
- Score: 13.603907803297561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-augmented generation (RAG) systems face critical challenges in balancing internal (parametric) and external (retrieved) knowledge, especially when these sources conflict or are unreliable. To analyze these scenarios comprehensively, we construct the Trustworthiness Response Dataset (TRD) with 36,266 questions spanning four RAG settings. We reveal that existing approaches each address an isolated scenario (prioritizing one knowledge source, naively merging both, or refusing to answer) but lack a unified framework to handle different real-world conditions simultaneously. We therefore propose the BRIDGE framework, which dynamically determines a comprehensive response strategy for large language models (LLMs). BRIDGE leverages an adaptive weighting mechanism named soft bias to guide knowledge collection, followed by a Maximum Soft-bias Decision Tree to evaluate knowledge and select the optimal response strategy (trust internal knowledge, trust external knowledge, or refuse). Experiments show BRIDGE outperforms baselines by 5-15% in accuracy while maintaining balanced performance across all scenarios. Our work provides an effective solution for LLMs' trustworthy responses in real-world RAG applications.
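The abstract does not spell out the decision logic, so the following is only a minimal sketch of how a soft-bias weighting feeding a small decision rule over the three response strategies might look. All names, thresholds, and confidence inputs here are hypothetical illustrations, not the paper's actual implementation:

```python
from enum import Enum

class Strategy(Enum):
    TRUST_INTERNAL = "answer from parametric knowledge"
    TRUST_EXTERNAL = "answer from retrieved context"
    REFUSE = "abstain from answering"

def soft_bias(internal_conf: float, external_conf: float) -> tuple[float, float]:
    """Hypothetical adaptive weighting: normalize two confidence scores
    into soft-bias weights that sum to 1."""
    total = internal_conf + external_conf
    if total == 0.0:
        return 0.5, 0.5
    return internal_conf / total, external_conf / total

def select_strategy(internal_conf: float, external_conf: float,
                    trust_threshold: float = 0.6,
                    refuse_threshold: float = 0.3) -> Strategy:
    """Hypothetical 'maximum soft-bias' rule: follow whichever source
    carries the larger weight, unless both raw confidences are so low
    that refusing is the safer response."""
    if max(internal_conf, external_conf) < refuse_threshold:
        return Strategy.REFUSE
    w_int, w_ext = soft_bias(internal_conf, external_conf)
    if w_int >= trust_threshold:
        return Strategy.TRUST_INTERNAL
    if w_ext >= trust_threshold:
        return Strategy.TRUST_EXTERNAL
    # Neither source dominates clearly; abstain rather than guess.
    return Strategy.REFUSE

# Example: weak parametric memory, strong retrieved evidence.
print(select_strategy(internal_conf=0.2, external_conf=0.8))  # Strategy.TRUST_EXTERNAL
```

The point of the sketch is the shape of the decision, not the numbers: the actual BRIDGE decision tree is learned over richer knowledge-evaluation signals than two scalar confidences.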
Related papers
- RAG in the Wild: On the (In)effectiveness of LLMs with Mixture-of-Knowledge Retrieval Augmentation [45.679455112940175]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge retrieved at inference time. We evaluated RAG systems using MassiveDS, a large-scale datastore with a mixture of knowledge, and identified critical limitations.
arXiv Detail & Related papers (2025-07-26T20:57:24Z)
- CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG [53.950029990391066]
We propose Cross-Source Knowledge Reconciliation for Multimodal RAG (CoRe-MMRAG), a novel end-to-end framework that effectively reconciles inconsistencies across knowledge sources. Experiments on KB-VQA benchmarks show that CoRe-MMRAG achieves substantial improvements over baseline methods.
arXiv Detail & Related papers (2025-06-03T07:32:40Z)
- RARE: Retrieval-Aware Robustness Evaluation for Retrieval-Augmented Generation Systems [35.47591417637136]
Retrieval-Augmented Generation (RAG) enhances recency and factuality in answers. Existing evaluations rarely test how well these systems cope with real-world noise, conflicts between internal and external retrieved contexts, or fast-changing facts. We introduce Retrieval-Aware Robustness Evaluation (RARE), a unified framework and large-scale benchmark that jointly stress-test query and document perturbations over dynamic, time-sensitive corpora.
arXiv Detail & Related papers (2025-06-01T02:42:36Z)
- Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG [51.120170062795566]
We propose Divide-Then-Align (DTA) to endow RAG systems with the ability to respond with "I don't know" when a query falls outside the knowledge boundary. DTA balances accuracy with appropriate abstention, enhancing the reliability and trustworthiness of retrieval-augmented systems.
arXiv Detail & Related papers (2025-05-27T08:21:21Z)
- Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards [67.86091419220816]
Large Language Models (LLMs) show great promise in complex reasoning. A prevalent issue is "superficial self-reflection", where models fail to robustly verify their own outputs. We introduce RISE (Reinforcing Reasoning with Self-Verification), a novel online RL framework designed to tackle this.
arXiv Detail & Related papers (2025-05-19T17:59:31Z)
- On the Influence of Context Size and Model Choice in Retrieval-Augmented Generation Systems [5.69361786082969]
Retrieval-augmented generation (RAG) has emerged as an approach to augment large language models (LLMs). We evaluate various context sizes, BM25 and semantic search as retrievers, and eight base LLMs. Our findings indicate that final QA performance improves steadily with up to 15 snippets but stagnates or declines beyond that.
arXiv Detail & Related papers (2025-02-20T17:34:34Z)
- CER: Confidence Enhanced Reasoning in LLMs [2.4392539322920763]
We introduce an uncertainty-aware framework designed to enhance the accuracy of Large Language Model responses. We quantify the confidence of intermediate answers, such as numerical results in mathematical reasoning and proper nouns in open-domain generation. Results consistently validate the effectiveness of our novel confidence aggregation method (a toy version of such aggregation is sketched after this list).
arXiv Detail & Related papers (2025-02-20T15:16:42Z)
- Unanswerability Evaluation for Retrieval Augmented Generation [74.3022365715597]
UAEval4RAG is a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively. We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries.
arXiv Detail & Related papers (2024-12-16T19:11:55Z)
- Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework [77.45983464131977]
We focus on how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications. Our research identifies two critical latent factors affecting RAG's confidence in its predictions. We develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers.
arXiv Detail & Related papers (2024-09-24T14:52:14Z)
- Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs).
We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z)
- Pistis-RAG: Enhancing Retrieval-Augmented Generation with Human Feedback [41.88662700261036]
RAG systems face limitations when semantic relevance alone does not guarantee improved generation quality.
We propose Pistis-RAG, a new RAG framework designed with a content-centric approach to better align LLMs with human preferences.
arXiv Detail & Related papers (2024-06-21T08:52:11Z)
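Several entries above, CER in particular, hinge on aggregating per-step confidences into a single answer-level score. As a toy illustration only, assuming nothing about the papers' actual scoring functions, a geometric-mean aggregator (where one very uncertain intermediate step drags down the whole chain) could look like this:

```python
import math

def aggregate_confidence(step_confidences: list[float]) -> float:
    """Toy aggregation: combine per-step confidences (e.g. for
    intermediate numerical results or proper nouns) into one
    answer-level score via the geometric mean."""
    if not step_confidences:
        return 0.0
    log_sum = sum(math.log(max(c, 1e-9)) for c in step_confidences)
    return math.exp(log_sum / len(step_confidences))

# Example: three confident steps and one shaky one.
print(round(aggregate_confidence([0.9, 0.95, 0.4, 0.85]), 3))  # 0.734
```

An answer whose aggregate confidence falls below some threshold would then be a candidate for abstention, in the spirit of the DTA and UAEval4RAG entries above.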