Related papers: Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

URL: http://arxiv.org/abs/2311.09210v1
Date: Wed, 15 Nov 2023 18:54:53 GMT
Title: Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
Authors: Wenhao Yu, Hongming Zhang, Xiaoman Pan, Kaixin Ma, Hongwei Wang, Dong Yu
Abstract summary: Chain-of-Noting (CoN) is a novel approach aimed at improving the robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios. CoN achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope.
Score: 57.80297731039275
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-augmented language models (RALMs) represent a substantial advancement in the capabilities of large language models, notably in reducing factual hallucination by leveraging external knowledge sources. However, the reliability of the retrieved information is not always guaranteed. The retrieval of irrelevant data can lead to misguided responses, potentially causing the model to overlook its inherent knowledge, even when it possesses adequate information to address the query. Moreover, standard RALMs often struggle to assess whether they possess adequate knowledge, both intrinsic and retrieved, to provide an accurate answer. In situations where knowledge is lacking, these systems should ideally respond with "unknown" when the answer is unattainable. In response to these challenges, we introduce Chain-of-Noting (CoN), a novel approach aimed at improving the robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios. The core idea of CoN is to generate sequential reading notes for retrieved documents, enabling a thorough evaluation of their relevance to the given question and integrating this information to formulate the final answer. We employed ChatGPT to create training data for CoN, which was subsequently used to train a LLaMa-2 7B model. Our experiments across four open-domain QA benchmarks show that RALMs equipped with CoN significantly outperform standard RALMs. Notably, CoN achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope.
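Illustrative sketch: the abstract describes generating one reading note per retrieved document before answering, and reports results as exact match (EM) and rejection rate. The Python below is a minimal sketch of those three pieces, not the authors' implementation. The prompt wording, the function names (build_note_prompt, exact_match, rejection_rate), and the SQuAD-style answer normalization are all assumptions made for illustration; the paper's actual prompts and evaluation scripts may differ.

```python
import re
import string


def build_note_prompt(question, documents):
    """Assemble a note-then-answer prompt (hypothetical wording, not the paper's)."""
    lines = [f"Question: {question}", ""]
    for i, doc in enumerate(documents, start=1):
        lines.append(f"Document {i}: {doc}")
    lines.append("")
    lines.append(
        "For each document, in order, write a short reading note stating "
        "whether it is relevant to the question and what it contributes."
    )
    lines.append(
        "Then give a final answer based on the notes. If neither the documents "
        'nor your own knowledge are sufficient, answer "unknown".'
    )
    return "\n".join(lines)


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    This is a commonly used QA normalization, assumed here for illustration.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    return any(normalize(prediction) == normalize(g) for g in gold_answers)


def rejection_rate(predictions):
    """Fraction of predictions that decline to answer with "unknown"."""
    if not predictions:
        return 0.0
    rejected = sum(1 for p in predictions if normalize(p) == "unknown")
    return rejected / len(predictions)
```

A caller would pass the string returned by build_note_prompt to whatever language model is in use, then score the model's final answers with exact_match on answerable questions and rejection_rate on questions the model should decline.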

Related papers

Teaching Language Models To Gather Information Proactively [53.85419549904644]
Large language models (LLMs) are increasingly expected to function as collaborative partners. In this work, we introduce a new task paradigm: proactive information gathering. We design a scalable framework that generates partially specified, real-world tasks, masking key information. Within this setup, our core innovation is a reinforcement finetuning strategy that rewards questions that elicit genuinely new, implicit user information.
arXiv Detail & Related papers (2025-07-28T23:50:09Z)
Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation [108.13261761812517]
We introduce FRANQ (Faithfulness-based Retrieval Augmented UNcertainty Quantification), a novel method for hallucination detection in RAG outputs. We present a new long-form Question Answering (QA) dataset annotated for both factuality and faithfulness.
arXiv Detail & Related papers (2025-05-27T11:56:59Z)
Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation [72.70046559930555]
We propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks. Specifically, Adaptive-Note introduces an overarching view of knowledge growth, iteratively gathering new information in the form of notes. In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop" to encourage sufficient knowledge exploration.
arXiv Detail & Related papers (2024-10-11T14:03:29Z)
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" [74.7488607599921]
FaithEval is a benchmark to evaluate the faithfulness of large language models (LLMs) in contextual scenarios. FaithEval comprises 4.9K high-quality problems in total, validated through a rigorous four-stage context construction and validation framework.
arXiv Detail & Related papers (2024-09-30T06:27:53Z)
W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering [28.79851078451609]
Large Language Models (LLMs) often struggle to generate factual answers relying solely on their internal (parametric) knowledge. To address this limitation, Retrieval-Augmented Generation (RAG) systems enhance LLMs by retrieving relevant information from external sources. We propose W-RAG by utilizing the ranking capabilities of LLMs to create weakly labeled data for training dense retrievers.
arXiv Detail & Related papers (2024-08-15T22:34:44Z)
Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning [5.053086684547045]
This study introduces an in-context learning-based approach to enhance the reasoning capabilities of RALMs. Our approach increases accuracy in identifying unanswerable and conflicting scenarios without requiring additional fine-tuning.
arXiv Detail & Related papers (2024-08-08T12:42:43Z)
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing [0.2302001830524133]
This survey paper addresses the absence of a comprehensive overview of Retrieval-Augmented Language Models (RALMs). The paper discusses the essential components of RALMs, including Retrievers, Language Models, and Augmentations. RALMs demonstrate utility in a spectrum of tasks, from translation and dialogue systems to knowledge-intensive applications.
arXiv Detail & Related papers (2024-04-30T13:14:51Z)
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering [115.72130322143275]
REAR is a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA). We develop a novel architecture for LLM-based RAG systems by incorporating a specially designed assessment module. Experiments on four open-domain QA tasks show that REAR significantly outperforms a number of previous competitive RAG approaches.
arXiv Detail & Related papers (2024-02-27T13:22:51Z)
"Knowing When You Don't Know": A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation [90.09260023184932]
Retrieval-Augmented Generation (RAG) grounds Large Language Model (LLM) output by leveraging external knowledge sources to reduce factual hallucinations. NoMIRACL is a human-annotated dataset for evaluating LLM robustness in RAG across 18 typologically diverse languages. We measure relevance assessment using: (i) hallucination rate, measuring the model's tendency to hallucinate when the answer is not present in the passages of the non-relevant subset, and (ii) error rate, measuring the model's inaccuracy in recognizing relevant passages in the relevant subset.
arXiv Detail & Related papers (2023-12-18T17:18:04Z)
Empirical evaluation of Uncertainty Quantification in Retrieval-Augmented Language Models for Science [0.0]
This study investigates how uncertainty scores vary when scientific knowledge is incorporated as pretraining and retrieval data. We observe that an existing RALM finetuned with scientific knowledge as the retrieval data tends to be more confident in generating predictions. We also found that RALMs are overconfident in their predictions, making inaccurate predictions more confidently than accurate ones.
arXiv Detail & Related papers (2023-11-15T20:42:11Z)
ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases. We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets. Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
Self-Knowledge Guided Retrieval Augmentation for Large Language Models [59.771098292611846]
Large language models (LLMs) have shown superior performance without task-specific fine-tuning. Retrieval-based methods can offer non-parametric world knowledge and improve the performance on tasks such as question answering. Self-Knowledge guided Retrieval augmentation (SKR) is a simple yet effective method which can let LLMs refer to the questions they have previously encountered.
arXiv Detail & Related papers (2023-10-08T04:22:33Z)
Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering [17.672572064705445]
Large language models (LLMs) equipped with Chain-of-Thought (CoT) have shown impressive reasoning ability in various downstream tasks. We propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge.
arXiv Detail & Related papers (2023-08-25T09:23:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.