Why So Gullible? Enhancing the Robustness of Retrieval-Augmented Models against Counterfactual Noise
- URL: http://arxiv.org/abs/2305.01579v3
- Date: Sun, 9 Jun 2024 23:42:48 GMT
- Title: Why So Gullible? Enhancing the Robustness of Retrieval-Augmented Models against Counterfactual Noise
- Authors: Giwon Hong, Jeonghwan Kim, Junmo Kang, Sung-Hyon Myaeng, Joyce Jiyoung Whang
- Abstract summary: Our work investigates a challenging scenario in which even the "relevant" documents in a retrieved set may contain misleading or incorrect information, causing conflicts that act as noise on model decisions.
We propose approaches for handling knowledge conflicts among retrieved documents by explicitly fine-tuning a discriminator or prompting GPT-3.5 to elicit its discriminative capability.
- Score: 14.38859858538404
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Most existing retrieval-augmented language models (LMs) assume a naive dichotomy within a retrieved document set: query-relevance and irrelevance. Our work investigates a more challenging scenario in which even the "relevant" documents may contain misleading or incorrect information, causing conflict among the retrieved documents and thereby negatively influencing model decisions as noise. We observe that existing LMs are highly brittle to the presence of conflicting information in both the fine-tuning and in-context few-shot learning scenarios. We propose approaches for handling knowledge conflicts among retrieved documents by explicitly fine-tuning a discriminator or prompting GPT-3.5 to elicit its discriminative capability. Our empirical results on open-domain QA show that these approaches significantly enhance model robustness. We also provide our findings on incorporating the fine-tuned discriminator's decision into the in-context learning process, proposing a way to exploit the benefits of two disparate learning schemes. Alongside our findings, we provide MacNoise, a machine-generated, conflict-induced dataset to further encourage research in this direction.
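The prompting-based approach in the abstract lends itself to a compact illustration. Below is a minimal sketch of eliciting a model's discriminative capability over conflicting retrieved documents, assuming an OpenAI-style chat API; the model name, prompt wording, and function names are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of the prompting-based approach described above: asking an
# LLM to discriminate among conflicting retrieved documents before answering.
# Assumptions (not from the paper): the OpenAI chat API, the model name, and
# the prompt wording are all illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def discriminate_and_answer(question: str, documents: list[str]) -> str:
    """Ask the model to flag untrustworthy documents, then answer the
    question using only the documents it judged reliable."""
    doc_block = "\n".join(f"[Doc {i + 1}] {d}" for i, d in enumerate(documents))
    prompt = (
        "Some of the following documents may contain incorrect or misleading "
        "information that conflicts with the others.\n\n"
        f"{doc_block}\n\n"
        "First, list the numbers of any documents that look untrustworthy "
        "and briefly explain why. Then answer the question using only the "
        f"remaining documents.\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

# Usage with an injected counterfactual document:
docs = [
    "The Eiffel Tower is located on the Champ de Mars in Paris, France.",
    "The Eiffel Tower was moved to Berlin in 2001.",  # counterfactual noise
]
print(discriminate_and_answer("Where is the Eiffel Tower?", docs))
```

The abstract's fine-tuning alternative would instead train a dedicated discriminator to score each document and feed its decisions into the in-context learning prompt; this sketch covers only the prompting side.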
Related papers
- Disentangling Memory and Reasoning Ability in Large Language Models [97.26827060106581]
We propose a new inference paradigm that decomposes the complex inference process into two distinct and clear actions.
Our experiment results show that this decomposition improves model performance and enhances the interpretability of the inference process.
arXiv Detail & Related papers (2024-11-20T17:55:38Z)
- ReasoningRank: Teaching Student Models to Rank through Reasoning-Based Knowledge Distillation [11.756344944226495]
We propose Reason-to-Rank (R2R), a novel open-source reranking approach that enhances transparency.
R2R generates two types of reasoning: direct relevance reasoning, which explains how a document addresses the query, and comparison reasoning, which justifies the relevance of one document over another.
Our student models are trained to generate meaningful reasoning and rerank documents, achieving competitive performance across multiple datasets.
arXiv Detail & Related papers (2024-10-07T16:25:39Z)
- A Counterfactual Explanation Framework for Retrieval Models [4.562474301450839]
We use an optimization framework to identify which words caused a document not to be favored by a retrieval model for a particular query.
Our experiments show the effectiveness of our proposed approach in predicting counterfactuals for both statistical (e.g. BM25) and deep-learning-based models.
arXiv Detail & Related papers (2024-09-01T22:33:29Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z)
- How to Enhance Causal Discrimination of Utterances: A Case on Affective Reasoning [22.11437627661179]
We propose incorporating i.i.d. noise terms into the conversation process, thereby constructing a structural causal model (SCM).
To facilitate the implementation of deep learning, we introduce the cogn frameworks to handle unstructured conversation data, and employ an autoencoder architecture to regard the unobservable noise as learnable "implicit causes".
arXiv Detail & Related papers (2023-05-04T07:45:49Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence [37.18100697469402]
We simulate knowledge conflicts where parametric knowledge suggests one answer and different passages suggest different answers.
We find that retrieval performance heavily impacts which sources models rely on, and current models mostly rely on non-parametric knowledge.
We present a new calibration study in which models are discouraged from presenting any single answer when shown multiple conflicting answer candidates (a minimal abstention sketch appears after this list).
arXiv Detail & Related papers (2022-10-25T01:46:00Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that detect samples causing oversensitivity and overstability with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
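As flagged in the recalibration entry above, the idea of withholding a single answer under conflict can be sketched compactly: a reader model answers per document, and the system abstains when the candidates disagree. The reader callable, the normalization step, and the abstention rule below are illustrative assumptions, not the cited paper's actual calibration method.

```python
# Minimal sketch of conflict-aware abstention: gather one answer candidate per
# retrieved document and refuse to commit when the candidates disagree.
# `answer_from_document` is a hypothetical reader model supplied by the caller.
from collections import Counter
from typing import Callable

def calibrated_answer(
    question: str,
    documents: list[str],
    answer_from_document: Callable[[str, str], str],
) -> str:
    candidates = [answer_from_document(question, doc) for doc in documents]
    counts = Counter(c.strip().lower() for c in candidates)
    if len(counts) > 1:  # conflicting evidence: abstain rather than guess
        return f"Conflicting candidates {sorted(counts)}; no single answer."
    return candidates[0]

# Usage with a trivial stand-in reader:
docs = ["Paris is the capital of France.", "The capital of France is Paris."]
print(calibrated_answer("What is the capital of France?", docs,
                        lambda q, d: "Paris"))
```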
This list is automatically generated from the titles and abstracts of the papers in this site.