Making Retrieval-Augmented Language Models Robust to Irrelevant Context
- URL: http://arxiv.org/abs/2310.01558v2
- Date: Sun, 5 May 2024 15:58:24 GMT
- Title: Making Retrieval-Augmented Language Models Robust to Irrelevant Context
- Authors: Ori Yoran, Tomer Wolfson, Ori Ram, Jonathan Berant,
- Abstract summary: An important desideratum of RALMs, is that retrieved information helps model performance when it is relevant.
Recent work has shown that retrieval augmentation can sometimes have a negative effect on performance.
- Score: 55.564789967211844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are are factual, efficient, and up-to-date. An important desideratum of RALMs, is that retrieved information helps model performance when it is relevant, and does not harm performance when it is not. This is particularly important in multi-hop reasoning scenarios, where misuse of irrelevant evidence can lead to cascading errors. However, recent work has shown that retrieval augmentation can sometimes have a negative effect on performance. In this work, we present a thorough analysis on five open-domain question answering benchmarks, characterizing cases when retrieval reduces accuracy. We then propose two methods to mitigate this issue. First, a simple baseline that filters out retrieved passages that do not entail question-answer pairs according to a natural language inference (NLI) model. This is effective in preventing performance reduction, but at a cost of also discarding relevant passages. Thus, we propose a method for automatically generating data to fine-tune the language model to properly leverage retrieved passages, using a mix of relevant and irrelevant contexts at training time. We empirically show that even 1,000 examples suffice to train the model to be robust to irrelevant contexts while maintaining high performance on examples with relevant ones.
Related papers
- Assessing "Implicit" Retrieval Robustness of Large Language Models [17.006566708461346]
We evaluate the "implicit" retrieval robustness of various large language models.
Fine-tuning on a mix of gold and distracting context significantly enhances the model's robustness to retrieval inaccuracies.
This suggests that large language models can implicitly handle relevant or irrelevant retrieved context by learning solely from the supervision of the final answer.
arXiv Detail & Related papers (2024-06-26T07:38:24Z) - Prompting-based Synthetic Data Generation for Few-Shot Question Answering [23.97949073816028]
We show that using large language models can improve Question Answering performance on various datasets in the few-shot setting.
We suggest that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme.
arXiv Detail & Related papers (2024-05-15T13:36:43Z) - Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts)
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z) - Grounding Language Models for Visual Entity Recognition [27.783314622717448]
AutoVER is an autoregressive model for Visual Entity Recognition.
It mitigates low performance on out-of-domain entities while excelling in queries that require visually-situated reasoning.
It achieves significant improvements across different dataset splits in the recently proposed Oven-Wiki benchmark.
arXiv Detail & Related papers (2024-02-28T20:22:17Z) - Detrimental Contexts in Open-Domain Question Answering [9.059854023578508]
We analyze how passages can have a detrimental effect on retrieve-then-read architectures used in question answering.
Our findings demonstrate that model accuracy can be improved by 10% on two popular QA datasets by filtering out detrimental passages.
arXiv Detail & Related papers (2023-10-27T11:45:16Z) - RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches and learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z) - Understanding and Mitigating Spurious Correlations in Text
Classification with Neighborhood Analysis [69.07674653828565]
Machine learning models have a tendency to leverage spurious correlations that exist in the training set but may not hold true in general circumstances.
In this paper, we examine the implications of spurious correlations through a novel perspective called neighborhood analysis.
We propose a family of regularization methods, NFL (doN't Forget your Language) to mitigate spurious correlations in text classification.
arXiv Detail & Related papers (2023-05-23T03:55:50Z) - Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer
with Fine-tuning Slow and Fast [50.19681990847589]
Existing research has shown that a multilingual pre-trained language model fine-tuned with one (source) language also performs well on downstream tasks for non-source languages.
This paper analyzes the fine-tuning process, discovers when the performance gap changes and identifies which network weights affect the overall performance most.
arXiv Detail & Related papers (2023-05-19T06:04:21Z) - Large Language Models Can Be Easily Distracted by Irrelevant Context [29.315230178997002]
We investigate how the model problem-solving accuracy can be influenced by irrelevant context.
We use benchmark to measure the distractibility of cutting-edge prompting techniques for large language models.
arXiv Detail & Related papers (2023-01-31T20:48:57Z) - When Does Translation Require Context? A Data-driven, Multilingual
Exploration [71.43817945875433]
proper handling of discourse significantly contributes to the quality of machine translation (MT)
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.