CREPE: Open-Domain Question Answering with False Presuppositions
- URL: http://arxiv.org/abs/2211.17257v1
- Date: Wed, 30 Nov 2022 18:54:49 GMT
- Title: CREPE: Open-Domain Question Answering with False Presuppositions
- Authors: Xinyan Velocity Yu, Sewon Min, Luke Zettlemoyer and Hannaneh Hajishirzi
- Abstract summary: We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums.
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
We show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct.
- Score: 92.20501870319765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information-seeking users often pose questions with false presuppositions, especially when asking about unfamiliar topics. Most existing question answering (QA) datasets, in contrast, assume all questions have well-defined answers. We introduce CREPE, a QA dataset containing a natural distribution of
presupposition failures from online information-seeking forums. We find that
25% of questions contain false presuppositions, and provide annotations for
these presuppositions and their corrections. Through extensive baseline
experiments, we show that adaptations of existing open-domain QA models can
find presuppositions moderately well, but struggle when predicting whether a
presupposition is factually correct. This is in large part due to difficulty in
retrieving relevant evidence passages from a large text corpus. CREPE provides
a benchmark to study question answering in the wild, and our analyses provide
avenues for future work in better modeling and further studying the task.
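To make the task decomposition concrete, here is a minimal sketch of how a CREPE-style example might be represented and checked against retrieved evidence. The field names, the example record, and the lexical-overlap verifier are all illustrative assumptions, not the dataset's official schema or the paper's baselines; a real adaptation would pair a retriever with an entailment model, and the abstract identifies retrieval as the hard part.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CrepeExample:
    # Field names are illustrative; they are not CREPE's official schema.
    question: str
    presupposition: Optional[str]  # None when no false presupposition is annotated
    correction: Optional[str]      # annotated correction for a false presupposition

def is_supported(presupposition: str, evidence_passages: List[str]) -> bool:
    """Toy verifier: treat a presupposition as supported if some retrieved
    passage shares most of its tokens. A real baseline would use a retriever
    plus an entailment model; lexical overlap cannot detect contradictions
    and is used here only to make the pipeline shape concrete."""
    tokens = {t.lower().strip(".,?") for t in presupposition.split()}
    best = max(
        (len(tokens & {t.lower().strip(".,?") for t in p.split()}) / len(tokens)
         for p in evidence_passages),
        default=0.0,
    )
    return best > 0.5  # arbitrary threshold for illustration

example = CrepeExample(
    question="Why do sunflowers face the sun all day?",
    presupposition="Mature sunflowers face the sun all day.",
    correction="Only young sunflowers track the sun; mature ones face east.",
)
evidence = ["Heliotropism occurs only in young sunflower buds; blooms settle facing east."]
if not is_supported(example.presupposition, evidence):
    print(f"False presupposition: {example.presupposition}")
    print(f"Correction: {example.correction}")
```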
Related papers
- Which questions should I answer? Salience Prediction of Inquisitive Questions [118.097974193544]
We show that highly salient questions are empirically more likely to be answered in the same article.
We further validate our findings by showing that answering salient questions is an indicator of summarization quality in news.
arXiv Detail & Related papers (2024-04-16T21:33:05Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to answer questions correctly with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it comprises graph construction, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process (a toy illustration follows this entry).
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
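As a toy illustration of the pipeline shape described in the entry above (graph construction, pruning, path-level ranking): the knowledge graph, hop limit, and lexical scoring function below are invented for illustration; GATHER itself uses learned components over real knowledge bases.

```python
# Toy knowledge graph as adjacency lists: entity -> [(relation, entity), ...]
# (entities and relations are made up for this sketch)
TOY_GRAPH = {
    "Eiffel Tower": [("located_in", "Paris"), ("designed_by", "Gustave Eiffel")],
    "Paris": [("capital_of", "France")],
    "Gustave Eiffel": [("born_in", "Dijon")],
}

def enumerate_paths(seed, max_hops=2):
    """Graph construction step: collect all relation paths up to max_hops from the seed."""
    paths, frontier = [], [(seed, [])]
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for relation, neighbor in TOY_GRAPH.get(node, []):
                new_path = path + [(node, relation, neighbor)]
                paths.append(new_path)
                next_frontier.append((neighbor, new_path))
        frontier = next_frontier
    return paths

def score_path(path, question_tokens):
    """Path-level ranking step: toy lexical overlap instead of a learned ranker."""
    path_tokens = {t.lower() for hop in path for part in hop
                   for t in part.replace("_", " ").split()}
    return len(path_tokens & question_tokens)

question = "Which city is the Eiffel Tower located in?"
tokens = {t.lower().strip("?") for t in question.split()}
candidates = enumerate_paths("Eiffel Tower")
# Pruning step: drop paths with zero overlap, then rank the survivors.
ranked = sorted((p for p in candidates if score_path(p, tokens) > 0),
                key=lambda p: score_path(p, tokens), reverse=True)
print(ranked[0])  # best inference path: [('Eiffel Tower', 'located_in', 'Paris')]
```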
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- Discourse Comprehension: A Question Answering Framework to Represent Sentence Connections [35.005593397252746]
A key challenge in building and evaluating models for discourse comprehension is the lack of annotated data.
This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents.
The resulting corpus, DCQA, consists of 22,430 question-answer pairs across 607 English documents.
arXiv Detail & Related papers (2021-11-01T04:50:26Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction [46.38201136570501]
We present a model that aggregates and combines evidence from multiple passages to adaptively predict a single answer or a set of question-answer pairs for ambiguous questions.
Our model, named Refuel, achieves a new state-of-the-art performance on the AmbigQA dataset, and shows competitive performance on NQ-Open and TriviaQA.
arXiv Detail & Related papers (2020-11-26T05:48:55Z)
- Challenges in Information-Seeking QA: Unanswerable Questions and Paragraph Retrieval [46.3246135936476]
We analyze why answering information-seeking queries is more challenging and where their prevalent unanswerable cases arise.
Our controlled experiments suggest two areas of headroom: paragraph selection and answerability prediction.
We manually annotate 800 unanswerable examples across six languages on what makes them challenging to answer.
arXiv Detail & Related papers (2020-10-22T17:48:17Z)
- What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets [40.64071905569975]
Question answering biases in video QA datasets can mislead multimodal models into overfitting to QA artifacts.
Our study shows that biases can come from annotators and from the types of questions asked.
We also show empirically that using annotator-non-overlapping train-test splits can reduce QA biases for video QA datasets.
arXiv Detail & Related papers (2020-07-07T17:00:11Z)
- Do not let the history haunt you -- Mitigating Compounding Errors in Conversational Question Answering [17.36904526340775]
We find that compounding errors occur when using previously predicted answers at test time.
We propose a sampling strategy that dynamically selects between target answers and model predictions during training, as sketched below.
arXiv Detail & Related papers (2020-05-12T13:29:38Z)
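A minimal sketch of the dynamic selection strategy described in the last entry above: early training steps feed gold (target) answers into the conversation history, and later steps increasingly feed the model's own predictions so it learns to recover from its own mistakes at test time. The linear schedule and the function names here are assumptions for illustration, not the paper's exact formulation.

```python
import random

def pick_history_answer(gold: str, prediction: str, step: int, total_steps: int) -> str:
    """Choose which answer to write into the conversation history at training time.
    The probability of using the model's own prediction ramps linearly from 0
    toward 1 over training (an illustrative schedule, assumed here)."""
    p_use_prediction = step / total_steps
    return prediction if random.random() < p_use_prediction else gold

# Toy usage: the history fed to the QA model drifts from gold toward predicted answers.
random.seed(0)
history = [pick_history_answer("Paris", "Lyon", step, total_steps=10) for step in range(10)]
print(history)  # mostly "Paris" early, increasingly "Lyon" later
```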