(QA)$^2$: Question Answering with Questionable Assumptions
- URL: http://arxiv.org/abs/2212.10003v2
- Date: Tue, 29 Aug 2023 19:36:32 GMT
- Title: (QA)$^2$: Question Answering with Questionable Assumptions
- Authors: Najoung Kim, Phu Mon Htut, Samuel R. Bowman, Jackson Petty
- Abstract summary: Naturally occurring information-seeking questions often contain questionable assumptions.
We propose (QA)$^2$ (Question Answering with Questionable Assumptions) as an evaluation dataset.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Naturally occurring information-seeking questions often contain questionable
assumptions -- assumptions that are false or unverifiable. Questions containing
questionable assumptions are challenging because they require a distinct answer
strategy that deviates from typical answers for information-seeking questions.
For instance, the question "When did Marie Curie discover Uranium?" cannot be
answered as a typical "when" question without addressing the false assumption
"Marie Curie discovered Uranium". In this work, we propose (QA)$^2$ (Question
Answering with Questionable Assumptions), an open-domain evaluation dataset
consisting of naturally occurring search engine queries that may or may not
contain questionable assumptions. To be successful on (QA)$^2$, systems must be
able to detect questionable assumptions and also be able to produce adequate
responses for both typical information-seeking questions and ones with
questionable assumptions. Through human rater acceptability on end-to-end QA
with (QA)$^2$, we find that current models do struggle with handling
questionable assumptions, leaving substantial headroom for progress.
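The two-step answer strategy the abstract describes can be sketched as follows. This is a hypothetical illustration, not the authors' system: the fact table, the rule-based presupposition extraction, and the stubbed direct answer are all toy stand-ins.

```python
# Hypothetical sketch of the answer strategy (QA)^2 evaluates: verify the
# question's presupposition first, then either answer normally or address
# the questionable assumption. KNOWN_FACTS and extract_presupposition are
# toy stand-ins, not the paper's method.

KNOWN_FACTS = {
    "Marie Curie discovered Radium": True,
    "Marie Curie discovered Uranium": False,  # Uranium was discovered by Klaproth
}

def extract_presupposition(question: str) -> str:
    # Toy rule: "When did X discover Y?" presupposes "X discovered Y".
    body = question.removeprefix("When did ").rstrip("?")
    return body.replace("discover", "discovered", 1)

def answer(question: str) -> str:
    presupposition = extract_presupposition(question)
    if KNOWN_FACTS.get(presupposition) is False:
        # Questionable-assumption strategy: address the false premise
        # instead of giving a typical "when" answer.
        return f"The question assumes '{presupposition}', which is false."
    # Typical information-seeking strategy (answer stubbed for illustration).
    return "1898"

print(answer("When did Marie Curie discover Uranium?"))
```

A system scored on (QA)$^2$ must handle both branches: the Radium question gets a direct answer, while the Uranium question requires surfacing the false assumption.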
Related papers
- Identifying and Answering Questions with False Assumptions: An Interpretable Approach [15.283206722883149]
We focus on identifying and answering questions with false assumptions in several domains.
We first investigate whether the problem reduces to fact verification.
Then, we present an approach leveraging external evidence to mitigate hallucinations.
arXiv Detail & Related papers (2025-08-21T00:24:32Z)
- Which questions should I answer? Salience Prediction of Inquisitive Questions [118.097974193544]
We show that highly salient questions are empirically more likely to be answered in the same article.
We further validate our findings by showing that answering salient questions is an indicator of summarization quality in news.
arXiv Detail & Related papers (2024-04-16T21:33:05Z)
- Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents [22.023543164141504]
We present Researchy Questions, a dataset of search engine queries tediously filtered to be non-factoid, "decompositional", and multi-perspective.
We show that users spend a lot of "effort" on these questions in terms of signals like clicks and session length.
We also show that "slow thinking" answering techniques, like decomposition into sub-questions, show benefit over answering directly.
arXiv Detail & Related papers (2024-02-27T21:27:16Z)
- Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations [70.6395572287422]
The self-alignment method is capable not only of refusing to answer but also of explaining why unknown questions are unanswerable.
We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself, aligning its responses to unknown questions as desired.
arXiv Detail & Related papers (2024-02-23T02:24:36Z)
- When to Read Documents or QA History: On Unified and Selective Open-domain QA [22.941325275188376]
This paper studies the problem of open-domain question answering, with the aim of answering a diverse range of questions leveraging knowledge resources.
Two types of sources, QA-pair and document corpora, have been actively leveraged, each with complementary strengths.
A natural follow-up is thus to leverage both models, but naive pipelining or integration approaches have failed to bring additional gains over either model alone.
arXiv Detail & Related papers (2023-06-07T06:03:39Z)
- Mastering the ABCDs of Complex Questions: Answer-Based Claim Decomposition for Fine-grained Self-Evaluation [9.776667356119352]
We propose answer-based claim decomposition (ABCD), a prompting strategy that decomposes questions into true/false claims.
Using the decomposed ABCD claims, we perform fine-grained self-evaluation.
We find that GPT-3.5 has some ability to determine to what extent its answer satisfies the criteria of the input question.
arXiv Detail & Related papers (2023-05-24T05:53:11Z)
- Selectively Answering Ambiguous Questions [38.83930394700588]
We find that the most reliable approach to decide when to abstain involves quantifying repetition within sampled model outputs.
Our results suggest that sampling-based confidence scores help calibrate answers to relatively unambiguous questions.
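The repetition-based abstention idea above can be sketched in a few lines: sample several answers, measure agreement among them, and abstain when agreement is low. The stub answer lists and the 0.5 threshold are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of abstention via repetition within sampled model outputs.
# Real usage would sample answers from an LM; here the samples are given
# directly, and the threshold is an illustrative assumption.
from collections import Counter

def repetition_confidence(samples: list[str]) -> float:
    # Fraction of samples that agree with the most common answer.
    most_common_count = Counter(samples).most_common(1)[0][1]
    return most_common_count / len(samples)

def answer_or_abstain(samples: list[str], threshold: float = 0.5) -> str:
    if repetition_confidence(samples) >= threshold:
        return Counter(samples).most_common(1)[0][0]
    return "ABSTAIN"

# Relatively unambiguous question: samples mostly agree, so we answer.
print(answer_or_abstain(["Paris", "Paris", "Paris", "Lyon"]))  # Paris
# Ambiguous question: samples scatter, so we abstain.
print(answer_or_abstain(["1912", "1921", "1905", "1898"]))     # ABSTAIN
```

The confidence score here is the simplest possible agreement measure; the intuition is that an ambiguous question yields scattered samples, while an unambiguous one yields repeated answers.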
arXiv Detail & Related papers (2023-05-24T01:25:38Z)
- CREPE: Open-Domain Question Answering with False Presuppositions [92.20501870319765]
We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums.
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
We show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct.
arXiv Detail & Related papers (2022-11-30T18:54:49Z)
- Match$^2$: A Matching over Matching Model for Similar Question Identification [74.7142127303489]
Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers.
Similar question identification becomes a core task in CQA which aims to find a similar question from the archived repository whenever a new question is asked.
It has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language: there can be different ways to ask the same question, or different questions sharing similar expressions.
Traditional methods typically take a one-sided approach, leveraging the answer as an expanded representation of the
arXiv Detail & Related papers (2020-06-21T05:59:34Z)
- Unsupervised Question Decomposition for Question Answering [102.56966847404287]
We propose One-to-N Unsupervised Sequence transduction (ONUS), an algorithm that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions.
We show large QA improvements on HotpotQA over a strong baseline on the original, out-of-domain, and multi-hop dev sets.
arXiv Detail & Related papers (2020-02-22T19:40:35Z)
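The decompose-then-answer pipeline the entry above describes can be sketched as follows. The decomposition and the single-hop QA model are toy stand-ins (a fixed sub-question list and a lookup table), not the learned ONUS model.

```python
# Hedged sketch of multi-hop QA via decomposition: answer each single-hop
# sub-question in turn, substituting the previous answer into the next
# sub-question. SINGLE_HOP_QA is a toy lookup table standing in for a
# real single-hop QA model, and the sub-questions are written by hand.

SINGLE_HOP_QA = {
    "Which film won Best Picture in 1994?": "Schindler's List",
    "Who directed Schindler's List?": "Steven Spielberg",
}

def answer_multi_hop(sub_questions: list[str]) -> str:
    answer = ""
    for sub in sub_questions:
        # Bridge the hops: insert the previous answer where referenced.
        sub = sub.replace("{prev}", answer)
        answer = SINGLE_HOP_QA[sub]
    return answer

# "Who directed the film that won Best Picture in 1994?" decomposed:
subs = ["Which film won Best Picture in 1994?", "Who directed {prev}?"]
print(answer_multi_hop(subs))  # Steven Spielberg
```

The benefit reported on HotpotQA comes from learning the decomposition step itself without supervision; this sketch only illustrates how single-hop answers compose into a multi-hop one.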
This list is automatically generated from the titles and abstracts of the papers in this site.