ProtoQA: A Question Answering Dataset for Prototypical Common-Sense
Reasoning
- URL: http://arxiv.org/abs/2005.00771v3
- Date: Tue, 27 Oct 2020 21:23:03 GMT
- Title: ProtoQA: A Question Answering Dataset for Prototypical Common-Sense
Reasoning
- Authors: Michael Boratko, Xiang Lorraine Li, Rajarshi Das, Tim O'Gorman, Dan
Le, Andrew McCallum
- Abstract summary: This paper introduces a new question answering dataset for training and evaluating common sense reasoning capabilities of artificial intelligence systems.
The training set is gathered from an existing set of questions played on the long-running international game show FAMILY FEUD.
We also propose a generative evaluation task where a model has to output a ranked list of answers, ideally covering prototypical answers for a question.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given questions regarding some prototypical situation, such as "Name something
that people usually do before they leave the house for work?", a human can easily
answer them via acquired experiences. There can be multiple right answers for
such questions, with some more common for a situation than others. This paper
introduces a new question answering dataset for training and evaluating common
sense reasoning capabilities of artificial intelligence systems in such
prototypical situations. The training set is gathered from an existing set of
questions played on the long-running international game show FAMILY FEUD. The
hidden evaluation set is created by gathering answers for each question from
100 crowd-workers. We also propose a generative evaluation task where a model
has to output a ranked list of answers, ideally covering all prototypical
answers for a question. After presenting multiple competitive baseline models,
we find that human performance still exceeds model scores on all evaluation
metrics with a meaningful gap, supporting the challenging nature of the task.
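To make the generative evaluation concrete, the sketch below scores a model's ranked answer list against clustered crowd answers, crediting each cluster at most once and weighting it by how many crowd-workers gave an answer in it. This is a minimal illustration only: the function name, the example cluster counts, and the exact-string matching rule are assumptions for demonstration, not the official ProtoQA evaluation script, which uses more permissive answer matching.

```python
from typing import Dict, List

def score_ranked_answers(ranked_answers: List[str],
                         answer_clusters: Dict[str, int],
                         max_answers: int = 10) -> float:
    """Toy coverage-style score for a ranked list of model answers.

    `answer_clusters` maps a representative crowd answer to the number of
    crowd-workers whose answers fell in that cluster. Each cluster is
    credited at most once; matching here is naive exact-string matching,
    whereas the real benchmark uses more permissive similarity matching.
    """
    matched = set()
    earned = 0
    for ans in ranked_answers[:max_answers]:
        for rep, count in answer_clusters.items():
            if rep not in matched and ans.strip().lower() == rep:
                matched.add(rep)
                earned += count
                break
    total = sum(answer_clusters.values())
    return earned / total if total else 0.0

# Hypothetical clusters for the question in the abstract (counts out of 100 workers).
clusters = {"shower": 38, "eat breakfast": 27, "get dressed": 21, "brush teeth": 14}
model_output = ["get dressed", "eat breakfast", "drink coffee"]
print(score_ranked_answers(model_output, clusters))  # 0.48
```

Under this toy metric, a model earns more by covering the most common answer clusters early in its ranked list, which mirrors the intent of the generative task described in the abstract.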
Related papers
- Multimodal Reranking for Knowledge-Intensive Visual Question Answering [77.24401833951096]
We introduce a multi-modal reranker to improve the ranking quality of knowledge candidates for answer generation.
Experiments on OK-VQA and A-OKVQA show that a multi-modal reranker trained with distant supervision provides consistent improvements.
arXiv Detail & Related papers (2024-07-17T02:58:52Z) - Aspect-oriented Consumer Health Answer Summarization [2.298110639419913]
Community Question-Answering (CQA) forums have revolutionized how people seek information, especially those related to their healthcare needs.
There can be several answers in response to a single query, which makes it hard to grasp the key information related to the specific health concern.
Our research focuses on aspect-based summarization of health answers to address this limitation.
arXiv Detail & Related papers (2024-05-10T07:52:43Z) - Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that
Don't have a Definitive Answer? [43.03399918557937]
In real-world applications, users often ask questions that don't have a definitive answer.
We introduce QnotA, a dataset consisting of five different categories of questions that don't have definitive answers.
With this data, we formulate three evaluation tasks that test a system's ability to 'identify', 'distinguish', and 'justify' QnotA questions.
We show that even SOTA models including GPT-3 and Flan T5 do not fare well on these tasks and lag considerably behind the human performance baseline.
arXiv Detail & Related papers (2023-09-08T23:12:03Z) - Model Analysis & Evaluation for Ambiguous Question Answering [0.0]
Question Answering models are required to generate long-form answers that often combine conflicting pieces of information.
Recent advances in the field have shown strong capabilities in generating fluent responses, but certain research questions remain unanswered.
We aim to thoroughly investigate these aspects, and provide valuable insights into the limitations of the current approaches.
arXiv Detail & Related papers (2023-05-21T15:20:20Z) - Mixture of Experts for Biomedical Question Answering [34.92691831878302]
We propose a Mixture-of-Experts (MoE)-based question answering method called MoEBQA.
MoEBQA decouples the computation for different types of questions by sparse routing.
We evaluate MoEBQA on three Biomedical Question Answering (BQA) datasets constructed based on real examinations.
arXiv Detail & Related papers (2022-04-15T14:11:40Z) - How Do We Answer Complex Questions: Discourse Structure of Long-form
Answers [51.973363804064704]
We study the functional structure of long-form answers collected from three datasets.
Our main goal is to understand how humans organize information to craft complex answers.
Our work can inspire future research on discourse-level modeling and evaluation of long-form QA systems.
arXiv Detail & Related papers (2022-03-21T15:14:10Z) - AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer
Summarization [73.91543616777064]
Community Question Answering (CQA) fora such as Stack Overflow and Yahoo! Answers contain a rich resource of answers to a wide range of community-based questions.
One goal of answer summarization is to produce a summary that reflects the range of answer perspectives.
This work introduces a novel dataset of 4,631 CQA threads for answer summarization, curated by professional linguists.
arXiv Detail & Related papers (2021-11-11T21:48:02Z) - MixQG: Neural Question Generation with Mixed Answer Types [54.23205265351248]
We propose a neural question generator, MixQG, to bridge this gap.
We combine 9 question answering datasets with diverse answer types, including yes/no, multiple-choice, extractive, and abstractive answers.
Our model outperforms existing work in both seen and unseen domains.
arXiv Detail & Related papers (2021-10-15T16:03:40Z) - A Dataset of Information-Seeking Questions and Answers Anchored in
Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z) - Review-guided Helpful Answer Identification in E-commerce [38.276241153439955]
Product-specific community question answering platforms can greatly help address the concerns of potential customers.
The user-provided answers on such platforms often vary a lot in their qualities.
Helpfulness votes from the community can indicate the overall quality of the answer, but they are often missing.
arXiv Detail & Related papers (2020-03-13T11:34:29Z)