How Do We Answer Complex Questions: Discourse Structure of Long-form
Answers
- URL: http://arxiv.org/abs/2203.11048v1
- Date: Mon, 21 Mar 2022 15:14:10 GMT
- Authors: Fangyuan Xu, Junyi Jessy Li, Eunsol Choi
- Abstract summary: We study the functional structure of long-form answers collected from three datasets.
Our main goal is to understand how humans organize information to craft complex answers.
Our work can inspire future research on discourse-level modeling and evaluation of long-form QA systems.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Long-form answers, consisting of multiple sentences, can provide nuanced and
comprehensive answers to a broader set of questions. To better understand this
complex and understudied task, we study the functional structure of long-form
answers collected from three datasets, ELI5, WebGPT and Natural Questions. Our
main goal is to understand how humans organize information to craft complex
answers. We develop an ontology of six sentence-level functional roles for
long-form answers, and annotate 3.9k sentences in 640 answer paragraphs.
Different answer collection methods manifest in different discourse structures.
We further analyze model-generated answers -- finding that annotators agree
less with each other when annotating model-generated answers compared to
annotating human-written answers. Our annotated data enables training a strong
classifier that can be used for automatic analysis. We hope our work can
inspire future research on discourse-level modeling and evaluation of long-form
QA systems.
Related papers
- Question Answering in Natural Language: the Special Case of Temporal Expressions
Our work aims to leverage a popular approach used for general question answering, answer extraction, in order to find answers to temporal questions within a paragraph.
To train our model, we propose a new dataset, inspired by SQuAD, specifically tailored to provide rich temporal information.
Our evaluation shows that a deep learning model trained to perform pattern matching, often used in general question answering, can be adapted to temporal question answering.
arXiv Detail & Related papers (2023-11-23T16:26:24Z)
- Concise Answers to Complex Questions: Summarization of Long-form Answers
We conduct a user study on summarized answers generated from state-of-the-art models and our newly proposed extract-and-decontextualize approach.
We find a large proportion of long-form answers can be adequately summarized by at least one system, while complex and implicit answers are challenging to compress.
We observe that decontextualization improves the quality of the extractive summary, exemplifying its potential in the summarization task.
arXiv Detail & Related papers (2023-05-30T17:59:33Z)
- Model Analysis & Evaluation for Ambiguous Question Answering
Question Answering models are required to generate long-form answers that often combine conflicting pieces of information.
Recent advances in the field have shown strong capabilities in generating fluent responses, but certain research questions remain unanswered.
We aim to thoroughly investigate these aspects, and provide valuable insights into the limitations of the current approaches.
arXiv Detail & Related papers (2023-05-21T15:20:20Z)
- Successive Prompting for Decomposing Complex Questions
Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting.
We introduce "Successive Prompting", where we iteratively break a complex task down into a simpler sub-task, solve it, and repeat the process until we reach the final solution.
Our best model (with successive prompting) achieves an improvement of 5% absolute F1 on a few-shot version of the DROP dataset.
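The decompose-solve-repeat loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `decompose` and `answer` are hypothetical stand-ins for the language-model prompts the method would actually use (here replaced by toy rule-based stubs so the sketch is self-contained).

```python
def successive_prompting(question, decompose, answer, max_steps=10):
    """Answer a complex question by iteratively generating and
    solving simple sub-questions until decomposition is complete."""
    solved = []  # (sub-question, sub-answer) pairs accumulated so far
    for _ in range(max_steps):
        # Ask for the next simple sub-question; None signals we are done.
        sub_q = decompose(question, solved)
        if sub_q is None:
            break
        solved.append((sub_q, answer(sub_q, solved)))
    # Answer the original question given the solved sub-questions.
    return answer(question, solved)


# Toy stand-ins for the decomposition and answering prompts.
def toy_decompose(question, solved):
    if not solved:
        return "What is 2 + 3?"
    if len(solved) == 1:
        return "What is 5 * 4?"
    return None  # decomposition complete


def toy_answer(question, solved):
    if question == "What is 2 + 3?":
        return 5
    if question == "What is 5 * 4?":
        return 20
    return solved[-1][1]  # final answer follows from the last sub-answer


print(successive_prompting("What is (2 + 3) * 4?", toy_decompose, toy_answer))
```

The key design point is that each sub-answer is appended to the context before the next sub-question is generated, so later decomposition steps can condition on earlier results.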
arXiv Detail & Related papers (2022-12-08T06:03:38Z)
- WikiWhy: Answering and Explaining Cause-and-Effect Questions
We introduce WikiWhy, a QA dataset built around explaining why an answer is true in natural language.
WikiWhy contains over 9,000 "why" question-answer-rationale triples, grounded on Wikipedia facts across a diverse set of topics.
GPT-3 baselines achieve only 38.7% human-evaluated correctness in the end-to-end answer & explain condition.
arXiv Detail & Related papers (2022-10-21T17:59:03Z)
- AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization
Community Question Answering (CQA) fora such as Stack Overflow and Yahoo! Answers contain a rich resource of answers to a wide range of community-based questions.
One goal of answer summarization is to produce a summary that reflects the range of answer perspectives.
This work introduces a novel dataset of 4,631 CQA threads for answer summarization, curated by professional linguists.
arXiv Detail & Related papers (2021-11-11T21:48:02Z)
- Discourse Comprehension: A Question Answering Framework to Represent Sentence Connections
A key challenge in building and evaluating models for discourse comprehension is the lack of annotated data.
This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents.
The resulting corpus, DCQA, consists of 22,430 question-answer pairs across 607 English documents.
arXiv Detail & Related papers (2021-11-01T04:50:26Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- GooAQ: Open Question Answering with Diverse Answer Types
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.