Discourse Comprehension: A Question Answering Framework to Represent
Sentence Connections
- URL: http://arxiv.org/abs/2111.00701v1
- Date: Mon, 1 Nov 2021 04:50:26 GMT
- Title: Discourse Comprehension: A Question Answering Framework to Represent
Sentence Connections
- Authors: Wei-Jen Ko, Cutter Dalton, Mark Simmons, Eliza Fisher, Greg Durrett,
Junyi Jessy Li
- Abstract summary: A key challenge in building and evaluating models for discourse comprehension is the lack of annotated data.
This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents.
The resulting corpus, DCQA, consists of 22,430 question-answer pairs across 607 English documents.
- Score: 35.005593397252746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While there has been substantial progress in text comprehension through
simple factoid question answering, more holistic comprehension of a discourse
still presents a major challenge. Someone critically reflecting on a text as
they read it will pose curiosity-driven, often open-ended questions, which
reflect deep understanding of the content and require complex reasoning to
answer. A key challenge in building and evaluating models for this type of
discourse comprehension is the lack of annotated data, especially since finding
answers to such questions (which may not be answered at all) requires high
cognitive load for annotators over long documents. This paper presents a novel
paradigm that enables scalable data collection targeting the comprehension of
news documents, viewing these questions through the lens of discourse. The
resulting corpus, DCQA (Discourse Comprehension by Question Answering),
consists of 22,430 question-answer pairs across 607 English documents. DCQA
captures both discourse and semantic links between sentences in the form of
free-form, open-ended questions. On an evaluation set that we annotated on
questions from the INQUISITIVE dataset, we show that DCQA provides valuable
supervision for answering open-ended questions. We additionally design
pre-training methods utilizing existing question-answering resources, and use
synthetic data to accommodate unanswerable questions.
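As a rough illustration of the kind of annotation the abstract describes (a hypothetical schema for exposition, not the released DCQA format), each example links an anchor sentence that raises a question to the later sentence(s) that answer it, with unanswerable questions carrying no answer links:

```python
from dataclasses import dataclass, field

# Hypothetical schema (NOT the official DCQA release format): a free-form
# question connects the sentence it arises from to the sentence(s) answering it.
@dataclass
class DiscourseQA:
    doc_id: str
    anchor_sentence_id: int            # sentence the question arises from
    question: str                      # free-form, open-ended question
    answer_sentence_ids: list = field(default_factory=list)  # empty if unanswerable

    @property
    def answerable(self) -> bool:
        return len(self.answer_sentence_ids) > 0

# A curiosity-driven question linking sentence 2 of a document to sentence 5,
# and an unanswerable question with no answer sentence in the document.
ex = DiscourseQA("doc-001", 2, "Why did the council delay the vote?", [5])
unans = DiscourseQA("doc-001", 3, "What will the final budget be?")
print(ex.answerable, unans.answerable)
```

The empty `answer_sentence_ids` case mirrors the abstract's point that such questions may not be answered at all in the document.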
Related papers
- Open Domain Question Answering with Conflicting Contexts [55.739842087655774] (2024-10-16)
We find that as much as 25% of unambiguous, open-domain questions can lead to conflicting contexts when retrieved using Google Search.
We ask our annotators to provide explanations for their selections of correct answers.
- Auto FAQ Generation [0.0] (2024-05-13)
We propose a system for generating FAQ documents that extracts salient questions and their corresponding answers from sizeable text documents.
We use existing text summarization, sentence ranking via the TextRank algorithm, and question-generation tools to create an initial set of questions and answers.
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344] (2023-08-16)
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and by 10% on measures that evaluate disambiguating questions from predicted outputs.
- Keeping the Questions Conversational: Using Structured Representations to Resolve Dependency in Conversational Question Answering [26.997542897342164] (2023-04-14)
We propose CONVSR (CONVQA using Structured Representations), a novel framework for capturing and generating intermediate representations as conversational cues.
We test our model on the QuAC and CANARD datasets and show experimentally that our framework achieves a better F1 score than the standard question rewriting model.
- CREPE: Open-Domain Question Answering with False Presuppositions [92.20501870319765] (2022-11-30)
We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums.
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
We show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct.
- Discourse Analysis via Questions and Answers: Parsing Dependency Structures of Questions Under Discussion [57.43781399856913] (2022-10-12)
This work adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis.
We characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained questions.
We develop a first-of-its-kind QUD parser that derives a dependency structure of questions over full documents.
- Conversational QA Dataset Generation with Answer Revision [2.5838973036257458] (2022-09-23)
We introduce a novel framework that extracts question-worthy phrases from a passage and then generates corresponding questions considering previous conversations.
Our framework revises the extracted answers after generating questions so that answers exactly match their paired questions.
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468] (2021-05-07)
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
- Challenges in Information-Seeking QA: Unanswerable Questions and Paragraph Retrieval [46.3246135936476] (2020-10-22)
We analyze why answering information-seeking queries is more challenging and where their prevalent unanswerable cases arise.
Our controlled experiments suggest two areas of headroom: paragraph selection and answerability prediction.
We manually annotate 800 unanswerable examples across six languages on what makes them challenging to answer.
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531] (2020-10-04)
We introduce INQUISITIVE, a dataset of 19K questions elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
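The sentence ranking step named in the Auto FAQ Generation entry above can be sketched with a minimal TextRank-style ranker: build a sentence-similarity graph and run PageRank-style power iteration over it. This is an illustrative sketch of the general technique, not that paper's implementation; the similarity function is a token-overlap variant (with a +1 length offset to avoid log(1) = 0 for one-word sentences) and the example sentences are invented:

```python
import math

def similarity(s1, s2):
    """Token-overlap similarity, log-normalized by sentence length."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2)
    if overlap == 0:
        return 0.0
    return overlap / (math.log(len(w1) + 1) + math.log(len(w2) + 1))

def textrank(sentences, damping=0.85, iters=50):
    """Rank sentences by PageRank-style power iteration on a similarity graph."""
    n = len(sentences)
    sim = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(sim[j])  # total outgoing edge weight of node j
                if sim[j][i] > 0 and out > 0:
                    rank += sim[j][i] / out * scores[j]
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return sorted(zip(scores, sentences), reverse=True)

sents = [
    "The city council met on Monday to discuss the new budget.",
    "The budget includes funding for schools and parks.",
    "Council members debated the school funding for hours.",
    "A local bakery won a regional award last week.",
]
for score, s in textrank(sents):
    print(f"{score:.3f}  {s}")
```

The off-topic bakery sentence shares no content words with the others, so it receives only the damping baseline and ranks last; in an FAQ pipeline, the top-ranked sentences would then be fed to a question-generation model.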
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.