QUDEVAL: The Evaluation of Questions Under Discussion Discourse Parsing
- URL: http://arxiv.org/abs/2310.14520v2
- Date: Wed, 1 Nov 2023 20:30:41 GMT
- Authors: Yating Wu, Ritika Mangla, Greg Durrett, Junyi Jessy Li
- Abstract summary: Questions Under Discussion (QUD) is a versatile linguistic framework in which discourse progresses by continuously raising questions and answering them.
This work introduces the first framework for the automatic evaluation of QUD parsing.
We present QUDeval, a dataset of fine-grained evaluations of 2,190 QUD questions generated by both fine-tuned systems and LLMs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Questions Under Discussion (QUD) is a versatile linguistic framework in which discourse progresses by continuously raising questions and answering them. Automatically parsing a discourse into a QUD structure thus entails a complex question generation task: given a document and an answer sentence, generate a question that satisfies the linguistic constraints of QUD and can be grounded in an anchor sentence in prior context. These questions are known to be curiosity-driven and open-ended. This work introduces the first framework for the automatic evaluation of QUD parsing, instantiating the theoretical constraints of QUD in a concrete protocol. We present QUDeval, a dataset of fine-grained evaluations of 2,190 QUD questions generated by both fine-tuned systems and LLMs. Using QUDeval, we show that satisfying all constraints of QUD is still challenging for modern LLMs, and that existing evaluation metrics poorly approximate parser quality. Encouragingly, human-authored QUDs are scored highly by our human evaluators, suggesting that there is headroom for further progress in language modeling to improve both QUD parsing and QUD evaluation.
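To make the task concrete, the following minimal Python sketch (with hypothetical names; it paraphrases the constraints named in the abstract rather than QUDeval's exact annotation protocol) shows what a QUD parsing instance and its per-constraint evaluation might look like:

```python
# Illustrative sketch only (not from the paper): hypothetical names,
# paraphrasing the abstract's constraints rather than QUDeval's exact protocol.
from dataclasses import dataclass

@dataclass
class QUDInstance:
    sentences: list[str]  # the document, sentence-segmented
    answer_idx: int       # index of the sentence the question must be answered by
    anchor_idx: int       # index of the anchor sentence in prior context
    question: str         # the generated QUD

@dataclass
class QUDJudgment:
    # One judgment per constraint, produced by a human or automatic evaluator.
    compatible_with_answer: bool   # the answer sentence actually answers the question
    grounded_in_anchor: bool       # the question arises from the anchor sentence
    uses_only_prior_context: bool  # no information leaked from later sentences

def fully_satisfies_qud(inst: QUDInstance, j: QUDJudgment) -> bool:
    # The anchor must precede the answer, and every constraint must hold.
    return (inst.anchor_idx < inst.answer_idx
            and j.compatible_with_answer
            and j.grounded_in_anchor
            and j.uses_only_prior_context)
```

QUDeval collects judgments of this fine-grained kind for 2,190 generated questions, which is what allows it to test how well automatic metrics track human ratings.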
Related papers
- QUDSELECT: Selective Decoding for Questions Under Discussion Parsing (arXiv, 2024-08-02)
Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences.
We introduce QUDSELECT, a joint-training framework that selectively decodes QUD dependency structures subject to the QUD criteria.
Our method outperforms state-of-the-art baseline models by 9% in human evaluation and 4% in automatic evaluation.
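The summary does not spell out how selective decoding works; one common realization of "decode against criteria" is to over-generate candidates and keep the best-scoring one. The sketch below is only a guess at that general shape, not QUDSELECT's actual jointly-trained algorithm; both callables are placeholders:

```python
# Hypothetical sketch of "selective decoding": over-generate candidate QUDs,
# then keep the one a criteria scorer likes best. A guess at the general
# shape, not QUDSELECT's actual algorithm.
from typing import Callable

def selective_decode(
    context: str,
    answer_sentence: str,
    generate: Callable[[str, str, int], list[str]],  # e.g., nucleus sampling
    criteria_score: Callable[[str, str, str], float],
    num_candidates: int = 8,
) -> str:
    candidates = generate(context, answer_sentence, num_candidates)
    # Rank candidates by how well they satisfy the QUD criteria.
    return max(candidates, key=lambda q: criteria_score(context, answer_sentence, q))
```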
- SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References (arXiv, 2023-09-21)
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation).
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
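A metric built on multiple positive and negative references plausibly rewards closeness to the positives and penalizes closeness to the negatives. Here is a hedged sketch of that idea, not SQuArE's published formulation; `similarity` is a placeholder for a sentence-level scorer (e.g., a cross-encoder):

```python
# Hedged sketch of a multi-reference metric in the spirit of SQuArE's
# description; the actual published aggregation may differ.
from typing import Callable

def multi_reference_score(
    candidate: str,
    positives: list[str],
    negatives: list[str],
    similarity: Callable[[str, str], float],
) -> float:
    best_pos = max(similarity(candidate, ref) for ref in positives)
    best_neg = max((similarity(candidate, ref) for ref in negatives), default=0.0)
    return best_pos - best_neg  # high when close to correct answers, far from wrong ones
```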
- Discourse Analysis via Questions and Answers: Parsing Dependency Structures of Questions Under Discussion (arXiv, 2022-10-12)
This work adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis.
We characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained questions.
We develop a first-of-its-kind QUD parser that derives a dependency structure of questions over full documents.
- Discourse Comprehension: A Question Answering Framework to Represent Sentence Connections (arXiv, 2021-11-01)
A key challenge in building and evaluating models for discourse comprehension is the lack of annotated data.
This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents.
The resulting corpus, DCQA, consists of 22,430 question-answer pairs across 607 English documents.
- Learn to Resolve Conversational Dependency: A Consistency Training Framework for Conversational Question Answering (arXiv, 2021-06-22)
We propose ExCorD (Explicit guidance on how to resolve Conversational Dependency) to enhance QA models' comprehension of conversational context.
In our experiments, ExCorD significantly improves QA performance, by up to 1.2 F1 on QuAC and 5.2 F1 on CANARD.
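Reading "consistency training" literally, the QA model can be pushed to make matching predictions for a conversational question and its self-contained rewrite. This speculative PyTorch sketch illustrates that pattern; ExCorD's actual objective and weighting may differ, and `qa_logits` is a placeholder span-scoring model:

```python
# Speculative sketch of consistency training between a conversational
# question and its self-contained rewrite; not ExCorD's exact objective.
import torch.nn.functional as F

def consistency_step(qa_logits, passage, orig_q, rewritten_q, gold_idx, alpha=0.5):
    logits_orig = qa_logits(passage, orig_q)      # [batch, num_candidate_spans]
    logits_rew = qa_logits(passage, rewritten_q)  # [batch, num_candidate_spans]
    # Supervised loss on the original (in-context) question.
    supervised = F.cross_entropy(logits_orig, gold_idx)
    # Consistency: predictions for the original question should match those
    # for the rewritten, self-contained question.
    consistency = F.kl_div(F.log_softmax(logits_orig, dim=-1),
                           F.softmax(logits_rew.detach(), dim=-1),
                           reduction="batchmean")
    return supervised + alpha * consistency
```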
- QAConv: Question Answering on Informative Conversations (arXiv, 2021-05-14)
We focus on informative conversations, including business emails, panel discussions, and work channels.
In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions.
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering (arXiv, 2020-10-18)
We propose a new Spoken Conversational Question Answering (SCQA) task.
SCQA aims to enable QA systems to model complex dialogue flow given speech utterances and text corpora.
Our main objective is to build a QA system that handles conversational questions in both spoken and text form.
- Question Rewriting for Conversational Question Answering (arXiv, 2020-04-30)
We introduce a conversational QA architecture that sets a new state of the art on the TREC CAsT 2019 passage retrieval dataset.
We show that the same QR model improves QA performance on the QuAC dataset with respect to answer span extraction.
Our evaluation results indicate that the QR model achieves near-human-level performance on both datasets.
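As described, the pipeline first rewrites the in-context question into a standalone one, then runs the usual retrieval/QA stack on the rewrite. A minimal sketch with placeholder models:

```python
# Minimal sketch of a rewrite-then-answer pipeline as the summary describes;
# both callables are placeholders for whatever QR model and QA stack are used.
from typing import Callable

def answer_with_rewriting(
    history: list[str],                        # prior turns of the conversation
    question: str,                             # current, context-dependent question
    rewrite: Callable[[list[str], str], str],  # QR model
    answer: Callable[[str], str],              # retrieval + QA stack
) -> str:
    standalone = rewrite(history, question)    # e.g., resolve pronouns/ellipsis
    return answer(standalone)
```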
This list is automatically generated from the titles and abstracts of the papers on this site.