QUDSELECT: Selective Decoding for Questions Under Discussion Parsing
- URL: http://arxiv.org/abs/2408.01046v1
- Date: Fri, 2 Aug 2024 06:46:08 GMT
- Title: QUDSELECT: Selective Decoding for Questions Under Discussion Parsing
- Authors: Ashima Suvarna, Xiao Liu, Tanmay Parekh, Kai-Wei Chang, Nanyun Peng
- Abstract summary: Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences.
We introduce QUDSELECT, a joint-training framework that selectively decodes the QUD dependency structures considering the QUD criteria.
Our method outperforms the state-of-the-art baseline models by 9% in human evaluation and 4% in automatic evaluation.
- Score: 90.92351108691014
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences. In QUD parsing, each sentence is viewed as an answer to a question triggered by an anchor sentence in prior context. The resulting QUD structure is required to conform to several theoretical criteria like answer compatibility (how well the question is answered), making QUD parsing a challenging task. Previous works construct QUD parsers in a pipelined manner (i.e. detect the trigger sentence in context and then generate the question). However, these parsers lack a holistic view of the task and can hardly satisfy all the criteria. In this work, we introduce QUDSELECT, a joint-training framework that selectively decodes the QUD dependency structures considering the QUD criteria. Using instruction-tuning, we train models to simultaneously predict the anchor sentence and generate the associated question. To explicitly incorporate the criteria, we adopt a selective decoding strategy of sampling multiple QUD candidates during inference, followed by selecting the best one with criteria scorers. Our method outperforms the state-of-the-art baseline models by 9% in human evaluation and 4% in automatic evaluation, demonstrating the effectiveness of our framework.
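The sampling-then-selection step described in the abstract lends itself to a short sketch. The snippet below is a minimal Python illustration, assuming a hypothetical `parser.sample` interface for the instruction-tuned model and a list of criteria `scorers` (e.g., an answer-compatibility scorer); it is a sketch of the idea, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class QUDCandidate:
    anchor_idx: int  # index of the predicted anchor sentence in the prior context
    question: str    # generated implicit question answered by the target sentence

def selective_decode(parser, scorers, context, answer_sentence, k=5):
    """Sample k QUD candidates, then keep the one the criteria scorers rank highest.

    `parser` and `scorers` are illustrative assumptions: `parser.sample` is taken
    to jointly predict an anchor sentence and generate its question; each scorer
    maps a candidate to a score for one QUD criterion (e.g. answer compatibility).
    """
    candidates = [parser.sample(context, answer_sentence) for _ in range(k)]

    # Aggregate the criteria scores per candidate; the paper's exact
    # aggregation may differ from this simple sum.
    def total_score(cand: QUDCandidate) -> float:
        return sum(scorer(context, answer_sentence, cand) for scorer in scorers)

    return max(candidates, key=total_score)
```

Decoding several candidates and filtering them with explicit criteria scorers is what lets the joint model enforce constraints, such as answer compatibility, that a single greedy pass through a pipelined parser cannot.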
Related papers
- Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations [85.81295563405433]
Language model users often issue queries that lack specification, where the context under which a query was issued is not explicit.
We present contextualized evaluations, a protocol that synthetically constructs context surrounding an under-specified query and provides it during evaluation.
We find that the presence of context can 1) alter conclusions drawn from evaluation, even flipping win rates between model pairs, 2) nudge evaluators to make fewer judgments based on surface-level criteria, like style, and 3) provide new insights about model behavior across diverse contexts.
arXiv Detail & Related papers (2024-11-11T18:58:38Z)
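As a rough illustration of the protocol, the sketch below builds a judge prompt that supplies synthetic context (clarifying question-answer pairs) alongside an under-specified query; the prompt format and the `judge` callable are assumptions for illustration, not the paper's actual setup.

```python
def contextualized_eval(judge, query, response_a, response_b, context_qas):
    """Pairwise evaluation of two responses with synthetic context supplied.

    `judge` is assumed to be a callable returning 'A' or 'B' (e.g. an LLM judge);
    `context_qas` is a list of (clarifying question, answer) pairs that pin down
    the user's intent behind the under-specified query.
    """
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in context_qas)
    prompt = (
        f"Context about the user's intent:\n{context}\n\n"
        f"Query: {query}\n\n"
        f"Response A: {response_a}\n\nResponse B: {response_b}\n\n"
        "Which response better serves this user? Answer 'A' or 'B'."
    )
    return judge(prompt)
```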
- QUDEVAL: The Evaluation of Questions Under Discussion Discourse Parsing [87.20804165014387]
Questions Under Discussion (QUD) is a versatile linguistic framework in which discourse progresses by continuously raising questions and answering them.
This work introduces the first framework for the automatic evaluation of QUD parsing.
We present QUDeval, a dataset of fine-grained evaluations of 2,190 QUD questions generated by both fine-tuned systems and LLMs.
arXiv Detail & Related papers (2023-10-23T03:03:58Z)
- SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation).
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
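The use of both correct and incorrect references can be sketched in a few lines. The function below assumes a sentence-similarity function `sim` (e.g. cosine similarity over sentence embeddings) and rewards candidates that sit closer to the positive references than to the negative ones; the published SQuArE metric may aggregate references differently.

```python
def square_like_score(sim, candidate, positives, negatives):
    """Score a candidate answer against positive and negative reference answers.

    `sim(a, b)` is an assumed similarity function in [0, 1]; this is a sketch of
    the multiple-reference idea, not the published SQuArE formulation.
    """
    best_pos = max(sim(candidate, ref) for ref in positives)
    best_neg = max(sim(candidate, ref) for ref in negatives)
    # Higher when the candidate resembles correct answers more than incorrect ones.
    return best_pos - best_neg
```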
- Discourse Analysis via Questions and Answers: Parsing Dependency Structures of Questions Under Discussion [57.43781399856913]
This work adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis.
We characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained questions.
We develop a first-of-its-kind QUD parser that derives a dependency structure of questions over full documents.
arXiv Detail & Related papers (2022-10-12T03:53:12Z)
- A Simple Approach to Jointly Rank Passages and Select Relevant Sentences in the OBQA Context [15.556928370682094]
How to select the relevant information from a large corpus is a crucial problem for reasoning and inference.
Many existing frameworks use a deep learning model to select relevant passages and then answer each question by matching a sentence in the corresponding passage.
We present a simple yet effective framework to address these problems by jointly ranking passages and selecting sentences.
arXiv Detail & Related papers (2021-09-22T03:11:17Z)
- ASQ: Automatically Generating Question-Answer Pairs using AMRs [1.0878040851638]
We introduce ASQ, a tool to automatically mine questions and answers from a sentence, using its Abstract Meaning Representation (AMR).
A qualitative evaluation of the output generated by ASQ from the AMR 2.0 data shows that the question-answer pairs are natural and valid.
We intend to make this tool and the results publicly available for others to use and build upon.
arXiv Detail & Related papers (2021-05-20T20:38:05Z)
- A Wrong Answer or a Wrong Question? An Intricate Relationship between Question Reformulation and Answer Selection in Conversational Question Answering [15.355557454305776]
We show that question rewriting (QR) of the conversational context allows us to shed more light on this phenomenon.
We present the results of this analysis on the TREC CAsT and QuAC (CANARD) datasets.
arXiv Detail & Related papers (2020-10-13T06:29:51Z)
- Match$^2$: A Matching over Matching Model for Similar Question Identification [74.7142127303489]
Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers.
Similar question identification becomes a core task in CQA which aims to find a similar question from the archived repository whenever a new question is asked.
It has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question, or different questions sharing similar expressions.
Traditional methods typically take a one-sided approach, leveraging the answer as an expanded representation of the question.
arXiv Detail & Related papers (2020-06-21T05:59:34Z)