Pragmatic Evaluation of Clarifying Questions with Fact-Level Masking
- URL: http://arxiv.org/abs/2310.11571v2
- Date: Sun, 7 Jan 2024 21:01:55 GMT
- Title: Pragmatic Evaluation of Clarifying Questions with Fact-Level Masking
- Authors: Matthew Toles, Yukun Huang, Zhou Yu, Luis Gravano
- Abstract summary: We present a definition and framework for natural language pragmatic asking of clarifying questions (PACQ).
We also present fact-level masking (FLM), a procedure for converting natural language datasets into self-supervised PACQ datasets.
Our experiments show that current zero-shot models struggle to ask questions that retrieve useful information, as compared to human annotators.
- Score: 21.480602733510256
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The ability to derive useful information by asking clarifying questions (ACQ)
is an important element of real life collaboration on reasoning tasks, such as
question answering (QA). Existing natural language ACQ challenges, however,
evaluate generations based on word overlap rather than the value of the
information itself. Word overlap is often an inappropriate metric for question
generation since many different questions could be useful in a given situation,
and a single question can be phrased many different ways. Instead, we propose
evaluating questions pragmatically based on the value of the information they
retrieve. Here we present a definition and framework for natural language
pragmatic asking of clarifying questions (PACQ), the problem of generating
questions that result in answers useful for a reasoning task. We also present
fact-level masking (FLM), a procedure for converting natural language datasets
into self-supervised PACQ datasets by omitting particular critical facts.
Finally, we generate a PACQ dataset from the HotpotQA dataset using FLM and
evaluate several zero-shot language models on it. Our experiments show that
current zero-shot models struggle to ask questions that retrieve useful
information, as compared to human annotators. These results demonstrate an
opportunity to use FLM datasets and the PACQ framework to objectively evaluate
and improve question generation and other language models.
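The fact-level masking procedure described above can be illustrated with a minimal sketch: given a multi-fact QA example (HotpotQA-style), one critical supporting fact is held out so that the remaining context is insufficient, and a model must ask a clarifying question whose answer recovers the masked fact. The field names and function below are illustrative assumptions, not the authors' actual schema or code.

```python
def fact_level_mask(example, masked_index):
    """Split a multi-fact QA example into visible context plus one held-out fact."""
    facts = example["supporting_facts"]
    masked_fact = facts[masked_index]
    visible = [f for i, f in enumerate(facts) if i != masked_index]
    return {
        "question": example["question"],
        "answer": example["answer"],
        "context": visible,         # insufficient to answer on its own
        "masked_fact": masked_fact,  # recoverable only by asking about it
    }

# Toy two-hop example in the spirit of HotpotQA (contents are invented).
example = {
    "question": "In which country was the director of Film X born?",
    "answer": "France",
    "supporting_facts": [
        "Film X was directed by Jane Doe.",
        "Jane Doe was born in France.",
    ],
}

masked = fact_level_mask(example, masked_index=1)
print(masked["context"])      # ['Film X was directed by Jane Doe.']
print(masked["masked_fact"])  # 'Jane Doe was born in France.'
```

Under the PACQ framing, a model shown only `masked["context"]` would be rewarded for asking something like "Where was Jane Doe born?", since the answer to that question supplies the missing fact.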
Related papers
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
- Diversity Enhanced Narrative Question Generation for Storybooks [4.043005183192124]
We introduce a multi-question generation model (mQG) capable of generating multiple, diverse, and answerable questions.
To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model.
mQG shows promising results across various evaluation metrics, among strong baselines.
arXiv Detail & Related papers (2023-10-25T08:10:04Z)
- Weakly Supervised Visual Question Answer Generation [2.7605547688813172]
We present a weakly supervised method that synthetically generates question-answer pairs procedurally from visual information and captions.
We perform an exhaustive experimental analysis on the VQA dataset and find that our model significantly outperforms SOTA methods on BLEU scores.
arXiv Detail & Related papers (2023-06-11T08:46:42Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z)
- Modern Question Answering Datasets and Benchmarks: A Survey [5.026863544662493]
Question Answering (QA) is one of the most important natural language processing (NLP) tasks.
It aims to use NLP technologies to generate a corresponding answer to a given question based on a massive unstructured corpus.
In this paper, we investigate influential QA datasets that have been released in the era of deep learning.
arXiv Detail & Related papers (2022-06-30T05:53:56Z)
- Would You Ask it that Way? Measuring and Improving Question Naturalness for Knowledge Graph Question Answering [20.779777536841493]
Knowledge graph question answering (KGQA) facilitates information access by leveraging structured data without requiring formal query language expertise from the user.
We create the IQN-KGQA test collection by sampling questions from existing KGQA datasets and evaluating them with regard to five different aspects of naturalness.
We find that some KGQA systems fare worse when presented with more realistic formulations of NL questions.
arXiv Detail & Related papers (2022-05-25T13:32:27Z)
- Understanding Unnatural Questions Improves Reasoning over Text [54.235828149899625]
Complex question answering (CQA) over raw text is a challenging task.
Learning an effective CQA model requires large amounts of human-annotated data.
We address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions.
arXiv Detail & Related papers (2020-10-19T10:22:16Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
- ClarQ: A large-scale and diverse dataset for Clarification Question Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post comments extracted from Stack Exchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
arXiv Detail & Related papers (2020-06-10T17:56:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.