Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic
Dataset for Narrative Comprehension
- URL: http://arxiv.org/abs/2203.13947v1
- Date: Sat, 26 Mar 2022 00:20:05 GMT
- Title: Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic
Dataset for Narrative Comprehension
- Authors: Ying Xu, Dakuo Wang, Mo Yu, Daniel Ritchie, Bingsheng Yao, Tongshuang
Wu, Zheng Zhang, Toby Jia-Jun Li, Nora Bradford, Branda Sun, Tran Bao Hoang,
Yisi Sang, Yufang Hou, Xiaojuan Ma, Diyi Yang, Nanyun Peng, Zhou Yu, Mark
Warschauer
- Abstract summary: We introduce FairytaleQA, a dataset focusing on narrative comprehension of kindergarten to eighth-grade students.
FairytaleQA consists of 10,580 explicit and implicit questions derived from 278 children-friendly stories.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question answering (QA) is a fundamental means to facilitate assessment and
training of narrative comprehension skills for both machines and young
children, yet there is a scarcity of high-quality QA datasets carefully designed
children, yet there is scarcity of high-quality QA datasets carefully designed
to serve this purpose. In particular, existing datasets rarely distinguish
fine-grained reading skills, such as the understanding of varying narrative
elements. Drawing on reading education research, we introduce FairytaleQA,
a dataset focusing on narrative comprehension of kindergarten to eighth-grade
students. Generated by educational experts based on an evidence-based
theoretical framework, FairytaleQA consists of 10,580 explicit and implicit
questions derived from 278 children-friendly stories, covering seven types of
narrative elements or relations. Our dataset is valuable in two ways: First,
we ran existing QA models on our dataset and confirmed that this annotation
helps assess models' fine-grained learning skills. Second, the dataset supports
the question generation (QG) task in the education domain. Through benchmarking
with QG models, we show that the QG model trained on FairytaleQA is capable of
asking high-quality and more diverse questions.
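The annotation scheme described above can be sketched as a small data structure. This is an illustrative sketch only: the field names below are hypothetical, not the dataset's official schema, and the seven narrative element/relation types are taken from the paper's framework (character, setting, action, feeling, causal relationship, outcome resolution, prediction).

```python
from dataclasses import dataclass

# Seven narrative element/relation types covered by FairytaleQA,
# per the paper's evidence-based framework.
NARRATIVE_ELEMENTS = {
    "character", "setting", "action", "feeling",
    "causal_relationship", "outcome_resolution", "prediction",
}

@dataclass
class FairytaleQAItem:
    """Hypothetical record layout for one QA pair; field names are
    illustrative, not the official release schema."""
    story_title: str
    question: str
    answer: str
    element: str        # one of NARRATIVE_ELEMENTS
    is_explicit: bool   # explicit (answer stated in text) vs implicit (inferred)

    def __post_init__(self) -> None:
        # Reject labels outside the paper's seven-type taxonomy.
        if self.element not in NARRATIVE_ELEMENTS:
            raise ValueError(f"unknown narrative element: {self.element}")

item = FairytaleQAItem(
    story_title="The Frog Prince",
    question="How did the princess feel when she lost her golden ball?",
    answer="She was very sad.",
    element="feeling",
    is_explicit=True,
)
print(item.element)  # feeling
```

Keeping the element label on each QA pair is what lets the paper's first contribution work: model errors can be bucketed per narrative element to assess fine-grained reading skills.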
Related papers
- FairytaleQA Translated: Enabling Educational Question and Answer Generation in Less-Resourced Languages
This paper introduces machine-translated versions of FairytaleQA, a renowned QA dataset designed to assess and enhance narrative comprehension skills in young children.
We employ fine-tuned, modest-scale models to establish benchmarks for both Question Generation (QG) and QA tasks within the translated datasets.
We present a case study proposing a model for generating question-answer pairs, with an evaluation incorporating quality metrics such as question well-formedness, answerability, relevance, and suitability for children.
arXiv Detail & Related papers (2024-06-06T16:31:47Z)
- StorySparkQA: Expert-Annotated QA Pairs with Real-World Knowledge for Children's Story-Based Learning
We design an annotation framework, empowered by an existing knowledge graph, to capture experts' annotations and thinking process.
The StorySparkQA dataset comprises 5,868 expert-annotated QA pairs with real-world knowledge.
arXiv Detail & Related papers (2023-11-16T10:30:26Z)
- Diversity Enhanced Narrative Question Generation for Storybooks
We introduce a multi-question generation model (mQG) capable of generating multiple, diverse, and answerable questions.
To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model.
mQG shows promising results across various evaluation metrics against strong baselines.
arXiv Detail & Related papers (2023-10-25T08:10:04Z)
- Toward Unsupervised Realistic Visual Question Answering
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs).
We first point out two drawbacks in current RVQA research: (1) datasets contain too many unchallenging UQs, and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
Pseudo UQs for training are obtained by randomly pairing images and questions.
arXiv Detail & Related papers (2023-03-09T06:58:29Z)
- Question Generation for Reading Comprehension Assessment by Modeling How and What to Ask
We study Question Generation (QG) for reading comprehension where inferential questions are critical.
We propose a two-step model (HTA-WTA) that takes advantage of previous datasets.
We show that the HTA-WTA model tests for strong SCRS by asking deep inferential questions.
arXiv Detail & Related papers (2022-04-06T15:52:24Z)
- Educational Question Generation of Children Storybooks via Question Type Distribution Learning and Event-Centric Summarization
We propose a novel question generation method that first learns the question type distribution of an input story paragraph.
We finetune a pre-trained transformer-based sequence-to-sequence model using silver samples composed of educational question-answer pairs.
Our work indicates the necessity of decomposing question type distribution learning and event-centric summary generation for educational question generation.
arXiv Detail & Related papers (2022-03-27T02:21:19Z)
- It is AI's Turn to Ask Human a Question: Question and Answer Pair Generation for Children Storybooks in FairytaleQA Dataset
In educational applications, teachers and parents may not always know which questions to ask a child to maximize language learning outcomes.
With a newly released book QA dataset (FairytaleQA), we developed an automated QA generation model architecture for this novel application.
arXiv Detail & Related papers (2021-09-08T04:11:54Z)
- Open Question Answering over Tables and Text
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Inquisitive Question Generation for High Level Text Comprehension
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.