Improving Reading Comprehension Question Generation with Data
Augmentation and Overgenerate-and-rank
- URL: http://arxiv.org/abs/2306.08847v1
- Date: Thu, 15 Jun 2023 04:23:25 GMT
- Title: Improving Reading Comprehension Question Generation with Data
Augmentation and Overgenerate-and-rank
- Authors: Nischal Ashok Kumar, Nigel Fernandez, Zichao Wang, Andrew Lan
- Abstract summary: Automated answer-aware reading comprehension question generation has significant potential to scale up learner support in educational activities.
One key technical challenge in this setting is that there can be multiple questions, sometimes very different from each other, with the same answer.
We propose 1) a data augmentation method that enriches the training dataset with diverse questions given the same context and answer and 2) an overgenerate-and-rank method to select the best question from a pool of candidates.
- Score: 3.854023945160742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reading comprehension is a crucial skill in many aspects of education,
including language learning, cognitive development, and fostering early
literacy skills in children. Automated answer-aware reading comprehension
question generation has significant potential to scale up learner support in
educational activities. One key technical challenge in this setting is that
there can be multiple questions, sometimes very different from each other, with
the same answer; a trained question generation method may not necessarily know
which question human educators would prefer. To address this challenge, we
propose 1) a data augmentation method that enriches the training dataset with
diverse questions given the same context and answer and 2) an
overgenerate-and-rank method to select the best question from a pool of
candidates. We evaluate our method on the FairytaleQA dataset, showing a 5%
absolute improvement in ROUGE-L over the best existing method. We also
demonstrate the effectiveness of our method in generating harder, "implicit"
questions, where the answers are not contained in the context as text spans.
Related papers
- How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading [60.19226384241482]
We introduce GuidingQ, a dataset of 10K in-text questions from textbooks and scientific articles.
We explore various approaches to generate such questions using language models.
We conduct a human study to understand the implication of such questions on reading comprehension.
arXiv Detail & Related papers (2024-07-19T13:42:56Z) - Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations [70.6395572287422]
Self-alignment method is capable of not only refusing to answer but also providing explanation to the unanswerability of unknown questions.
We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself for aligning the responses to unknown questions as desired.
arXiv Detail & Related papers (2024-02-23T02:24:36Z) - ChatPRCS: A Personalized Support System for English Reading
Comprehension based on ChatGPT [3.847982502219679]
This paper presents a novel personalized support system for reading comprehension, referred to as ChatPRCS.
ChatPRCS employs methods including reading comprehension proficiency prediction, question generation, and automatic evaluation.
arXiv Detail & Related papers (2023-09-22T11:46:44Z) - SkillQG: Learning to Generate Question for Reading Comprehension
Assessment [54.48031346496593]
We present a question generation framework with controllable comprehension types for assessing and improving machine reading comprehension models.
We first frame the comprehension type of questions based on a hierarchical skill-based schema, then formulate $textttSkillQG$ as a skill-conditioned question generator.
Empirical results demonstrate that $textttSkillQG$ outperforms baselines in terms of quality, relevance, and skill-controllability.
arXiv Detail & Related papers (2023-05-08T14:40:48Z) - Question Generation for Reading Comprehension Assessment by Modeling How
and What to Ask [3.470121495099]
We study Question Generation (QG) for reading comprehension where inferential questions are critical.
We propose a two-step model (HTA-WTA) that takes advantage of previous datasets.
We show that the HTA-WTA model tests for strong SCRS by asking deep inferential questions.
arXiv Detail & Related papers (2022-04-06T15:52:24Z) - Educational Question Generation of Children Storybooks via Question Type Distribution Learning and Event-Centric Summarization [67.1483219601714]
We propose a novel question generation method that first learns the question type distribution of an input story paragraph.
We finetune a pre-trained transformer-based sequence-to-sequence model using silver samples composed by educational question-answer pairs.
Our work indicates the necessity of decomposing question type distribution learning and event-centric summary generation for educational question generation.
arXiv Detail & Related papers (2022-03-27T02:21:19Z) - Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z) - Knowledgeable Dialogue Reading Comprehension on Key Turns [84.1784903043884]
Multi-choice machine reading comprehension (MRC) requires models to choose the correct answer from candidate options given a passage and a question.
Our research focuses dialogue-based MRC, where the passages are multi-turn dialogues.
It suffers from two challenges, the answer selection decision is made without support of latently helpful commonsense, and the multi-turn context may hide considerable irrelevant information.
arXiv Detail & Related papers (2020-04-29T07:04:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.