EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain
- URL: http://arxiv.org/abs/2210.06104v1
- Date: Wed, 12 Oct 2022 11:28:34 GMT
- Title: EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain
- Authors: Amir Hadifar, Semere Kiros Bitew, Johannes Deleu, Chris Develder,
Thomas Demeester
- Abstract summary: This dataset contains 3,397 samples of multiple choice questions, answers (including distractors), and their source documents from the educational domain.
Each question is phrased in two forms, normal and cloze. Correct answers are linked to source documents with sentence-level annotations.
All questions have been written by educational experts rather than crowd workers to ensure they maintain educational and learning standards.
- Score: 20.801638768447948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a high-quality dataset that contains 3,397 samples comprising
(i) multiple choice questions, (ii) answers (including distractors), and (iii)
their source documents, from the educational domain. Each question is phrased
in two forms, normal and cloze. Correct answers are linked to source documents
with sentence-level annotations. Thus, our versatile dataset can be used for
both question and distractor generation, as well as to explore new challenges
such as question format conversion. Furthermore, 903 questions are accompanied
by their cognitive complexity level as per Bloom's taxonomy. All questions have
been written by educational experts rather than crowd workers to ensure they
maintain educational and learning standards. Our analysis and
experiments suggest distinguishable differences between our dataset and
commonly used ones for question generation for educational purposes. We believe
this new dataset can serve as a valuable resource for research and evaluation
in the educational domain. The dataset and baselines will be released to
support further research in question generation.
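To make the dataset's structure concrete, the sketch below shows what a single EduQG-style sample might look like. The field names and values are illustrative assumptions, not the dataset's actual schema; they simply mirror the components named in the abstract (normal and cloze question forms, a correct answer with distractors, a sentence-level link into the source document, and an optional Bloom's taxonomy level).

```python
# A hypothetical EduQG-style sample; field names are assumptions for illustration.
sample = {
    "question_normal": "What organelle produces most of a cell's ATP?",
    "question_cloze": "The ____ produces most of a cell's ATP.",
    "answer": "mitochondrion",
    "distractors": ["ribosome", "nucleus", "Golgi apparatus"],
    "source_sentences": [12],   # sentence-level annotation into the source document
    "bloom_level": "Remember",  # present only for the 903 annotated questions
}

def to_mcq(s):
    """Render a sample as a multiple choice item (answer listed first, unshuffled)."""
    options = [s["answer"]] + s["distractors"]
    return {"stem": s["question_normal"], "options": options}

item = to_mcq(sample)
print(len(item["options"]))  # prints 4: one correct answer plus three distractors
```

Because each sample carries both question forms plus the answer set and source links, the same record supports question generation, distractor generation, and format-conversion tasks.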
Related papers
- Qsnail: A Questionnaire Dataset for Sequential Question Generation (arXiv, 2024-02-22)
We present the first dataset specifically constructed for the questionnaire generation task, comprising 13,168 human-written questionnaires.
Experiments on Qsnail reveal that retrieval models and traditional generative models do not fully align with the given research topic and intents.
Despite enhancements through chain-of-thought prompting and finetuning, questionnaires generated by language models still fall short of human-written ones.
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models (arXiv, 2023-10-17)
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or the question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
- Alloprof: a new French question-answer education dataset and its use in an information retrieval case study (arXiv, 2023-02-10)
We introduce a new public French question-answering dataset from Alloprof, a Quebec-based help website.
This dataset contains 29,349 questions and their explanations, covering a variety of school subjects, from 10,368 students.
To predict relevant documents, architectures using pre-trained BERT models were fine-tuned and evaluated.
- Towards Complex Document Understanding By Discrete Reasoning (arXiv, 2022-07-25)
Document Visual Question Answering (VQA) aims to understand visually rich documents in order to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account information from multiple modalities, including text, layout, and visual image, to intelligently address different types of questions.
- Modern Question Answering Datasets and Benchmarks: A Survey (arXiv, 2022-06-30)
Question Answering (QA) is one of the most important natural language processing (NLP) tasks.
It aims to use NLP technologies to generate an answer to a given question based on a massive unstructured corpus.
In this paper, we investigate influential QA datasets that have been released in the era of deep learning.
- Controllable Open-ended Question Generation with A New Question Type Ontology (arXiv, 2021-07-01)
We investigate the less-explored task of generating open-ended questions that are typically answered by multiple sentences.
We first define a new question type ontology which differentiates the nuanced nature of questions better than widely used question words.
We then propose a novel question type-aware question generation framework, augmented by a semantic graph representation.
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers (arXiv, 2021-05-07)
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding (arXiv, 2020-12-14)
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pairs based on both the Visual Genome scene graph and an external knowledge base, using controlled programs.
- Challenges in Information-Seeking QA: Unanswerable Questions and Paragraph Retrieval (arXiv, 2020-10-22)
We analyze why answering information-seeking queries is more challenging and where their prevalent unanswerabilities arise.
Our controlled experiments suggest two sources of headroom: paragraph selection and answerability prediction.
We manually annotate 800 unanswerable examples across six languages, identifying what makes them challenging to answer.
- Inquisitive Question Generation for High Level Text Comprehension (arXiv, 2020-10-04)
We introduce INQUISITIVE, a dataset of 19K questions elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
This list is automatically generated from the titles and abstracts of the papers in this site.