R2DE: a NLP approach to estimating IRT parameters of newly generated questions
- URL: http://arxiv.org/abs/2001.07569v1
- Date: Tue, 21 Jan 2020 14:31:01 GMT
- Title: R2DE: a NLP approach to estimating IRT parameters of newly generated questions
- Authors: Luca Benedetto, Andrea Cappelli, Roberto Turrin, Paolo Cremonesi
- Abstract summary: R2DE is a model capable of assessing newly generated multiple-choice questions by looking at the text of the question.
In particular, it can estimate the difficulty and the discrimination of each question.
- Score: 3.364554138758565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The main objective of exams is to assess students' expertise on a specific subject. Such expertise, also referred to as skill or knowledge level, can then be leveraged in different ways (e.g., to assign a grade to the students, to understand whether a student might need some support, etc.). Similarly, the questions appearing in the exams have to be assessed in some way before being used to evaluate students. Standard approaches to question assessment are either subjective (e.g., assessment by human experts) or introduce a long delay into the process of question generation (e.g., pretesting with real students). In this work we introduce R2DE (a Regressor for Difficulty and Discrimination Estimation), a model capable of assessing newly generated multiple-choice questions by looking at the text of the question and the text of the possible choices. In particular, it can estimate the difficulty and the discrimination of each question, as they are defined in Item Response Theory. We also present the results of extensive experiments carried out on a real-world, large-scale dataset coming from an e-learning platform, showing that our model can be used to perform an initial assessment of newly created questions and ease some of the problems that arise in question generation.
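For context, the difficulty and discrimination estimated by R2DE are the b and a parameters of the two-parameter logistic (2PL) model from Item Response Theory. A standard formulation from the general IRT literature (not quoted from this abstract) gives the probability that a student with latent skill level theta answers question i correctly:

```latex
P(X_i = 1 \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}
```

Here b_i shifts the curve along the skill axis (harder questions require higher skill for the same probability of success), while a_i controls its steepness, i.e., how sharply the question separates weaker from stronger students.

A minimal sketch of the general idea, text features feeding one regressor per IRT parameter, might look as follows. This is an illustration under stated assumptions, not the paper's actual pipeline: the TF-IDF features, random-forest regressors, and toy data are all placeholders.

```python
# Hypothetical sketch of a text-based IRT parameter regressor, in the
# spirit of R2DE. Features, models, and data are assumptions, not the
# paper's actual configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestRegressor

# Toy training data: question stem and choices concatenated, paired with
# IRT parameters previously calibrated through pretesting.
questions = [
    "What does TCP stand for? A) ... B) ... C) ... D) ...",
    "Which OSI layer handles routing? A) ... B) ... C) ... D) ...",
]
difficulties = [0.3, 1.2]      # IRT b parameters
discriminations = [0.9, 1.5]   # IRT a parameters

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(questions)

# One regressor per target parameter.
difficulty_model = RandomForestRegressor(n_estimators=100).fit(X, difficulties)
discrimination_model = RandomForestRegressor(n_estimators=100).fit(X, discriminations)

# Initial assessment of a newly generated question, before any pretesting.
new_q = vectorizer.transform(["Which port does HTTPS use? A) 80 B) 443 C) 22 D) 21"])
print(difficulty_model.predict(new_q), discrimination_model.predict(new_q))
```

The appeal of such a model is latency: a rough estimate of a and b is available the moment a question is written, and can later be refined once real student responses accumulate.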
Related papers
- Qsnail: A Questionnaire Dataset for Sequential Question Generation [76.616068047362]
We present the first dataset specifically constructed for the questionnaire generation task, which comprises 13,168 human-written questionnaires.
We conduct experiments on Qsnail, and the results reveal that questionnaires produced by retrieval models and traditional generative models do not fully align with the given research topic and intents.
Despite enhancements through the chain-of-thought prompt and finetuning, questionnaires generated by language models still fall short of human-written questionnaires.
arXiv Detail & Related papers (2024-02-22T04:14:10Z)
- Can LLMs Grade Short-Answer Reading Comprehension Questions: An Empirical Study with a Novel Dataset [0.0]
This paper investigates the potential for the newest generation of Large Language Models (LLMs) to be used in grading short-answer questions for formative assessments.
It introduces a novel dataset of short answer reading comprehension questions, drawn from a set of reading assessments conducted with over 150 students in Ghana.
The paper empirically evaluates how well various configurations of generative LLMs grade student short answer responses compared to expert human raters.
arXiv Detail & Related papers (2023-10-26T17:05:40Z)
- Benchmarking Foundation Models with Language-Model-as-an-Examiner [47.345760054595246]
We propose a novel benchmarking framework, Language-Model-as-an-Examiner.
The LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner.
arXiv Detail & Related papers (2023-06-07T06:29:58Z)
- UKP-SQuARE: An Interactive Tool for Teaching Question Answering [61.93372227117229]
The exponential growth of question answering (QA) has made it an indispensable topic in any Natural Language Processing (NLP) course.
We introduce UKP-SQuARE as a platform for QA education.
Students can run, compare, and analyze various QA models from different perspectives.
arXiv Detail & Related papers (2023-05-31T11:29:04Z)
- Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo.
We first design two-level question augmentations, covering content-level and structure-level changes, which generate textually diverse question pairs with similar purposes (a generic sketch of this kind of contrastive objective appears after this list).
Then, to fully exploit the hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware ranking strategy.
arXiv Detail & Related papers (2023-01-18T14:23:29Z)
- Automatic Generation of Socratic Subquestions for Teaching Math Word Problems [16.97827669744673]
We explore the ability of large language models (LMs) to generate sequential questions for guiding math word problem-solving.
On both automatic and human quality evaluations, we find that LMs constrained with desirable question properties generate superior questions.
Results suggest that the difficulty level of problems plays an important role in determining whether questioning improves or hinders human performance.
arXiv Detail & Related papers (2022-11-23T10:40:22Z)
- What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys [63.51903260461746]
We propose a novel task for knowledge-driven follow-up question generation in conversational surveys.
We construct a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge.
We then propose a two-staged knowledge-driven model for the task, which generates informative and coherent follow-up questions.
arXiv Detail & Related papers (2022-05-23T00:57:33Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
- Introducing a framework to assess newly created questions with Natural Language Processing [3.364554138758565]
We propose a framework to train and evaluate models for estimating the difficulty and discrimination of newly created Multiple Choice Questions.
We implement one model using this framework and test it on a real-world dataset provided by CloudAcademy.
arXiv Detail & Related papers (2020-04-28T13:57:21Z)
- Educational Question Mining At Scale: Prediction, Analysis and Personalization [35.42197158180065]
We propose a framework for mining insights from educational questions at scale.
We utilize a state-of-the-art Bayesian deep learning method, in particular partial variational auto-encoders (p-VAE).
We apply our proposed framework to a real-world dataset with tens of thousands of questions and tens of millions of answers from an online education platform.
arXiv Detail & Related papers (2020-03-12T19:07:49Z)
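The QuesCo entry above mentions contrastive pre-training over augmented question pairs; the sketch below shows a generic InfoNCE-style objective of that kind. It is an illustration only, not QuesCo's actual implementation: random embeddings stand in for a real question encoder, and the temperature value is an arbitrary placeholder.

```python
# Hedged sketch of a generic contrastive (InfoNCE) objective over question
# embeddings. Placeholder inputs stand in for a real question encoder.
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
    """z1[i] and z2[i] embed two augmented views of question i; every other
    question in the batch serves as a negative example."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature    # pairwise cosine similarities
    targets = torch.arange(z1.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: random vectors in place of encoder outputs.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(info_nce_loss(z1, z2).item())
```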
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.