Reinforcement Learning Guided Multi-Objective Exam Paper Generation
- URL: http://arxiv.org/abs/2303.01042v1
- Date: Thu, 2 Mar 2023 07:55:52 GMT
- Title: Reinforcement Learning Guided Multi-Objective Exam Paper Generation
- Authors: Yuhu Shang, Xuexiong Luo, Lihong Wang, Hao Peng, Xiankun Zhang, Yimeng Ren, Kun Liang
- Abstract summary: We propose a reinforcement learning guided Multi-Objective Exam Paper Generation framework, termed MOEPG.
It simultaneously optimizes three exam domain-specific objectives: difficulty degree, distribution of exam scores, and skill coverage.
We show that MOEPG is feasible in addressing the multiple dilemmas of the exam paper generation scenario.
- Score: 21.945655389912112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To reduce the repetitive and complex work of instructors, the exam paper
generation (EPG) technique has become a salient topic in the intelligent
education field; it aims to generate high-quality exam papers automatically
according to instructor-specified assessment criteria. Current advances use
heuristic algorithms to optimize several well-known objective constraints,
such as difficulty degree and number of questions, to produce optimal
solutions. In real scenarios, however, other equally relevant objectives
(e.g., distribution of exam scores, skill coverage) are extremely important to
consider. Moreover, developing an automatic multi-objective solution that
finds an optimal subset of questions from the huge search space of a large
question dataset, and thus composes a high-quality exam paper, is urgent but
non-trivial. To this end, we design a reinforcement learning guided
Multi-Objective Exam Paper Generation framework, termed MOEPG, to
simultaneously optimize three exam domain-specific objectives: difficulty
degree, distribution of exam scores, and skill coverage. Specifically, to
accurately measure the skill proficiency of the examinee group, we first
employ deep knowledge tracing to model the interaction information between
examinees and response logs. We then design the flexible Exam Q-Network, a
function approximator, which automatically selects appropriate questions
during the exam paper composition process. MOEPG then divides the decision
space into multiple subspaces to better guide the update direction of the
exam paper. Through extensive experiments on two real-world datasets, we
demonstrate that MOEPG is feasible in addressing the multiple dilemmas of the
exam paper generation scenario.
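The abstract describes a DQN-style composition loop: a state summarizing the partial exam (informed by deep-knowledge-tracing proficiency estimates), an Exam Q-Network that scores candidate questions, and a reward built from the three objectives. The paper's code is not given here, so the following is a minimal sketch under assumptions: the names (ExamQNetwork, composite_reward, compose_exam), the network architecture, and the simple negative-gap scalarization of the objectives are all illustrative, not the authors' implementation.

import torch
import torch.nn as nn

class ExamQNetwork(nn.Module):
    # Function approximator in the spirit of the paper's Exam Q-Network:
    # maps the current (partial) exam state to a Q-value per candidate question.
    def __init__(self, state_dim, n_questions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_questions),
        )

    def forward(self, state):
        return self.net(state)

def composite_reward(difficulty_gap, score_dist_gap, skill_coverage):
    # Hypothetical scalarization of the three objectives: penalize distance
    # from the target difficulty and target score distribution, reward skill
    # coverage. The paper's actual reward shaping may differ.
    return -difficulty_gap - score_dist_gap + skill_coverage

def compose_exam(qnet, encode_state, n_items, n_questions):
    # Greedy question selection: repeatedly pick the highest-Q unselected
    # question (exploration and training updates are omitted for brevity).
    chosen, mask = [], torch.zeros(n_questions, dtype=torch.bool)
    with torch.no_grad():
        for _ in range(n_items):
            q = qnet(encode_state(chosen)).masked_fill(mask, float("-inf"))
            action = int(q.argmax())
            chosen.append(action)
            mask[action] = True
    return chosen

# Toy usage with a placeholder state encoder (in MOEPG the state would encode
# DKT-based group proficiency plus the partial exam composition).
qnet = ExamQNetwork(state_dim=32, n_questions=500)
paper = compose_exam(qnet, lambda chosen: torch.randn(32), n_items=20, n_questions=500)

Scalarizing the three objectives into a single reward is the simplest possible choice; the paper instead partitions the decision space into subspaces to guide the update direction, which this sketch does not attempt to reproduce.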
Related papers
- Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents [61.41316121093604]
We present InsCoQA, a novel benchmark for evaluating large language models (LLMs) in the context of conversational question answering (CQA).
Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents.
We also propose InsEval, an LLM-assisted evaluator that measures the integrity and accuracy of generated responses and procedural instructions.
arXiv Detail & Related papers (2024-10-01T09:10:00Z)
- Application of Large Language Models in Automated Question Generation: A Case Study on ChatGLM's Structured Questions for National Teacher Certification Exams [2.7363336723930756]
This study explores the potential of the large language model (LLM) ChatGLM for automatically generating structured questions for National Teacher Certification Exams (NTCE).
We guided ChatGLM to generate a series of simulated questions and conducted a comprehensive comparison with questions recollected from past examinees.
The results indicate that the questions generated by ChatGLM exhibit a level of rationality, scientific rigor, and practicality similar to that of real exam questions.
arXiv Detail & Related papers (2024-08-19T13:32:14Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation [9.390902237835457]
We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG).
Evaluation is performed by scoring the RAG system on an automatically generated synthetic exam composed of multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T13:14:11Z)
- Enhancing Textbook Question Answering Task with Large Language Models and Retrieval Augmented Generation [3.948068081583197]
This paper proposes a methodology that handles the out-of-domain scenario in textbook question answering (TQA).
Through supervised fine-tuning of the LLM Llama-2 and the incorporation of RAG, our architecture outperforms the baseline, achieving a 4.12% accuracy improvement on the validation set and 9.84% on the test set for non-diagram multiple-choice questions.
arXiv Detail & Related papers (2024-02-05T11:58:56Z)
- RethinkingTMSC: An Empirical Study for Target-Oriented Multimodal Sentiment Classification [70.9087014537896]
Target-oriented Multimodal Sentiment Classification (TMSC) has gained significant attention among scholars.
To investigate the causes of this problem, we perform extensive empirical evaluation and in-depth analysis of the datasets.
arXiv Detail & Related papers (2023-10-14T14:52:37Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose Auto-J, a generative judge with 13B parameters, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses from massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution [38.58190457533888]
We introduce the task of candidate distribution matching, propose several evaluation metrics for the task, and demonstrate that automatic systems trained on RACE++ can be leveraged as baselines for our task.
We further demonstrate that these automatic systems can be used for practical pre-test evaluation tasks such as detecting underperforming distractors.
arXiv Detail & Related papers (2023-06-22T17:13:08Z)
- NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks, from OCR to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of its tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z)
- ExamGAN and Twin-ExamGAN for Exam Script Generation [3.1902272671210468]
It is not yet known how to generate an exam script that results in a desirable distribution of student scores in a class.
Nor is it known how to generate a pair of high-quality exam scripts that are equivalent in assessment.
This paper proposes ExamGAN to generate high-quality exam scripts, and then extends it to T-ExamGAN to generate a pair of high-quality exam scripts.
arXiv Detail & Related papers (2021-08-22T07:34:15Z)
- Quality meets Diversity: A Model-Agnostic Framework for Computerized Adaptive Testing [60.38182654847399]
Computerized Adaptive Testing (CAT) is emerging as a promising testing application in many scenarios.
We propose a novel framework, Model-Agnostic Adaptive Testing (MAAT), as a CAT solution.
arXiv Detail & Related papers (2021-01-15T06:48:50Z)