Reinforcement Learning Guided Multi-Objective Exam Paper Generation
- URL: http://arxiv.org/abs/2303.01042v1
- Date: Thu, 2 Mar 2023 07:55:52 GMT
- Title: Reinforcement Learning Guided Multi-Objective Exam Paper Generation
- Authors: Yuhu Shang, Xuexiong Luo, Lihong Wang, Hao Peng, Xiankun Zhang, Yimeng Ren, Kun Liang
- Abstract summary: We propose a reinforcement learning guided Multi-Objective Exam Paper Generation framework, termed MOEPG.
It simultaneously optimizes three exam domain-specific objectives: difficulty degree, distribution of exam scores, and skill coverage.
We show that MOEPG is feasible in addressing the multiple dilemmas of the exam paper generation scenario.
- Score: 21.945655389912112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To reduce the repetitive and complex work of instructors, the exam paper
generation (EPG) technique has become a salient topic in the intelligent
education field; it aims to generate high-quality exam papers automatically
according to instructor-specified assessment criteria. Current advances use
heuristic algorithms to optimize several well-known objective constraints,
such as difficulty degree and number of questions, to produce optimal
solutions. In real scenarios, however, other equally relevant objectives
(e.g., distribution of exam scores, skill coverage) are extremely important to
consider. Moreover, developing an automatic multi-objective solution that
finds an optimal subset of questions from the huge search space of a large
question dataset, and thus composes a high-quality exam paper, is urgent but
non-trivial. To this end, we design a reinforcement learning guided
Multi-Objective Exam Paper Generation framework, termed MOEPG, to
simultaneously optimize three exam domain-specific objectives: difficulty
degree, distribution of exam scores, and skill coverage. Specifically, to
accurately measure the skill proficiency of the examinee group, we first
employ deep knowledge tracing to model the interaction information between
examinees and response logs. We then design the flexible Exam Q-Network, a
function approximator, which automatically selects appropriate questions
during the exam paper composition process. MOEPG then divides the decision
space into multiple subspaces to better guide the update direction of the
exam paper. Through extensive experiments on two real-world datasets, we
demonstrate that MOEPG is feasible in addressing the multiple dilemmas of the
exam paper generation scenario.
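The abstract describes a DQN-style composition loop: a state summarizing the partial exam (informed by deep-knowledge-tracing proficiency estimates), an Exam Q-Network that scores candidate questions, and a reward built from the three objectives. The paper's code is not given here, so the following is a minimal sketch under assumptions: the names (ExamQNetwork, composite_reward, compose_exam), the network architecture, and the simple negative-gap scalarization of the objectives are all illustrative, not the authors' implementation.

import torch
import torch.nn as nn

class ExamQNetwork(nn.Module):
    # Function approximator in the spirit of the paper's Exam Q-Network:
    # maps the current (partial) exam state to a Q-value per candidate question.
    def __init__(self, state_dim, n_questions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_questions),
        )

    def forward(self, state):
        return self.net(state)

def composite_reward(difficulty_gap, score_dist_gap, skill_coverage):
    # Hypothetical scalarization of the three objectives: penalize distance
    # from the target difficulty and target score distribution, reward skill
    # coverage. The paper's actual reward shaping may differ.
    return -difficulty_gap - score_dist_gap + skill_coverage

def compose_exam(qnet, encode_state, n_items, n_questions):
    # Greedy question selection: repeatedly pick the highest-Q unselected
    # question (exploration and training updates are omitted for brevity).
    chosen, mask = [], torch.zeros(n_questions, dtype=torch.bool)
    with torch.no_grad():
        for _ in range(n_items):
            q = qnet(encode_state(chosen)).masked_fill(mask, float("-inf"))
            action = int(q.argmax())
            chosen.append(action)
            mask[action] = True
    return chosen

# Toy usage with a placeholder state encoder (in MOEPG the state would encode
# DKT-based group proficiency plus the partial exam composition).
qnet = ExamQNetwork(state_dim=32, n_questions=500)
paper = compose_exam(qnet, lambda chosen: torch.randn(32), n_items=20, n_questions=500)

Scalarizing the three objectives into a single reward is the simplest possible choice; the paper instead partitions the decision space into subspaces to guide the update direction, which this sketch does not attempt to reproduce.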
Related papers
- Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents [61.41316121093604]
We present InsCoQA, a novel benchmark for evaluating large language models (LLMs) in the context of conversational question answering (CQA).
Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents.
We also propose InsEval, an LLM-assisted evaluator that measures the integrity and accuracy of generated responses and procedural instructions.
arXiv Detail & Related papers (2024-10-01T09:10:00Z)
- Application of Large Language Models in Automated Question Generation: A Case Study on ChatGLM's Structured Questions for National Teacher Certification Exams [2.7363336723930756]
This study explores the potential of the large language model (LLM) ChatGLM for automatically generating structured questions for National Teacher Certification Exams (NTCE).
We guided ChatGLM to generate a series of simulated questions and conducted a comprehensive comparison with questions recollected from past examinees.
The results indicate that the questions generated by ChatGLM exhibit a level of rationality, scientific rigor, and practicality similar to that of real exam questions.
arXiv Detail & Related papers (2024-08-19T13:32:14Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation [9.390902237835457]
We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG).
Evaluation is performed by scoring the RAG system on an automatically generated synthetic exam composed of multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T13:14:11Z)
- Enhancing Textbook Question Answering Task with Large Language Models and Retrieval Augmented Generation [3.948068081583197]
This paper proposes a methodology that handles the out-of-domain scenario in textbook question answering (TQA).
Through supervised fine-tuning of the LLM Llama-2 and the incorporation of RAG, our architecture outperforms the baseline, achieving a 4.12% accuracy improvement on the validation set and 9.84% on the test set for non-diagram multiple-choice questions.
arXiv Detail & Related papers (2024-02-05T11:58:56Z)
- RethinkingTMSC: An Empirical Study for Target-Oriented Multimodal Sentiment Classification [70.9087014537896]
Target-oriented Multimodal Sentiment Classification (TMSC) has gained significant attention among scholars.
To investigate the causes of this problem, we perform extensive empirical evaluation and in-depth analysis of the datasets.
arXiv Detail & Related papers (2023-10-14T14:52:37Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose Auto-J, a generative judge with 13B parameters, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses from massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution [38.58190457533888]
We introduce the task of candidate distribution matching, propose several evaluation metrics for the task, and demonstrate that automatic systems trained on RACE++ can be leveraged as baselines for our task.
We further demonstrate that these automatic systems can be used for practical pre-test evaluation tasks such as detecting underperforming distractors.
arXiv Detail & Related papers (2023-06-22T17:13:08Z)
- NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks, from OCR to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of its tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z)
- ExamGAN and Twin-ExamGAN for Exam Script Generation [3.1902272671210468]
It is not yet known how to generate an exam script that results in a desirable distribution of student scores in a class.
Nor is it known how to generate a pair of high-quality exam scripts that are equivalent in assessment.
This paper proposes ExamGAN to generate high-quality exam scripts, and then extends it to T-ExamGAN to generate a pair of high-quality exam scripts.
arXiv Detail & Related papers (2021-08-22T07:34:15Z)
- Quality meets Diversity: A Model-Agnostic Framework for Computerized Adaptive Testing [60.38182654847399]
Computerized Adaptive Testing (CAT) is emerging as a promising testing application in many scenarios.
We propose a novel framework, Model-Agnostic Adaptive Testing (MAAT), as a CAT solution.
arXiv Detail & Related papers (2021-01-15T06:48:50Z)