ExamGAN and Twin-ExamGAN for Exam Script Generation
- URL: http://arxiv.org/abs/2108.09656v1
- Date: Sun, 22 Aug 2021 07:34:15 GMT
- Title: ExamGAN and Twin-ExamGAN for Exam Script Generation
- Authors: Zhengyang Wu, Ke Deng, Judy Qiu, Yong Tang
- Abstract summary: It is not yet known how to generate an exam script that results in a desirable distribution of student scores in a class.
It is also not yet known how to generate a pair of high-quality exam scripts that are equivalent in assessment.
This paper proposes ExamGAN to generate high-quality exam scripts, and then extends ExamGAN to T-ExamGAN to generate a pair of such scripts.
- Score: 3.1902272671210468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning management systems (LMS) are now widely used at all educational
stages, from primary to tertiary education, for student administration,
documentation, tracking, reporting, and the delivery of courses, training
programs, and learning and development programs. Towards effective assessment
of learning outcomes, the exam script generation problem has attracted much
attention and has been investigated recently, but research in this field is
still at an early stage, and there are opportunities to further improve the
quality of generated exam scripts in various respects. In particular, two
essential issues have been largely ignored by existing solutions. First, given
a course, it is not yet known how to generate an exam script that results in a
desirable distribution of student scores in a class (or across different
classes). Second, although the need arises frequently in practice, it is not
yet known how to generate a pair of high-quality exam scripts that are
equivalent in assessment (i.e., student scores are comparable whichever script
is taken) but have significantly different sets of questions. To fill this gap,
this paper proposes ExamGAN (Exam Script Generative Adversarial Network) to
generate high-quality exam scripts, and then extends ExamGAN to T-ExamGAN
(Twin-ExamGAN) to generate a pair of such scripts. Extensive experiments on
three benchmark datasets verify the superiority of the proposed solutions over
the state-of-the-art in various respects. Moreover, a case study demonstrates
the effectiveness of the proposed solution in a real teaching scenario.
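
The abstract describes a GAN that learns to produce exam scripts whose resulting student-score distribution is desirable. As a rough, hypothetical illustration of that framing (not the authors' method), the sketch below trains a generator to emit score histograms that a discriminator cannot distinguish from a hand-made "desirable" bell-shaped target. The real ExamGAN operates over questions, knowledge points, and student models, all of which are omitted here; every name, dimension, and encoding below is an assumption for illustration.

```python
# Minimal GAN sketch of the "target a score distribution" idea.
# Assumption: an exam's outcome is summarized as a histogram of
# student scores over NUM_BINS bins; "real" examples are desirable
# histograms. This is NOT the paper's architecture or code.
import torch
import torch.nn as nn

torch.manual_seed(0)
NUM_BINS = 10    # score-histogram bins (assumption)
NOISE_DIM = 16   # generator input noise size (assumption)

generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 64), nn.ReLU(),
    nn.Linear(64, NUM_BINS), nn.Softmax(dim=-1),  # outputs a histogram
)
discriminator = nn.Sequential(
    nn.Linear(NUM_BINS, 64), nn.ReLU(),
    nn.Linear(64, 1),  # logit: desirable (real) vs. generated
)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

# Stand-in "real" data: a bell-shaped score histogram plus small noise.
TEMPLATE = torch.softmax(-((torch.arange(NUM_BINS) - 6.0) ** 2) / 4.0, dim=0)

def sample_desirable_histograms(batch):
    noisy = TEMPLATE + 0.02 * torch.rand(batch, NUM_BINS)
    return noisy / noisy.sum(dim=-1, keepdim=True)

for step in range(200):
    real = sample_desirable_histograms(32)
    fake = generator(torch.randn(32, NOISE_DIM))

    # Discriminator step: real histograms -> 1, generated -> 0.
    d_loss = (bce(discriminator(real), torch.ones(32, 1))
              + bce(discriminator(fake.detach()), torch.zeros(32, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make generated histograms look desirable.
    g_loss = bce(discriminator(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The twin constraint of T-ExamGAN (two scripts that assess equivalently but share few questions) would additionally require penalizing question overlap between paired generator outputs, which this sketch does not attempt.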
Related papers
- TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models [8.22619177301814]
We introduce TestBench, a benchmark for class-level LLM-based test case generation.
We construct a dataset of 108 Java programs from 9 real-world, large-scale projects on GitHub.
We propose a fine-grained evaluation framework that considers five aspects of test cases: syntactic correctness, compilation correctness, test correctness, code coverage rate, and defect detection rate.
arXiv Detail & Related papers (2024-09-26T06:18:06Z) - A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning [51.7818820745221]
Underwater image enhancement (UIE) presents a significant challenge within computer vision research.
Despite the development of numerous UIE algorithms, a thorough and systematic review is still absent.
arXiv Detail & Related papers (2024-05-30T04:46:40Z) - Computer Aided Design and Grading for an Electronic Functional
Programming Exam [0.0]
We introduce an algorithm for checking Proof Puzzles that finds correct sequences of proof lines and improves fairness compared to an existing edit-distance-based algorithm.
A higher-level language and an open-source tool for specifying regular expressions make creating complex regular expressions less error-prone.
We evaluate the resulting e-exam by analyzing the degree of automation in the grading process, asking students for their opinion, and critically reviewing our own experiences.
arXiv Detail & Related papers (2023-08-14T07:08:09Z) - Benchmarking Foundation Models with Language-Model-as-an-Examiner [47.345760054595246]
We propose a novel benchmarking framework, Language-Model-as-an-Examiner.
The LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner.
arXiv Detail & Related papers (2023-06-07T06:29:58Z) - On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z) - Reinforcement Learning Guided Multi-Objective Exam Paper Generation [21.945655389912112]
We propose a reinforcement-learning-guided Multi-Objective Exam Paper Generation framework, termed MOEPG.
It simultaneously optimizes three exam-domain-specific objectives: difficulty degree, distribution of exam scores, and skill coverage (a hedged sketch of combining such objectives into a single reward appears after this list).
We show that MOEPG is feasible in addressing the multiple dilemmas of the exam paper generation scenario.
arXiv Detail & Related papers (2023-03-02T07:55:52Z) - NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision
Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z) - IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and
Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - Automated Performance Testing Based on Active Deep Learning [2.179313476241343]
We present an automated test generation method called ACTA for black-box performance testing.
ACTA is based on active learning, which means that it does not require a large set of historical test data to learn about the performance characteristics of the system under test.
We have evaluated ACTA on a benchmark web application, and the experimental results indicate that this method is comparable with random testing.
arXiv Detail & Related papers (2021-04-05T18:19:12Z)
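
The MOEPG entry above names three concrete objectives: difficulty degree, distribution of exam scores, and skill coverage. As a purely illustrative sketch (not MOEPG's actual formulation; all weights, encodings, and function names are hypothetical), one simple way to fold such objectives into a single scalar reward for a reinforcement learning agent is a normalized weighted sum:

```python
# Hypothetical combined reward over MOEPG's three stated objectives.
# The actual MOEPG reward is not reproduced here.
import numpy as np

def exam_reward(difficulty, target_difficulty,
                score_hist, target_hist,
                covered_skills, required_skills,
                weights=(1.0, 1.0, 1.0)):
    """Scalar reward, higher is better (illustrative only)."""
    w_d, w_s, w_c = weights
    # 1) Difficulty: penalize deviation from the desired difficulty degree.
    r_difficulty = 1.0 - abs(difficulty - target_difficulty)
    # 2) Score distribution: 1 minus total-variation distance between the
    #    predicted and target score histograms.
    diff = np.abs(np.asarray(score_hist) - np.asarray(target_hist))
    r_scores = 1.0 - 0.5 * diff.sum()
    # 3) Skill coverage: fraction of required skills the paper covers.
    required = set(required_skills)
    r_coverage = len(set(covered_skills) & required) / max(len(required), 1)
    return (w_d * r_difficulty + w_s * r_scores + w_c * r_coverage) / (w_d + w_s + w_c)

# Example usage with made-up values:
print(exam_reward(0.55, 0.6,
                  [0.1, 0.2, 0.4, 0.2, 0.1], [0.1, 0.25, 0.3, 0.25, 0.1],
                  covered_skills={"algebra", "geometry"},
                  required_skills={"algebra", "geometry", "probability"}))
```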