Related papers: Focusing on Students, not Machines: Grounded Question Generation and Automated Answer Grading

URL: http://arxiv.org/abs/2506.12066v1
Date: Mon, 02 Jun 2025 05:32:23 GMT
Title: Focusing on Students, not Machines: Grounded Question Generation and Automated Answer Grading
Authors: Gérôme Meyer, Philip Breuer
Abstract summary: This thesis presents the foundation for a system that generates questions grounded in class materials and automatically grades student answers. It introduces a sophisticated method for chunking documents with a visual layout, specifically targeting PDF documents. It introduces a new benchmark for automated grading of short answers to facilitate comparison of automated grading systems.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Digital technologies are increasingly used in education to reduce the workload of teachers and students. However, creating open-ended study or examination questions and grading their answers is still a tedious task. This thesis presents the foundation for a system that generates questions grounded in class materials and automatically grades student answers. It introduces a sophisticated method for chunking documents with a visual layout, specifically targeting PDF documents. This method enhances the accuracy of downstream tasks, including Retrieval Augmented Generation (RAG). Our thesis demonstrates that high-quality questions and reference answers can be generated from study material. Further, it introduces a new benchmark for automated grading of short answers to facilitate comparison of automated grading systems. An evaluation of various grading systems is conducted and indicates that Large Language Models (LLMs) can generalise to the task of automated grading of short answers from their pre-training tasks. As with other tasks, increasing the parameter size of the LLMs leads to greater performance. Currently, available systems still need human oversight, especially in examination scenarios.

Related papers

Research on the Application of Large Language Models in Automatic Question Generation: A Case Study of ChatGLM in the Context of High School Information Technology Curriculum [3.0753648264454547]
The model is guided to generate diverse questions, which are then comprehensively evaluated by domain experts. The results indicate that questions generated by ChatGLM outperform human-written questions in terms of clarity and teachers' willingness to use them.
arXiv Detail & Related papers (2024-08-21T11:38:32Z)
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
The Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups. It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics. With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z)
"I understand why I got this grade": Automatic Short Answer Grading with Feedback [36.74896284581596]
We present a dataset of 5.8k student answers accompanied by reference answers and questions for the Automatic Short Answer Grading (ASAG) task. The EngSAF dataset is meticulously curated to cover a diverse range of subjects, questions, and answer patterns from multiple engineering domains.
arXiv Detail & Related papers (2024-06-30T15:42:18Z)
Grade Like a Human: Rethinking Automated Assessment with Large Language Models [11.442433408767583]
Large language models (LLMs) have been used for automated grading, but they have not yet achieved the same level of performance as humans. We propose an LLM-based grading system that addresses the entire grading procedure, including the following key components.
arXiv Detail & Related papers (2024-05-30T05:08:15Z)
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning [43.83422798569986]
Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education since they are easy to administer and grade and are a reliable form of assessment. To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers. We propose a simple, in-context learning-based solution for automated distractor and corresponding feedback message generation.
arXiv Detail & Related papers (2023-08-07T01:03:04Z)
Rethinking Label Smoothing on Multi-hop Question Answering [87.68071401870283]
Multi-Hop Question Answering (MHQA) is a significant area in question answering. In this work, we analyze the primary factors limiting the performance of multi-hop reasoning. We propose a novel label smoothing technique, F1 Smoothing, which incorporates uncertainty into the learning process.
arXiv Detail & Related papers (2022-12-19T14:48:08Z)
Automatic Short Math Answer Grading via In-context Meta-learning [2.0263791972068628]
We study the problem of automatic short answer grading for students' responses to math questions. First, we use MathBERT, a variant of the popular language model BERT adapted to mathematical content, as our base model. Second, we use an in-context learning approach that provides scoring examples as input to the language model.
arXiv Detail & Related papers (2022-05-30T16:26:02Z)
Quiz Design Task: Helping Teachers Create Quizzes with Automated Question Generation [87.34509878569916]
This paper focuses on the use case of helping teachers automate the generation of reading comprehension quizzes. In our study, teachers building a quiz receive question suggestions, which they can either accept or refuse with a reason.
arXiv Detail & Related papers (2022-05-03T18:59:03Z)
ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification. A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors. Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
Get It Scored Using AutoSAS -- An Automated System for Scoring Short Answers [63.835172924290326]
We present a fast, scalable, and accurate approach towards automated Short Answer Scoring (SAS). We propose and explain the design and development of a system for SAS, namely AutoSAS. AutoSAS shows state-of-the-art performance and achieves better results by over 8% in some of the question prompts.
arXiv Detail & Related papers (2020-12-21T10:47:30Z)
Knowledge Distillation for Improved Accuracy in Spoken Question Answering [63.72278693825945]
We devise a training strategy to perform knowledge distillation from spoken documents and written counterparts. Our work makes a step towards distilling knowledge from the language model as a supervision signal. Experiments demonstrate that our approach outperforms several state-of-the-art language models on the Spoken-SQuAD dataset.
arXiv Detail & Related papers (2020-10-21T15:18:01Z)
Automated Content Grading Using Machine Learning [0.0]
This research project is a primitive experiment in the automation of grading of theoretical answers written in exams by students in technical courses. We show how the algorithmic approach in machine learning can be used to automatically examine and grade theoretical content in exam answer papers.
arXiv Detail & Related papers (2020-04-08T23:46:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.