Personalized Auto-Grading and Feedback System for Constructive Geometry Tasks Using Large Language Models on an Online Math Platform
- URL: http://arxiv.org/abs/2509.25529v1
- Date: Mon, 29 Sep 2025 21:35:01 GMT
- Title: Personalized Auto-Grading and Feedback System for Constructive Geometry Tasks Using Large Language Models on an Online Math Platform
- Authors: Yong Oh Lee, Byeonghun Bang, Joohyun Lee, Sejun Oh,
- Abstract summary: We present a personalized auto-grading and feedback system for constructive geometry tasks.<n>The system was developed using large language models (LLMs) and deployed on the Algeomath platform.
- Score: 3.820691846175567
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As personalized learning gains increasing attention in mathematics education, there is a growing demand for intelligent systems that can assess complex student responses and provide individualized feedback in real time. In this study, we present a personalized auto-grading and feedback system for constructive geometry tasks, developed using large language models (LLMs) and deployed on the Algeomath platform, a Korean online tool designed for interactive geometric constructions. The proposed system evaluates student-submitted geometric constructions by analyzing their procedural accuracy and conceptual understanding. It employs a prompt-based grading mechanism using GPT-4, where student answers and model solutions are compared through a few-shot learning approach. Feedback is generated based on teacher-authored examples built from anticipated student responses, and it dynamically adapts to the student's problem-solving history, allowing up to four iterative attempts per question. The system was piloted with 79 middle-school students, where LLM-generated grades and feedback were benchmarked against teacher judgments. Grading closely aligned with teachers, and feedback helped many students revise errors and complete multi-step geometry tasks. While short-term corrections were frequent, longer-term transfer effects were less clear. Overall, the study highlights the potential of LLMs to support scalable, teacher-aligned formative assessment in mathematics, while pointing to improvements needed in terminology handling and feedback design.
Related papers
- Automated Feedback Generation for Undergraduate Mathematics: Development and Evaluation of an AI Teaching Assistant [0.0]
We present a system that processes free-form natural language input, handles a wide range of edge cases, and comments on the technical correctness of submitted proofs.<n>We show that by the metrics we evaluate, the quality of the feedback generated is comparable to that produced by human experts.<n>A version of our tool is deployed on the Imperial mathematics homework platform Lambda.
arXiv Detail & Related papers (2026-01-06T23:02:22Z) - Teaching According to Students' Aptitude: Personalized Mathematics Tutoring via Persona-, Memory-, and Forgetting-Aware LLMs [28.594039597149266]
We propose TASA (Teaching According to Students' Aptitude), a student-aware tutoring framework that integrates persona, memory, and forgetting dynamics.<n>Specifically, TASA maintains a structured student persona capturing proficiency profiles and an event memory recording prior learning interactions.<n>By incorporating a continuous forgetting curve with knowledge tracing, TASA dynamically updates each student's mastery state and generates contextually appropriate, difficulty-calibrated questions and explanations.
arXiv Detail & Related papers (2025-11-19T06:28:16Z) - MathEDU: Towards Adaptive Feedback for Student Mathematical Problem-Solving [3.2962799070467432]
This paper explores the capabilities of large language models (LLMs) to assess students' math problem-solving processes and provide adaptive feedback.<n>We evaluate the model's ability to support personalized learning in two scenarios: one where the model has access to students' prior answer histories, and another simulating a cold-start context.
arXiv Detail & Related papers (2025-05-23T15:59:39Z) - Automated Feedback in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses [0.0]
This study aims to explore the potential of Large Language Models (LLMs) in facilitating automated feedback in math education.
We employ Mistral, a version of Llama catered to math, and fine-tune this model for evaluating student responses by leveraging a dataset of student responses and teacher-written feedback for middle-school math problems.
We evaluate the model's performance in scoring accuracy and the quality of feedback by utilizing judgments from 2 teachers.
arXiv Detail & Related papers (2024-10-29T16:57:45Z) - Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors [78.53699244846285]
Large language models (LLMs) present an opportunity to scale high-quality personalized education to all.
LLMs struggle to precisely detect student's errors and tailor their feedback to these errors.
Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions.
arXiv Detail & Related papers (2024-07-12T10:11:40Z) - Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function.
Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences.
arXiv Detail & Related papers (2024-03-05T09:09:15Z) - Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [46.667783153759636]
We propose a framework for feedback generation that optimize both correctness and alignment using reinforcement learning (RL)<n>Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO)
arXiv Detail & Related papers (2024-03-02T20:25:50Z) - YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves SFT with significant performance gain.
arXiv Detail & Related papers (2024-01-28T14:32:15Z) - Empowering Private Tutoring by Chaining Large Language Models [87.76985829144834]
This work explores the development of a full-fledged intelligent tutoring system powered by state-of-the-art large language models (LLMs)
The system is into three inter-connected core processes-interaction, reflection, and reaction.
Each process is implemented by chaining LLM-powered tools along with dynamically updated memory modules.
arXiv Detail & Related papers (2023-09-15T02:42:03Z) - MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties
Grounded in Math Reasoning Problems [74.73881579517055]
We propose a framework to generate such dialogues by pairing human teachers with a Large Language Model prompted to represent common student errors.
We describe how we use this framework to collect MathDial, a dataset of 3k one-to-one teacher-student tutoring dialogues.
arXiv Detail & Related papers (2023-05-23T21:44:56Z) - Deep Discourse Analysis for Generating Personalized Feedback in
Intelligent Tutor Systems [4.716555240531893]
We explore creating automated, personalized feedback in an intelligent tutoring system (ITS)
Our goal is to pinpoint correct and incorrect concepts in student answers in order to achieve better student learning gains.
arXiv Detail & Related papers (2021-03-13T20:33:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.