KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks
- URL: http://arxiv.org/abs/2601.06633v1
- Date: Sat, 10 Jan 2026 17:36:48 GMT
- Title: KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks
- Authors: Zhangqi Duan, Nigel Fernandez, Andrew Lan
- Abstract summary: We present KASER (Knowledge-Aligned Student Error Simulator), a novel approach that aligns errors with student knowledge. We propose a training method based on reinforcement learning using a hybrid reward that reflects three aspects of student code prediction.
- Score: 1.2593978066564901
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-ended tasks, such as coding problems that are common in computer science education, provide detailed insights into student knowledge. However, training large language models (LLMs) to simulate and predict possible student errors in their responses to these problems can be challenging: they often suffer from mode collapse and fail to fully capture the diversity in syntax, style, and solution approach in student responses. In this work, we present KASER (Knowledge-Aligned Student Error Simulator), a novel approach that aligns errors with student knowledge. We propose a training method based on reinforcement learning using a hybrid reward that reflects three aspects of student code prediction: i) code similarity to the ground-truth, ii) error matching, and iii) code prediction diversity. On two real-world datasets, we perform two levels of evaluation and show that, at the per-student-problem pair level, our method outperforms baselines on code and error prediction, and that, at the per-problem level, our method outperforms baselines on error coverage and simulated code diversity.
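- Illustrative sketch (not from the paper): the abstract names three reward aspects but does not give their formulas. The Python below is a minimal sketch of how such a hybrid reward could be combined, under loud assumptions: `difflib` string similarity stands in for the code-similarity term, Jaccard overlap of error labels stands in for error matching, mean pairwise dissimilarity stands in for diversity, and the function names and weights are hypothetical.

```python
from difflib import SequenceMatcher


def code_similarity(predicted: str, reference: str) -> float:
    """Character-level similarity in [0, 1] (assumed stand-in metric)."""
    return SequenceMatcher(None, predicted, reference).ratio()


def error_match(predicted_errors: set, true_errors: set) -> float:
    """Jaccard overlap between predicted and ground-truth error labels."""
    if not predicted_errors and not true_errors:
        return 1.0  # both error-free: treat as a perfect match
    union = predicted_errors | true_errors
    return len(predicted_errors & true_errors) / len(union)


def diversity(sample: str, others: list) -> float:
    """Mean dissimilarity between one sampled program and the other samples."""
    if not others:
        return 0.0
    return sum(1.0 - code_similarity(sample, o) for o in others) / len(others)


def hybrid_reward(sample, sample_errors, reference, true_errors, others,
                  weights=(1.0, 1.0, 0.5)):
    """Weighted sum of the three aspects; the weights are hypothetical."""
    w_sim, w_err, w_div = weights
    return (w_sim * code_similarity(sample, reference)
            + w_err * error_match(sample_errors, true_errors)
            + w_div * diversity(sample, others))
```

  In this sketch the diversity term is computed against the other programs sampled for the same student-problem pair, so a batch of near-identical predictions earns little of it; that is one plausible way to counteract the mode collapse the abstract mentions, not necessarily the one the authors use.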
Related papers
- BEAGLE: Behavior-Enforced Agent for Grounded Learner Emulation [16.147318846582298]
Simulating student learning behaviors in open-ended problem-solving environments holds potential for education research. However, collecting authentic data is challenging due to privacy concerns and the high cost of longitudinal studies. We present BEAGLE, a neuro-symbolic framework that addresses this bias by incorporating Self-Regulated Learning (SRL) theory into a novel architecture.
arXiv Detail & Related papers (2026-02-06T08:05:15Z) - Readability-Robust Code Summarization via Meta Curriculum Learning [53.44612630063336]
In the real world, code is often poorly structured or obfuscated, significantly degrading model performance. We propose RoFTCodeSum, a novel fine-tuning method that enhances the robustness of code summarization against poorly readable code.
arXiv Detail & Related papers (2026-01-09T02:38:24Z) - UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models [59.693733170193944]
Large language models (LLMs) are shifting from answer providers to intelligent tutors in educational settings. Recent reinforcement learning approaches address this limitation but face two critical challenges. We propose the Unidirectional Cognitive Optimization (UCO) method to address these challenges.
arXiv Detail & Related papers (2025-11-12T01:27:02Z) - Learning to Make MISTAKEs: Modeling Incorrect Student Thinking And Key Errors [58.65143578052761]
This paper presents a new method, MISTAKE, that constructs high-quality synthetic examples of reasoning errors. We evaluate MISTAKE on three educational tasks and find that it results in (1) higher accuracy when simulating incorrect student answers.
arXiv Detail & Related papers (2025-10-13T15:10:38Z) - Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents [36.704574105201864]
Large language models (LLMs) are revolutionizing education, with LLM-based agents playing a key role in simulating student behavior. A major challenge in student simulation is modeling the diverse learning patterns of students at various cognitive levels.
arXiv Detail & Related papers (2025-05-26T13:48:49Z) - Knowledge Tracing in Programming Education Integrating Students' Questions [0.0]
This paper introduces SQKT (Students' Question-based Knowledge Tracing), a knowledge tracing model that leverages students' questions and automatically extracted skill information. Experimental results demonstrate SQKT's superior performance in predicting student completion across various Python programming courses of differing difficulty levels. SQKT can be used to tailor educational content to individual learning needs and design adaptive learning systems in computer science education.
arXiv Detail & Related papers (2025-01-22T14:13:40Z) - LLM-based Cognitive Models of Students with Misconceptions [55.29525439159345]
This paper investigates whether Large Language Models (LLMs) can be instruction-tuned to meet this dual requirement.
We introduce MalAlgoPy, a novel Python library that generates datasets reflecting authentic student solution patterns.
Our insights enhance our understanding of AI-based student models and pave the way for effective adaptive learning systems.
arXiv Detail & Related papers (2024-10-16T06:51:09Z) - Test Case-Informed Knowledge Tracing for Open-ended Coding Tasks [42.22663501257155]
Open-ended coding tasks are common in computer science education. Traditional knowledge tracing (KT) models that only analyze response correctness may not fully capture nuances in student knowledge from student code. We introduce Test case-Informed Knowledge Tracing for Open-ended Coding (TIKTOC), a framework to simultaneously analyze and predict both open-ended student code and whether the code passes each test case.
arXiv Detail & Related papers (2024-09-28T03:13:40Z) - Estimating Difficulty Levels of Programming Problems with Pre-trained Model [18.92661958433282]
The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning.
We formulate the problem of automatic difficulty level estimation of each programming problem, given its textual description and a solution example of code.
For tackling this problem, we propose to couple two pre-trained models, one for text modality and the other for code modality, into a unified model.
arXiv Detail & Related papers (2024-06-13T05:38:20Z) - Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.