Modeling Student Learning with 3.8 Million Program Traces
- URL: http://arxiv.org/abs/2510.05056v1
- Date: Mon, 06 Oct 2025 17:37:17 GMT
- Title: Modeling Student Learning with 3.8 Million Program Traces
- Authors: Alexis Ross, Megha Srivastava, Jeremiah Blanchard, Jacob Andreas,
- Abstract summary: We introduce a dataset of over 3.8 million programming reasoning traces from users of Pencil Code. We find that models trained on real traces are stronger at modeling diverse student behavior. We show that we can help students recover from mistakes by steering code generation models to identify a sequence of edits that will result in more correct code.
- Score: 52.153493498021895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As programmers write code, they often edit and retry multiple times, creating rich "interaction traces" that reveal how they approach coding tasks and provide clues about their level of skill development. For novice programmers in particular, these traces reflect the diverse reasoning processes they employ to code, such as exploratory behavior to understand how a programming concept works, re-strategizing in response to bugs, and personalizing stylistic choices. In this work, we explore what can be learned from training language models on such reasoning traces: not just about code, but about coders, and particularly students learning to program. We introduce a dataset of over 3.8 million programming reasoning traces from users of Pencil Code, a free online educational platform used by students to learn simple programming concepts. Compared to models trained only on final programs or synthetically-generated traces, we find that models trained on real traces are stronger at modeling diverse student behavior. Through both behavioral and probing analyses, we also find that many properties of code traces, such as goal backtracking or number of comments, can be predicted from learned representations of the students who write them. Building on this result, we show that we can help students recover from mistakes by steering code generation models to identify a sequence of edits that will result in more correct code while remaining close to the original student's style. Together, our results suggest that many properties of code are properties of individual students and that training on edit traces can lead to models that are more steerable, more predictive of student behavior while programming, and better at generating programs in their final states. Code and data are available at https://github.com/meghabyte/pencilcode-public
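The steering result described in the abstract can be illustrated with a toy sketch (not the paper's actual method): given a buggy student program and candidate revisions, pick the one that maximizes correctness while staying textually close to the student's original. The `square` task, the test cases, and the scoring weight are all hypothetical.

```python
# Toy sketch of style-preserving correction (illustrative, not the
# paper's method): score candidate revisions by test-case correctness
# plus textual similarity to the student's original code.
import difflib

def correctness(program: str) -> float:
    # Hypothetical task: fraction of test cases a `square` function passes.
    tests = [(2, 4), (3, 9)]
    try:
        env: dict = {}
        exec(program, env)
        return sum(env["square"](x) == y for x, y in tests) / len(tests)
    except Exception:
        return 0.0

def style_similarity(original: str, candidate: str) -> float:
    return difflib.SequenceMatcher(None, original, candidate).ratio()

def steer(original: str, candidates: list, alpha: float = 0.5) -> str:
    # Prefer correct candidates that stay close to the student's style.
    return max(candidates,
               key=lambda c: correctness(c) + alpha * style_similarity(original, c))

student = "def square(x):\n    return x + x\n"    # buggy: + instead of *
candidates = [
    "def square(x):\n    return x * x\n",         # minimal, style-preserving fix
    "def square(n):\n    return pow(n, 2)\n",     # correct but restyled
]
best = steer(student, candidates)
```

Here both candidates are fully correct, so the similarity term breaks the tie in favor of the minimal edit that preserves the student's variable name and style.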
Related papers
- ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle [24.691302820912888]
Large Language Models (LLMs) have shown strong performance on programming tasks, but can they generate code the way real students do?
We present ParaStudent, a systematic study of LLM-based "student-like" code generation in an introductory programming course setting.
arXiv Detail & Related papers (2025-07-16T23:12:14Z)
- Learning Code-Edit Embedding to Model Student Debugging Behavior [2.1485350418225244]
We propose an encoder-decoder-based model that learns meaningful code-edit embeddings between consecutive student code submissions.
It enables personalized next-step code suggestions that maintain the student's coding style while improving test case correctness.
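As a rough illustration of the raw signal such a model consumes (not the paper's encoder-decoder), two consecutive submissions can be reduced to line-level edit operations:

```python
# Illustrative only: represent the change between consecutive student
# submissions as line-level edit operations.
import difflib

def edit_ops(prev: str, curr: str):
    prev_lines, curr_lines = prev.splitlines(), curr.splitlines()
    matcher = difflib.SequenceMatcher(None, prev_lines, curr_lines)
    # Keep only the lines that actually changed between submissions.
    return [(tag, prev_lines[i1:i2], curr_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

ops = edit_ops("x = 1\nprint(x)\n", "x = 2\nprint(x)\n")
```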
arXiv Detail & Related papers (2025-02-26T18:54:39Z)
- Toward Exploring the Code Understanding Capabilities of Pre-trained Code Generation Models [12.959392500354223]
We pioneer the transfer of knowledge from pre-trained code generation models to code understanding tasks.
We introduce CL4D, a contrastive learning method designed to enhance the representation capabilities of decoder-only models.
arXiv Detail & Related papers (2024-06-18T06:52:14Z)
- Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both randomness in masking language modeling and the structure aspect of programming language.
We then enhance the representations via contrastive learning with hard negative and hard positive constructed in an unsupervised manner.
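A minimal sketch of the contrastive objective this family of methods uses (standard InfoNCE over cosine similarities; the vectors below are toy stand-ins for learned code embeddings):

```python
# Standard InfoNCE loss over cosine similarities, written out in plain
# Python for clarity. Real systems operate on batched learned embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchor, positive, negatives, temp=0.1):
    # Loss is low when the positive is more similar to the anchor than
    # every negative (e.g., a hard negative that looks alike but differs).
    logits = [cosine(anchor, positive) / temp] + [cosine(anchor, n) / temp for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

loss_easy = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])  # positive matches anchor
loss_hard = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])  # negative matches anchor
```

The hard positives and hard negatives described in the abstract shape which pairs appear as `positive` and `negatives` during training.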
arXiv Detail & Related papers (2024-02-02T22:19:15Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures.
We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution.
We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
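The mutation-based augmentation idea can be sketched with Python's standard `ast` module; this toy mutator only swaps `+` and `-`, whereas the paper's pipeline is far richer:

```python
# Toy mutation-based augmentation (illustrative): swap + and - in
# binary operations to produce variants with different execution traces.
import ast

class OperatorMutator(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # mutate nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        elif isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node

def mutate(source: str) -> str:
    tree = OperatorMutator().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

variant = mutate("x = 1 + 2")
```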
arXiv Detail & Related papers (2023-05-08T10:00:05Z)
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study [4.438873396405334]
We aim to answer whether making code easier to understand through using contextual data improves the performance of pre-trained code language models for the task of code completion.
For comments, we find that the models perform better in the presence of multi-line comments.
arXiv Detail & Related papers (2023-04-24T17:09:14Z)
- Contrastive Learning for Source Code with Structural and Functional Properties [66.10710134948478]
We present BOOST, a novel self-supervised model to focus pre-training based on the characteristics of source code.
We employ automated, structure-guided code transformation algorithms that generate functionally equivalent code that looks drastically different from the original one.
We train our model in a way that brings the functionally equivalent code closer and distinct code further through a contrastive learning objective.
arXiv Detail & Related papers (2021-10-08T02:56:43Z)
- Learning to Extend Program Graphs to Work-in-Progress Code [31.235862838381966]
We extend the notion of program graphs to work-in-progress code by learning to predict edge relations between tokens.
We consider the tasks of code completion and localizing and repairing variable misuse in a work-in-process scenario.
We demonstrate that training relation-aware models with fine-tuned edges consistently leads to improved performance on both tasks.
arXiv Detail & Related papers (2021-05-28T18:12:22Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
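The "where-the-value-comes-from" relation can be approximated for straight-line Python using the standard `ast` module; this sketch handles only plain assignments and is much cruder than GraphCodeBERT's actual data-flow extraction:

```python
# Crude data-flow sketch: for each plain assignment, draw an edge from
# every variable read on the right-hand side to the variable defined
# on the left ("where the value comes from").
import ast

def data_flow_edges(source: str):
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            reads = [n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)]
            for target in node.targets:
                if isinstance(target, ast.Name):
                    edges.extend((r, target.id) for r in reads)
    return edges

edges = data_flow_edges("a = 1\nb = a + 2\nc = a + b\n")
```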
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.