Related papers: Modeling Programming Skills with Source Code Embeddings for Context-aware Exercise Recommendation


URL: http://arxiv.org/abs/2602.10249v1
Date: Tue, 10 Feb 2026 19:51:48 GMT
Title: Modeling Programming Skills with Source Code Embeddings for Context-aware Exercise Recommendation
Authors: Carlos Eduardo P. Silva, João Pedro M. Sena, Julio C. S. Reis, André G. Santos, Lucas N. Ferreira
Abstract summary: We propose a context-aware recommender system that models students' programming skills using embeddings of the source code they submit throughout a course. These embeddings predict students' skills across multiple programming topics, producing profiles that are matched to the skills required by unseen homework problems. We evaluated our approach using real data from students and exercises in an introductory programming course at our university.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we propose a context-aware recommender system that models students' programming skills using embeddings of the source code they submit throughout a course. These embeddings predict students' skills across multiple programming topics, producing profiles that are matched to the skills required by unseen homework problems. To generate recommendations, we compute the cosine similarity between student profiles and problem skill vectors, ranking exercises according to their alignment with each student's current abilities. We evaluated our approach using real data from students and exercises in an introductory programming course at our university. First, we assessed the effectiveness of our source code embeddings for predicting skills, comparing them with token-based and graph-based alternatives. Results showed that Jina embeddings outperformed TF-IDF, CodeBERT-cpp, and GraphCodeBERT across most skills. Additionally, we evaluated the system's ability to recommend exercises aligned with weekly course content by analyzing student submissions collected over seven course offerings. Our approach consistently produced more suitable recommendations than baselines based on correctness or solution time, indicating that predicted programming skills provide a stronger signal for problem recommendation.
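The ranking step the abstract describes (cosine similarity between a student profile and each problem's skill vector, then ordering exercises by alignment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each student profile and each problem skill vector are already available as equal-length lists of per-skill scores, and that exercises with higher similarity are ranked first. The function names, skill labels, and numbers are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_exercises(
    student_profile: list[float], problem_skills: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (problem_id, score) pairs sorted by descending cosine similarity.

    Assumption: a higher score means the problem is better aligned with the
    student's current predicted skills, so it is recommended earlier.
    """
    scored = [
        (pid, cosine_similarity(student_profile, vec))
        for pid, vec in problem_skills.items()
    ]
    # Sort by score descending; ties are broken by problem id for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Hypothetical data: three skills (say loops, arrays, recursion) per vector.
profile = [0.9, 0.6, 0.1]
problems = {
    "p1_loops": [1.0, 0.0, 0.0],
    "p2_arrays": [0.5, 1.0, 0.0],
    "p3_recursion": [0.0, 0.2, 1.0],
}

for pid, score in rank_exercises(profile, problems):
    print(f"{pid}: {score:.3f}")
```

Because cosine similarity ignores vector magnitude, two student profiles that differ only by a constant positive factor receive exactly the same ranking; whether that is desirable depends on how the predicted skill scores are scaled.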

Related papers

Detecting Struggling Student Programmers using Proficiency Taxonomies
Early detection of struggling student programmers is crucial for providing them with personalized support. This study addresses this gap by developing, in collaboration with educators, a proficiency taxonomy that categorizes how students solve coding tasks and is embedded in the detection model. Our model, termed the Proficiency Taxonomy Model (PTM), simultaneously learns the student's coding skills based on their coding history and predicts whether they will struggle on a new task.
arXiv (2025-08-24T13:18:53Z)
Towards a Real-World Aligned Benchmark for Unlearning in Recommender Systems
We propose a set of design desiderata and research questions to guide the development of a more realistic benchmark for unlearning in recommender systems. We argue for an unlearning setup that reflects the sequential, time-sensitive nature of real-world deletion requests. We present a preliminary experiment in a next-basket recommendation setting based on our proposed desiderata and find that unlearning also works for sequential recommendation models.
arXiv (2025-08-23T16:05:40Z)
Pre-trained Language Model and Knowledge Distillation for Lightweight Sequential Recommendation
We propose a sequential recommendation algorithm based on a pre-trained language model and knowledge distillation. The proposed algorithm enhances recommendation accuracy and provides timely recommendation services.
arXiv (2024-09-23T08:39:07Z)
Learning-Augmented Algorithms with Explicit Predictors
Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data. Prior research in this context has focused on a paradigm where the predictor is pre-trained on past data and then used as a black box. In this work, we unpack the predictor and integrate the learning problem it gives rise to within the algorithmic challenge.
arXiv (2024-03-12T08:40:21Z)
Personalized Programming Guidance based on Deep Programming Learning Style Capturing
We propose a novel model called Programming Exercise Recommender with Learning Style (PERS). PERS simulates learners' intricate programming behaviors. We perform extensive experiments on two real-world datasets to verify the rationality of modeling programming learning styles.
arXiv (2024-02-20T10:38:38Z)
Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations
We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs. By leveraging this capability, our framework enables personalized and accurate job recommendations for individual users.
arXiv (2023-07-10T11:29:41Z)
Adaptive Scaffolding in Block-Based Programming via Synthesizing New Tasks as Pop Quizzes
We introduce a scaffolding framework based on pop quizzes presented as multiple-choice programming tasks. To automatically generate these pop quizzes, we propose a novel algorithm, PQuizSyn. Our algorithm synthesizes new tasks for pop quizzes with the following features: (a) Adaptive (i.e., individualized to the student's current attempt), (b) Comprehensible (i.e., easy to comprehend and solve), and (c) Concealing (i.e., does not reveal the solution code).
arXiv (2023-03-28T23:52:15Z)
Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
We propose a hierarchical programmatic reinforcement learning framework to produce program policies. By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors. The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv (2023-01-30T14:50:46Z)
Programming Knowledge Tracing: A Comprehensive Dataset and A New Model
We propose a new model PDKT to exploit the enriched context for accurate student behavior prediction. We construct a bipartite graph for programming problem embedding, and design an improved pre-training model PLCodeBERT for code embedding. Experimental results on the new dataset BePKT show that our proposed model establishes state-of-the-art performance in programming knowledge tracing.
arXiv (2021-12-11T02:13:11Z)
ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback
In this paper, we frame the problem of providing feedback as few-shot classification. A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors. Our approach was successfully deployed to deliver feedback to 16,000 student exam solutions in a programming course offered by a tier 1 university.
arXiv (2021-07-23T22:41:28Z)
UniNet: Next Term Course Recommendation using Deep Learning
We propose a deep learning approach to represent how the chronological order of course grades affects the probability of success. We have shown that it is possible to obtain an AUC of 81.10% using only grade information. This is shown to be meaningful across different student GPA levels and course difficulties.
arXiv (2020-09-20T00:07:45Z)
