Pattern-based Knowledge Component Extraction from Student Code Using Representation Learning
- URL: http://arxiv.org/abs/2508.09281v2
- Date: Mon, 13 Oct 2025 19:56:25 GMT
- Title: Pattern-based Knowledge Component Extraction from Student Code Using Representation Learning
- Authors: Muntasir Hoq, Griffin Pitts, Andrew Lan, Peter Brusilovsky, Bita Akram
- Abstract summary: This work advances knowledge modeling in computer science education by providing an automated, scalable, and explainable framework for identifying granular code patterns and algorithmic constructs, essential for student learning.
- Score: 2.726913697825415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective personalized learning in computer science education depends on accurately modeling what students know and what they need to learn. While Knowledge Components (KCs) provide a foundation for such modeling, automated KC extraction from student code is inherently challenging due to insufficient explainability of discovered KCs and the open-endedness of programming problems with significant structural variability across student solutions and complex interactions among programming concepts. In this work, we propose a novel, explainable framework for automated KC discovery through pattern-based KCs: recurring structural patterns within student code that capture the specific programming patterns and language constructs that students must master. Toward this, we train a Variational Autoencoder to generate important representative patterns from student code guided by an explainable, attention-based code representation model that identifies important correct and incorrect pattern implementations from student code. These patterns are then clustered to form pattern-based KCs. We evaluate our KCs using two well-established methods informed by Cognitive Science: learning curve analysis and Deep Knowledge Tracing (DKT). Experimental results demonstrate meaningful learning trajectories and significant improvements in DKT predictive performance over traditional KT methods. This work advances knowledge modeling in CS education by providing an automated, scalable, and explainable framework for identifying granular code patterns and algorithmic constructs, essential for student learning.
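As a rough, hypothetical illustration of two ideas in the abstract (not the authors' implementation), the sketch below clusters toy "pattern embedding" vectors into two pattern-based KCs with a minimal 2-means routine, then computes per-opportunity error rates of the kind a learning curve analysis would inspect. All data, the cluster count, and the pattern names are illustrative assumptions.

```python
# Illustrative sketch only: toy stand-in for clustering learned code-pattern
# embeddings into pattern-based KCs. All data below is fabricated.
import numpy as np

def two_means(X, iters=20):
    # Minimal 2-means: initialize with the first point and the point farthest
    # from it, then alternate assignment and centroid-update steps.
    c0 = X[0]
    c1 = X[np.argmax(((X - c0) ** 2).sum(axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        centers = np.stack([X[labels == j].mean(axis=0) for j in (0, 1)])
    return labels

# Two tight groups of pattern embeddings, standing in for hypothetical
# patterns such as "loop with accumulator" vs. "conditional guard".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 4)),
               rng.normal(3.0, 0.1, (10, 4))])
kc_labels = two_means(X)

# Toy learning-curve check: error rate at each practice opportunity for one
# KC should decline with practice (fabricated attempt outcomes, 1 = error).
attempts = {1: [1, 1, 0, 1], 2: [1, 0, 0, 1], 3: [0, 0, 1, 0], 4: [0, 0, 0, 0]}
error_rate = {opp: sum(errs) / len(errs) for opp, errs in attempts.items()}
print(kc_labels, error_rate)
```

In the paper's pipeline the embeddings would come from the VAE guided by the attention-based representation model rather than from random draws; the clustering step here is only a schematic stand-in.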
Related papers
- Representation Learning of Auxiliary Concepts for Improved Student Modeling and Exercise Recommendation [0.0]
We propose a deep learning model that learns sparse binary representations of exercises. These representations capture conceptual structure beyond human-defined annotations. We show that incorporating auxiliary KCs improves both student modeling and adaptive exercise recommendation.
arXiv Detail & Related papers (2025-08-22T10:12:35Z)
- How Do Code Smells Affect Skill Growth in Scratch Novice Programmers? [3.8506666685467343]
The study will deliver the first large-scale, fine-grained map linking specific CT competencies to concrete design flaws and antipatterns. By clarifying how programming habits influence early skill acquisition, the work advances both computing-education theory and practical tooling for sustainable software maintenance and evolution.
arXiv Detail & Related papers (2025-07-23T08:30:06Z)
- MAS-KCL: Knowledge component graph structure learning with large language model-based agentic workflow [12.083628171166733]
An accurate KC graph can assist educators in identifying the root causes of learners' poor performance on specific KCs. We have developed a KC graph structure learning algorithm, named MAS-KCL, which employs a multi-agent system driven by large language models.
arXiv Detail & Related papers (2025-05-20T09:32:47Z)
- Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions [4.0782995609938]
This paper presents a scalable framework for automatically detecting logical errors in students' programming solutions. Our framework is based on an explainable Abstract Syntax Tree (AST) embedding model, the Subtree-based Attention Neural Network (SANN).
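As a loose illustration of the kind of structural unit a subtree-based AST model operates on (not the SANN implementation itself), Python's standard `ast` module can enumerate every subtree of a student solution; the student code below is a hypothetical example.

```python
# Illustrative only: list the node-type name of every AST subtree in a
# student solution -- the structural units a subtree-based model attends over.
import ast

def subtree_node_types(source: str) -> list[str]:
    """Parse the program and return one node-type name per subtree."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

# Hypothetical student solution: a loop-with-accumulator pattern.
student_code = "total = 0\nfor x in nums:\n    total += x\n"
print(subtree_node_types(student_code))
```

A real model would embed these subtrees rather than just name them, but the enumeration step shows where the granular patterns come from.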
arXiv Detail & Related papers (2025-05-16T06:32:51Z)
- LLM4CD: Leveraging Large Language Models for Open-World Knowledge Augmented Cognitive Diagnosis [56.50378080174923]
We propose LLM4CD, which Leverages Large Language Models for Open-World Knowledge Augmented Cognitive Diagnosis. Our method utilizes the open-world knowledge of LLMs to construct cognitively expressive textual representations, which are encoded to introduce rich semantic information into the CD task. This approach substitutes traditional ID embeddings with semantic representations, enabling the model to accommodate new students and exercises with open-world knowledge and address the cold-start problem.
arXiv Detail & Related papers (2025-05-14T14:48:00Z)
- Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems [2.801976382946474]
Knowledge components (KCs) mapped to problems help model student learning, tracking their mastery levels on fine-grained skills. We present an automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We find that KCGen-KT outperforms existing KT methods and human-written KCs on future student response prediction.
arXiv Detail & Related papers (2025-02-25T20:40:51Z)
- Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing [59.480951050911436]
We present KCQRL, a framework for automated knowledge concept annotation and question representation learning. We demonstrate the effectiveness of KCQRL across 15 KT algorithms on two large real-world Math learning datasets.
arXiv Detail & Related papers (2024-10-02T16:37:19Z)
- SINKT: A Structure-Aware Inductive Knowledge Tracing Model with Large Language Model [64.92472567841105]
Knowledge Tracing (KT) aims to determine whether students will respond correctly to the next question.
We introduce a Structure-aware Inductive Knowledge Tracing model with a large language model (dubbed SINKT).
SINKT predicts the student's response to the target question by interacting with the student's knowledge state and the question representation.
arXiv Detail & Related papers (2024-07-01T12:44:52Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings and reasoning mechanisms is a significant challenge. We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences. We demonstrate that generative models like GPT can accurately learn and reason over CFG-defined hierarchies and generate sentences based on them.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
- Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing knowledge representation-related distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.