Programming Knowledge Tracing: A Comprehensive Dataset and A New Model
- URL: http://arxiv.org/abs/2112.08273v1
- Date: Sat, 11 Dec 2021 02:13:11 GMT
- Title: Programming Knowledge Tracing: A Comprehensive Dataset and A New Model
- Authors: Renyu Zhu, Dongxiang Zhang, Chengcheng Han, Ming Gao, Xuesong Lu,
Weining Qian, Aoying Zhou
- Abstract summary: We propose a new model PDKT to exploit the enriched context for accurate student behavior prediction.
We construct a bipartite graph for programming problem embedding, and design an improved pre-training model PLCodeBERT for code embedding.
Experimental results on the new dataset BePKT show that our proposed model establishes state-of-the-art performance in programming knowledge tracing.
- Score: 26.63441910982382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study knowledge tracing in the domain of programming
education and make two important contributions. First, we harvest and publish
the most comprehensive dataset to date, namely BePKT, which covers various
online behaviors in an OJ system, including programming text problems,
knowledge annotations, user-submitted code and system-logged events. Second, we
propose a new model PDKT to exploit the enriched context for accurate student
behavior prediction. More specifically, we construct a bipartite graph for
programming problem embedding, and design an improved pre-training model
PLCodeBERT for code embedding, as well as a double-sequence RNN model with
exponential decay attention for effective feature fusion. Experimental results
on the new dataset BePKT show that our proposed model establishes
state-of-the-art performance in programming knowledge tracing. In addition, we
verify that our code embedding strategy based on PLCodeBERT is complementary to
existing knowledge tracing models, further enhancing their accuracy. As a
by-product, PLCodeBERT also yields better performance on other
programming-related tasks such as code clone detection.
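The abstract names PDKT's components (a bipartite problem graph, PLCodeBERT code embeddings, and a double-sequence RNN with exponential decay attention) without giving formulas. The snippet below is therefore only a minimal sketch of one plausible reading of the fusion step: the GRU encoders, the decay form exp(-lam * Δt), and the prediction head are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DecayAttentionFusion(nn.Module):
    """Toy double-sequence encoder with exponential decay attention.

    problem_seq and code_seq are assumed to be pre-computed embeddings of the
    problems attempted and the code submitted at each interaction step.
    """
    def __init__(self, problem_dim, code_dim, hidden_dim, lam=0.1):
        super().__init__()
        self.problem_rnn = nn.GRU(problem_dim, hidden_dim, batch_first=True)
        self.code_rnn = nn.GRU(code_dim, hidden_dim, batch_first=True)
        self.lam = lam  # decay rate: an assumed hyperparameter
        self.out = nn.Linear(2 * hidden_dim, 1)

    def forward(self, problem_seq, code_seq):
        # problem_seq: (B, T, problem_dim), code_seq: (B, T, code_dim)
        hp, _ = self.problem_rnn(problem_seq)  # (B, T, H)
        hc, _ = self.code_rnn(code_seq)        # (B, T, H)
        T = hp.size(1)
        # Exponential decay attention: older interactions get smaller weights.
        steps = torch.arange(T, dtype=hp.dtype, device=hp.device)
        weights = torch.exp(-self.lam * (T - 1 - steps))
        weights = weights / weights.sum()
        # Pool each stream with the decay weights, fuse, and predict the
        # probability that the next submission is correct.
        ctx_p = torch.einsum("t,bth->bh", weights, hp)
        ctx_c = torch.einsum("t,bth->bh", weights, hc)
        return torch.sigmoid(self.out(torch.cat([ctx_p, ctx_c], dim=-1)))
```

In the paper the attention presumably operates per prediction step rather than as a single pooled readout; the sketch only illustrates how an exponential decay can down-weight older interactions when fusing the two behavior sequences.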
Related papers
- Enhancing LLM Code Generation: A Systematic Evaluation of Multi-Agent Collaboration and Runtime Debugging for Improved Accuracy, Reliability, and Latency [0.0]
We study how different compositions of programming activities and training paradigms influence code generation effectiveness. Our findings provide valuable insights for organizations seeking robust AI-driven coding solutions.
arXiv Detail & Related papers (2025-05-04T14:44:27Z) - Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning [51.0864247376786]
We introduce a Knowledge Graph Enhanced Generative Multi-modal model (KG-GMM) that builds an evolving knowledge graph throughout the learning process.
During testing, we propose a Knowledge Graph Augmented Inference method that locates specific categories by analyzing relationships within the generated text.
arXiv Detail & Related papers (2025-03-24T07:20:43Z) - From Prediction to Application: Language Model-based Code Knowledge Tracing with Domain Adaptive Pre-Training and Automatic Feedback System with Pedagogical Prompting for Comprehensive Programming Education [2.932399587069876]
This paper introduces Language Model-based Code Knowledge Tracing (CodeLKT), an innovative application of Language Model-based Knowledge Tracing (LKT) to programming education.
arXiv Detail & Related papers (2024-08-31T01:36:38Z) - Personalized Programming Guidance based on Deep Programming Learning Style Capturing [9.152344993023503]
We propose a novel model called Programming Exercise Recommender with Learning Style (PERS), which simulates learners' intricate programming behaviors.
We perform extensive experiments on two real-world datasets to verify the effectiveness of modeling programming learning styles.
arXiv Detail & Related papers (2024-02-20T10:38:38Z) - Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both the randomness of masked language modeling and the structural aspects of programming languages.
We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner.
arXiv Detail & Related papers (2024-02-02T22:19:15Z) - CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we examine the "free lunch" hypothesis.
For data distributions, we explore how a mixture distribution and multi-epoch training over programming and natural languages affect model performance.
arXiv Detail & Related papers (2023-05-03T17:55:25Z) - Leveraging Key Information Modeling to Improve Less-Data Constrained
News Headline Generation via Duality Fine-Tuning [12.443476695459553]
We propose a novel duality fine-tuning method by formally defining the probabilistic duality constraints between key information prediction and headline generation tasks.
The proposed method can capture more information from limited data, build connections between separate tasks, and is suitable for less-data constrained generation tasks.
We conduct extensive experiments to demonstrate that our method is effective and efficient, achieving improved performance on language modeling and informativeness-correctness metrics on two public datasets.
arXiv Detail & Related papers (2022-10-10T07:59:36Z) - Towards Open-World Feature Extrapolation: An Inductive Graph Learning
Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., a feedforward neural net) serves as a lower model that takes features as input and outputs predicted labels; 2) a graph neural network serves as an upper model that learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z) - InfoBERT: Improving Robustness of Language Models from An Information
Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z) - GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z) - Leveraging Code Generation to Improve Code Retrieval and Summarization
via Dual Learning [18.354352985591305]
Code summarization generates a brief natural language description for a given source code snippet, while code retrieval fetches relevant source code given a natural language query.
Recent studies have combined these two tasks to improve their performance.
We propose a novel end-to-end model for the two tasks by introducing an additional code generation task.
arXiv Detail & Related papers (2020-02-24T12:26:11Z) - CodeBERT: A Pre-Trained Model for Programming and Natural Languages [117.34242908773061]
CodeBERT is a pre-trained model for programming language (PL) and natural language (NL).
We develop CodeBERT with a Transformer-based neural architecture.
We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters (a minimal code-embedding sketch in this spirit follows the list below).
arXiv Detail & Related papers (2020-02-19T13:09:07Z)