Related papers: Improving Deep Knowledge Tracing via Gated Architectures and Adaptive Optimization

Improving Deep Knowledge Tracing via Gated Architectures and Adaptive Optimization

URL: http://arxiv.org/abs/2504.20070v1
Date: Thu, 24 Apr 2025 14:24:31 GMT
Title: Improving Deep Knowledge Tracing via Gated Architectures and Adaptive Optimization
Authors: Altun Shukurlu,
Abstract summary: Deep Knowledge Tracing (DKT) models student learning behavior by using Recurrent Networks (RNNs) to predict future performance based on historical interaction data.<n>In this work, we revisit the DKT model from two perspectives: architectural improvements and optimization.<n>First, we enhance the model using gated recurrent units, specifically Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU)<n>Second, we re-implement DKT using the PyTorch framework, enabling a modular and accessible infrastructure compatible with modern deep learning.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep Knowledge Tracing (DKT) models student learning behavior by using Recurrent Neural Networks (RNNs) to predict future performance based on historical interaction data. However, the original implementation relied on standard RNNs in the Lua-based Torch framework, which limited extensibility and reproducibility. In this work, we revisit the DKT model from two perspectives: architectural improvements and optimization efficiency. First, we enhance the model using gated recurrent units, specifically Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU), which better capture long-term dependencies and help mitigate vanishing gradient issues. Second, we re-implement DKT using the PyTorch framework, enabling a modular and accessible infrastructure compatible with modern deep learning workflows. We also benchmark several optimization algorithms SGD, RMSProp, Adagrad, Adam, and AdamW to evaluate their impact on convergence speed and predictive accuracy in educational modeling tasks. Experiments on the Synthetic-5 and Khan Academy datasets show that GRUs and LSTMs achieve higher accuracy and improved training stability compared to basic RNNs, while adaptive optimizers such as Adam and AdamW consistently outperform SGD in both early-stage learning and final model performance. Our open-source PyTorch implementation provides a reproducible and extensible foundation for future research in neural knowledge tracing and personalized learning systems.

Related papers

Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors.<n>LFP decomposes a reward to individual neurons based on their respective contributions to solving a given task.<n>Our method then implements a greedy approach reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features. Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process. We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
Transfer Learning in Deep Learning Models for Building Load Forecasting: Case of Limited Data [0.0]
This paper proposes a Building-to-Building Transfer Learning framework to overcome the problem and enhance the performance of Deep Learning models. The proposed approach improved the forecasting accuracy by 56.8% compared to the case of conventional deep learning where training from scratch is used.
arXiv Detail & Related papers (2023-01-25T16:05:47Z)
Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors. Our work is the first attempt to optimize BNNs from the bilinear perspective. We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
RLFlow: Optimising Neural Network Subgraph Transformation with World Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime. We show our approach can match the performance of state of the art on common convolutional networks and outperform those by up to 5% on transformer-style architectures.
arXiv Detail & Related papers (2022-05-03T11:52:54Z)
Temporal Convolution Domain Adaptation Learning for Crops Growth Prediction [5.966652553573454]
We construct an innovative network architecture based on domain adaptation learning to predict crops growth curves with limited available crop data. We are the first to use the temporal convolution filters as the backbone to construct a domain adaptation network architecture. Results show that the proposed temporal convolution-based network architecture outperforms all benchmarks not only in accuracy but also in model size and convergence rate.
arXiv Detail & Related papers (2022-02-24T14:22:36Z)
Improving Deep Learning for HAR with shallow LSTMs [70.94062293989832]
We propose to alter the DeepConvLSTM to employ a 1-layered instead of a 2-layered LSTM. Our results stand in contrast to the belief that one needs at least a 2-layered LSTM when dealing with sequential data.
arXiv Detail & Related papers (2021-08-02T08:14:59Z)
Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks. This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
On the Interpretability of Deep Learning Based Models for Knowledge Tracing [5.120837730908589]
Knowledge tracing allows Intelligent Tutoring Systems to infer which topics or skills a student has mastered. Deep Learning based models like Deep Knowledge Tracing (DKT) and Dynamic Key-Value Memory Network (DKVMN) have achieved significant improvements. However, these deep learning based models are not as interpretable as other models because the decision-making process learned by deep neural networks is not wholly understood.
arXiv Detail & Related papers (2021-01-27T11:55:03Z)
Generalized Reinforcement Meta Learning for Few-Shot Optimization [3.7675996866306845]
We present a generic and flexible Reinforcement Learning (RL) based meta-learning framework for the problem of few-shot learning. Our framework could be easily extended to do network architecture search.
arXiv Detail & Related papers (2020-05-04T03:21:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.