Related papers: Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance

URL: http://arxiv.org/abs/2412.14526v1
Date: Thu, 19 Dec 2024 04:46:06 GMT
Title: Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance
Authors: Sukrit Leelaluk, Cheng Tang, Valdemar Švábenský, Atsushi Shimada
Abstract summary: We introduce an RNN-Attention-KD (knowledge distillation) framework to predict at-risk students early throughout a course. In an empirical evaluation, RNN-Attention-KD outperforms traditional neural network models in terms of recall and F1-measure.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Educational data mining (EDM) is a field of applied computing that focuses on automatically analyzing data from learning contexts. Early prediction of at-risk students is a crucial and widely researched topic in EDM. It enables instructors to help at-risk students stay on track, preventing dropout or failure. Previous studies have predicted students' learning performance to identify at-risk students by applying machine learning to data collected from e-learning platforms. However, most studies identified at-risk students using the entire course data after the course had finished. This does not reflect the real-world scenario in which at-risk students may drop out before the course ends. To address this problem, we introduce an RNN-Attention-KD (knowledge distillation) framework to predict at-risk students early throughout a course. It leverages the strengths of Recurrent Neural Networks (RNNs) in handling time-sequence data to predict students' performance at each time step, and it employs an attention mechanism to focus on relevant time steps for improved predictive accuracy. At the same time, KD is applied to compress the time steps to facilitate early prediction. In an empirical evaluation, RNN-Attention-KD outperforms traditional neural network models in terms of recall and F1-measure. For example, it obtained a recall of 0.49 and an F1-measure of 0.51 for Weeks 1–3, and a recall of 0.51 and an F1-measure of 0.61 for Weeks 1–6, across all datasets from four years of a university course. An ablation study then investigated the contributions of different knowledge transfer methods (distillation objectives). We found that hint loss from the hidden layer of the RNN and context vector loss from the attention module on the RNN could enhance the model's performance in identifying at-risk students. These results are relevant for EDM researchers employing deep learning models.
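The abstract names three ingredients (an RNN over weekly time steps, attention over those steps, and distillation losses on the hidden layer and the context vector) but gives no formulas. The sketch below is one plausible reading under stated assumptions, not the authors' implementation: a teacher sees every week of the course while a student sees only the first `k` weeks; attention uses dot-product scoring against a learned vector; the hint loss is the mean squared error between final hidden states, the context vector loss is the mean squared error between attention outputs, and both are added to a binary cross-entropy task loss with hypothetical weights `lam_hint` and `lam_ctx`. All class and function names are illustrative, and only the forward loss computation is shown.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class RNNAttention:
    """Hypothetical Elman RNN over weekly features, pooled by dot-product attention."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_x = rand_matrix(n_hidden, n_in, rng)      # input-to-hidden weights
        self.w_h = rand_matrix(n_hidden, n_hidden, rng)  # hidden-to-hidden weights
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]      # attention scoring vector (assumed)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # logistic output weights

    def forward(self, weeks: List[Vector]) -> Tuple[Vector, Vector, float]:
        h = [0.0] * len(self.w_h)
        states: List[Vector] = []
        for x in weeks:  # one RNN step per week
            pre = [a + b for a, b in zip(matvec(self.w_x, x), matvec(self.w_h, h))]
            h = [math.tanh(p) for p in pre]
            states.append(h)
        scores = [sum(a * b for a, b in zip(self.v, s)) for s in states]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        z = sum(exps)
        alpha = [e / z for e in exps]  # attention weights over time steps
        context = [sum(a * s[i] for a, s in zip(alpha, states)) for i in range(len(h))]
        logit = sum(w * c for w, c in zip(self.w_out, context))
        p_at_risk = 1.0 / (1.0 + math.exp(-logit))
        return states[-1], context, p_at_risk


def mse(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher: RNNAttention, student: RNNAttention, weeks: List[Vector],
                      label: int, k: int, lam_hint: float = 1.0, lam_ctx: float = 1.0):
    """Teacher reads the full course; student reads only the first k weeks (assumed setup)."""
    t_hidden, t_ctx, _ = teacher.forward(weeks)
    s_hidden, s_ctx, s_prob = student.forward(weeks[:k])
    eps = 1e-7
    bce = -(label * math.log(s_prob + eps) + (1 - label) * math.log(1.0 - s_prob + eps))
    hint = mse(s_hidden, t_hidden)  # hint loss on the RNN hidden layer
    ctx = mse(s_ctx, t_ctx)         # context vector loss on the attention output
    return bce + lam_hint * hint + lam_ctx * ctx, bce, hint, ctx


rng = random.Random(42)
course = [[rng.random() for _ in range(4)] for _ in range(15)]  # synthetic: 15 weeks x 4 features
teacher = RNNAttention(n_in=4, n_hidden=8, seed=1)
student = RNNAttention(n_in=4, n_hidden=8, seed=2)
total, bce, hint, ctx = distillation_loss(teacher, student, course, label=1, k=3)
print(f"total={total:.3f} bce={bce:.3f} hint={hint:.3f} ctx={ctx:.3f}")
```

In a full training setup (again an assumption, not stated in the abstract), the teacher would be trained on complete courses and then frozen, so that the hint and context terms pull the student's early-week representations toward what the teacher extracts from the whole course. Teacher and student share a hidden size here so the losses need no projection layer.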

Related papers

Ranking-Based At-Risk Student Prediction Using Federated Learning and Differential Features
This study proposes a method that combines federated learning and differential features to address privacy concerns. To evaluate the proposed method, a model for predicting at-risk students was trained using data from 1,136 students across 12 courses conducted over 4 years. The trained models were also applicable to early prediction, achieving high performance in detecting at-risk students in earlier stages of the semester.
arXiv (2025-05-14T11:12:30Z)
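The summary does not say how the federated models are aggregated or how the ranking is produced. As an assumption, the sketch below uses FedAvg-style aggregation (each course acts as a client that shares only a parameter vector, and vectors are averaged in proportion to local student counts) and then ranks students by a predicted risk score to flag the top `k`. The differential features themselves are not modeled, and `fed_avg` and `flag_top_k` are hypothetical names.

```python
from typing import Dict, List, Sequence


def fed_avg(client_weights: Sequence[List[float]], client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each client by its number of students."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def flag_top_k(risk_scores: Dict[str, float], k: int) -> List[str]:
    """Rank students by predicted risk (highest first) and return the top k IDs."""
    ranked = sorted(risk_scores, key=lambda sid: risk_scores[sid], reverse=True)
    return ranked[:k]


# Two hypothetical courses with 100 and 300 students; raw student data never leaves a course.
global_weights = fed_avg([[1.0, 0.0], [3.0, 2.0]], [100, 300])
print(global_weights)  # [2.5, 1.5]
print(flag_top_k({"s1": 0.2, "s2": 0.9, "s3": 0.6}, k=2))  # ['s2', 's3']
```

Ranking rather than thresholding lets an instructor fix the intervention budget (the top `k` students) independently of how well the scores are calibrated across courses.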
Early Detection of At-Risk Students Using Machine Learning
We aim to tackle the persistent challenges of higher education retention and student dropout rates by screening for at-risk students. This work considers several machine learning models, including Support Vector Machines (SVM), Naive Bayes, K-nearest neighbors (KNN), Decision Trees, Logistic Regression, and Random Forest. Our analysis indicates that all algorithms produce acceptable results for predicting at-risk students, while Naive Bayes performs best overall.
arXiv (2024-12-12T17:33:06Z)
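Since Naive Bayes is reported as the best performer, a minimal from-scratch Gaussian Naive Bayes shows what the model computes: a class prior multiplied by per-feature normal likelihoods, compared in log space. The Gaussian variant, the two features, and the toy data are assumptions for illustration; the summary does not state which Naive Bayes variant or which features were used.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: class priors times independent per-feature normals."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "GaussianNaiveBayes":
        rows_by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.priors: Dict[int, float] = {}
        self.means: Dict[int, List[float]] = {}
        self.variances: Dict[int, List[float]] = {}
        for label, rows in rows_by_class.items():
            n = len(rows)
            self.priors[label] = n / len(X)
            cols = list(zip(*rows))
            self.means[label] = [sum(c) / n for c in cols]
            # a small variance floor avoids division by zero for constant features
            self.variances[label] = [
                sum((v - m) ** 2 for v in c) / n + 1e-9
                for c, m in zip(cols, self.means[label])
            ]
        return self

    def _log_posterior(self, row: Sequence[float], label: int) -> float:
        lp = math.log(self.priors[label])
        for x, m, v in zip(row, self.means[label], self.variances[label]):
            lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return lp

    def predict(self, row: Sequence[float]) -> int:
        return max(self.priors, key=lambda label: self._log_posterior(row, label))


# Hypothetical features: [attendance rate, mean quiz score]; label 1 = at risk.
X = [[0.9, 0.8], [0.85, 0.75], [0.95, 0.9], [0.3, 0.4], [0.2, 0.35], [0.4, 0.3]]
y = [0, 0, 0, 1, 1, 1]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([0.25, 0.3]))  # 1
print(model.predict([0.9, 0.85]))  # 0
```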
Accurate Multi-Category Student Performance Forecasting at Early Stages of Online Education Using Neural Networks
This study introduces a novel neural network-based approach capable of accurately predicting student performance. The proposed model predicts outcomes in the Distinction, Fail, Pass, and Withdrawn categories. The results indicate that the prediction accuracy of the proposed method is about 25% higher than that of the existing state of the art.
arXiv (2024-12-08T13:37:30Z)
RESTOR: Knowledge Recovery in Machine Unlearning
Large language models trained on web-scale corpora can contain private or sensitive information. Several machine unlearning algorithms have been proposed to eliminate the effect of such data points. We propose the RESTOR framework for machine unlearning evaluation.
arXiv (2024-10-31T20:54:35Z)
Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box. This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv (2023-11-29T05:33:28Z)
A Predictive Model using Machine Learning Algorithm in Identifying Students Probability on Passing Semestral Course
This study employs classification as the data mining technique and a decision tree as the algorithm. Using the resulting predictive model to predict students' probability of passing their current courses yields 0.7619 accuracy, 0.8333 precision, 0.8823 recall, and 0.8571 F1 score.
arXiv (2023-04-12T01:57:08Z)
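The reported figures can be cross-checked because F1 is the harmonic mean of precision and recall. The short sketch below uses generic metric helpers (not code from the paper) to recompute F1 from the reported precision and recall; `metrics_from_counts` shows how all four figures derive from a confusion matrix.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


print(round(f1_score(0.8333, 0.8823), 4))  # 0.8571, matching the reported F1
```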
ASPEST: Bridging the Gap Between Active Learning and Selective Prediction
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv (2023-04-07T23:51:07Z)
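The two ingredients named here can be illustrated in their simplest textbook forms. In the sketch below, which is not the ASPEST algorithm, selective prediction abstains whenever the top class probability falls below a confidence threshold, and active learning queries the unlabeled examples whose predicted distributions have the highest entropy. The threshold and budget values are arbitrary assumptions.

```python
import math
from typing import List, Optional, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def selective_predict(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the argmax class if its probability clears the threshold, else abstain (None)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def query_most_uncertain(pool: List[Sequence[float]], budget: int) -> List[int]:
    """Pick indices of the `budget` unlabeled items with the highest predictive entropy."""
    order = sorted(range(len(pool)), key=lambda i: entropy(pool[i]), reverse=True)
    return order[:budget]


print(selective_predict([0.9, 0.1], threshold=0.8))   # 0
print(selective_predict([0.55, 0.45], threshold=0.8))  # None
print(query_most_uncertain([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]], budget=2))  # [1, 2]
```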
Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble
Self-training teacher-student frameworks have been proposed to improve the robustness of NER models. In this paper, we propose an adaptive teacher learning approach comprising two teacher-student networks. A fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which makes predictions on each model fragment more consistent in the presence of noise.
arXiv (2022-12-13T12:14:09Z)
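A temporal moving average of this kind is commonly written as an exponential moving average. The sketch below assumes that a "fragment" is a named group of parameters and uses a hypothetical decay value; the summary gives neither the decay nor how fragments are defined. Each teacher fragment moves a small step toward the matching student fragment.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.99) -> Params:
    """Return new teacher params: decay * teacher + (1 - decay) * student, fragment by fragment."""
    return {
        name: [decay * t + (1.0 - decay) * s for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


# Hypothetical fragments: an encoder and a classifier head.
teacher = {"encoder": [0.0, 1.0], "classifier": [0.4]}
student = {"encoder": [1.0, 1.0], "classifier": [0.0]}
for _ in range(3):  # three steps during which the student weights stay fixed
    teacher = ema_update(teacher, student, decay=0.9)
print(teacher)
```

After three steps the encoder's first weight has moved from 0.0 to 1 - 0.9^3 (about 0.271), while weights that already agree with the student stay put; a decay close to 1 makes the teacher a slow, noise-smoothing average of recent student states.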
Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark
The paper introduces a new dataset to assess the performance of machine learning algorithms in predicting the seriousness of injury in a traffic accident. The dataset is created by aggregating publicly available datasets from the UK Department for Transport.
arXiv (2022-05-20T21:15:26Z)
Predicting Early Dropout: Calibration and Algorithmic Fairness Considerations
We develop a machine learning method to predict the risks of university dropout and underperformance. We analyze whether this method leads to discriminatory outcomes for some sensitive groups in terms of prediction accuracy (AUC) and error rates (Generalized False Positive Rate, GFPR, and Generalized False Negative Rate, GFNR).
arXiv (2021-03-16T13:42:16Z)
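GFPR and GFNR are not defined in the summary. Under the usual definition (assumed here), GFPR is the mean predicted risk score among students who did not drop out, and GFNR is the mean of one minus the score among students who did; comparing them across sensitive groups reveals unequal error behavior without fixing a decision threshold. The sketch below computes both per group on hypothetical data.

```python
from typing import Dict, Sequence, Tuple


def generalized_error_rates(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """GFPR = mean score over true negatives; GFNR = mean (1 - score) over true positives."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    gfpr = sum(neg) / len(neg) if neg else float("nan")
    gfnr = sum(1.0 - s for s in pos) / len(pos) if pos else float("nan")
    return gfpr, gfnr


def rates_by_group(scores: Sequence[float], labels: Sequence[int],
                   groups: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Compute (GFPR, GFNR) separately for each sensitive group."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = generalized_error_rates([scores[i] for i in idx], [labels[i] for i in idx])
    return result


# Hypothetical predicted dropout probabilities, true outcomes (1 = dropped out), and groups.
scores = [0.8, 0.3, 0.6, 0.1, 0.7, 0.5, 0.4, 0.2]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
for group, (gfpr, gfnr) in rates_by_group(scores, labels, groups).items():
    print(f"group {group}: GFPR={gfpr:.2f} GFNR={gfnr:.2f}")
```

On this toy data, group B has both a higher GFPR (0.35 versus 0.20) and a higher GFNR (0.45 versus 0.30) than group A, meaning the scores are less informative for group B in both directions.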
Teaching deep learning causal effects improves predictive performance
We describe a Causal-Temporal Structure for temporal electronic health record (EHR) data; then, based on this structure, we estimate sequential individual treatment effects (ITE) along the timeline. We propose a knowledge-guided neural network methodology to incorporate the estimated ITE.
arXiv (2020-11-11T00:01:14Z)
Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data. Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods. We develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals.
arXiv (2020-06-20T22:45:32Z)
EPARS: Early Prediction of At-risk Students with Online and Offline Learning Behaviors
Early prediction of students at risk (STAR) is an effective and significant means of providing timely intervention for dropout and suicide. Existing works mostly rely on either online or offline learning behaviors, which are not comprehensive enough to capture the whole learning process. We propose a novel algorithm (EPARS) that predicts STAR early in a semester by modeling online and offline learning behaviors.
arXiv (2020-06-06T12:56:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.