Meta Transfer Learning for Early Success Prediction in MOOCs
- URL: http://arxiv.org/abs/2205.01064v1
- Date: Mon, 25 Apr 2022 09:19:57 GMT
- Title: Meta Transfer Learning for Early Success Prediction in MOOCs
- Authors: Vinitra Swamy, Mirko Marras, Tanja Käser
- Abstract summary: Early prediction of student success for targeted intervention is essential to ensure no student is left behind in a course.
There exists a large body of research in success prediction for MOOCs, focusing mainly on training models from scratch for individual courses.
We present three novel strategies for transfer: 1) pre-training a model on a large set of diverse courses, 2) leveraging the pre-trained model by including meta information about courses, and 3) fine-tuning the model on previous course iterations.
- Score: 6.43486245025846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the increasing popularity of massive open online courses (MOOCs),
many suffer from high dropout and low success rates. Early prediction of
student success for targeted intervention is therefore essential to ensure no
student is left behind in a course. There exists a large body of research in
success prediction for MOOCs, focusing mainly on training models from scratch
for individual courses. This setting is impractical in early success prediction
as the performance of a student is only known at the end of the course. In this
paper, we aim to create early success prediction models that can be transferred
between MOOCs from different domains and topics. To do so, we present three
novel strategies for transfer: 1) pre-training a model on a large set of
diverse courses, 2) leveraging the pre-trained model by including meta
information about courses, and 3) fine-tuning the model on previous course
iterations. Our experiments on 26 MOOCs with over 145,000 combined enrollments
and millions of interactions show that models combining interaction data and
course information have comparable or better performance than models which have
access to previous iterations of the course. With these models, we aim to
effectively enable educators to warm-start their predictions for new and
ongoing courses.
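To make the three transfer strategies concrete, the following is a minimal sketch, not the authors' exact architecture or training code: it assumes weekly behavioral features per student and a small bidirectional LSTM classifier, and the names SuccessPredictor and train, as well as the hyperparameters, are purely illustrative.

```python
# Hypothetical sketch of the three transfer strategies; not the paper's exact model.
import torch
import torch.nn as nn

class SuccessPredictor(nn.Module):
    """LSTM over weekly behavioral features, optionally conditioned on course meta information."""
    def __init__(self, n_features, n_meta=0, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden + n_meta, 1)  # pass/fail logit

    def forward(self, x, meta=None):
        _, (h, _) = self.lstm(x)                 # x: (batch, weeks, n_features)
        h = torch.cat([h[0], h[1]], dim=-1)      # concatenate both LSTM directions
        if meta is not None:                     # strategy 2: append course meta information
            h = torch.cat([h, meta], dim=-1)
        return self.head(h).squeeze(-1)

def train(model, loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, meta, y in loader:                # loader yields behavior, course meta, pass/fail label
            opt.zero_grad()
            loss_fn(model(x, meta), y).backward()
            opt.step()

# Strategy 1: pre-train one model on a large pool of diverse courses.
#   model = SuccessPredictor(n_features=10, n_meta=5); train(model, pooled_courses_loader)
# Strategy 2 is the `meta` argument above (course information fed alongside interactions).
# Strategy 3: fine-tune on a previous iteration of the target course, e.g. with a smaller lr:
#   train(model, previous_iteration_loader, epochs=3, lr=1e-4)
```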
Related papers
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods across various model architectures and sizes, cutting training time by up to a factor of four.
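The summary above only names the idea; a minimal, generic knowledge-distillation loss (the standard temperature-scaled KL between student and teacher logits, not OKD's online modules) might look like the sketch below, with the function name, temperature, and mixing weight chosen purely for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Generic KD objective: cross-entropy on labels plus KL divergence to the teacher."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable to the hard-label term
    return alpha * hard + (1 - alpha) * soft
```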
arXiv Detail & Related papers (2024-09-19T07:05:26Z)
- Annealed Winner-Takes-All for Motion Forecasting [48.200282332176094]
We show how an annealed Winner-Takes-All (aWTA) loss can be integrated with state-of-the-art motion forecasting models to enhance their performance.
Our approach can be easily incorporated into any trajectory prediction model normally trained using WTA.
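For intuition, here is a generic sketch of the two losses, assuming K trajectory hypotheses per sample: plain WTA backpropagates only through the best hypothesis, while the annealed variant softens the argmin with a temperature that decays over training. The paper's exact annealing schedule and loss details may differ.

```python
import torch

def wta_loss(preds, target):
    """preds: (batch, K, horizon, 2) hypotheses; target: (batch, horizon, 2) ground truth."""
    errors = ((preds - target.unsqueeze(1)) ** 2).mean(dim=(-1, -2))  # (batch, K)
    return errors.min(dim=1).values.mean()  # only the best hypothesis receives gradient

def annealed_wta_loss(preds, target, temperature):
    errors = ((preds - target.unsqueeze(1)) ** 2).mean(dim=(-1, -2))  # (batch, K)
    weights = torch.softmax(-errors / temperature, dim=1)             # soft assignment over hypotheses
    return (weights.detach() * errors).sum(dim=1).mean()              # anneal temperature -> 0 during training
```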
arXiv Detail & Related papers (2024-09-17T13:26:17Z)
- A Fair Post-Processing Method based on the MADD Metric for Predictive Student Models [1.055551340663609]
A new metric, MADD, has been developed to evaluate algorithmic fairness in predictive student models.
In this paper, we develop a post-processing method that aims at improving the fairness while preserving the accuracy of relevant predictive models' results.
We experiment with our approach on the task of predicting student success in an online course, using both simulated and real-world educational data.
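Since the summary does not spell out the metric or the procedure, the following is an assumption-laden illustration only: a density-distance style gap between two groups' binned predicted probabilities, and a naive score-shift post-processing. It is not the paper's MADD metric or its post-processing method.

```python
import numpy as np

def density_distance(scores_a, scores_b, bins=20):
    """Sum of absolute differences between two groups' normalized predicted-score densities."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    da, _ = np.histogram(scores_a, bins=edges)
    db, _ = np.histogram(scores_b, bins=edges)
    return np.abs(da / da.sum() - db / db.sum()).sum()

def shift_scores(scores, delta):
    """Naive post-processing: shift one group's predicted probabilities, clipped to [0, 1]."""
    return np.clip(scores + delta, 0.0, 1.0)

# Example: pick the shift that best reduces the density gap on a validation set.
# deltas = np.linspace(-0.2, 0.2, 41)
# best = min(deltas, key=lambda d: density_distance(shift_scores(val_a, d), val_b))
```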
arXiv Detail & Related papers (2024-07-07T14:53:41Z)
- Universal Semi-supervised Model Adaptation via Collaborative Consistency Training [92.52892510093037]
We introduce a realistic and challenging domain adaptation problem called Universal Semi-supervised Model Adaptation (USMA).
We propose a collaborative consistency training framework that regularizes the prediction consistency between two models.
Experimental results demonstrate the effectiveness of our method on several benchmark datasets.
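A minimal sketch of what a prediction-consistency regularizer between two models could look like on unlabeled data is shown below; the actual USMA framework pairs specific pre-trained and source models and combines this term with supervised losses, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_a, logits_b):
    """Symmetric KL divergence between two models' predicted distributions."""
    log_pa = F.log_softmax(logits_a, dim=-1)
    log_pb = F.log_softmax(logits_b, dim=-1)
    kl_ab = F.kl_div(log_pa, log_pb.exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_pb, log_pa.exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

# Hypothetical total objective on a semi-supervised batch:
# loss = ce(model_a(x_lab), y) + ce(model_b(x_lab), y) \
#        + lam * consistency_loss(model_a(x_unlab), model_b(x_unlab))
```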
arXiv Detail & Related papers (2023-07-07T08:19:40Z)
- Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey [66.18478838828231]
Multi-modal pre-trained big models have drawn more and more attention in recent years.
This paper introduces the background of multi-modal pre-training by reviewing conventional deep pre-training work in natural language processing, computer vision, and speech.
Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss the MM-PTMs with a focus on data, objectives, network, and knowledge enhanced pre-training.
arXiv Detail & Related papers (2023-02-20T15:34:03Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
We are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
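As a toy stand-in for the setting just described, the sketch below weights each expert's prediction per test instance by how close the instance is to a simple summary (here, just the mean) of that expert's training data; the real Synthetic Model Combination procedure is more involved, and all names and choices here are illustrative assumptions.

```python
import numpy as np

def instance_wise_combine(x, expert_preds, expert_means, tau=1.0):
    """Combine expert predictions for one test point, weighted by distance to each expert's training mean.

    x: (d,) test features; expert_preds: (m,) predictions from m experts;
    expert_means: (m, d) mean feature vector of each expert's training data.
    """
    dists = np.linalg.norm(expert_means - x, axis=1)   # (m,) distance of x to each expert's data summary
    weights = np.exp(-dists / tau)
    weights = weights / weights.sum()
    return float(weights @ expert_preds)
```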
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
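A rough sketch of the self-distillation idea, under stated assumptions: further pre-train a copy of the model on the target data, then use it as a frozen teacher whose outputs regularize a second round of further pre-training. The paper's actual objective, matching points, and loss weighting may differ; pretrain_loss_fn is a hypothetical placeholder for the further pre-training objective.

```python
import copy
import torch
import torch.nn.functional as F

def self_distilled_step(student, teacher, batch, pretrain_loss_fn, opt, alpha=0.5):
    """One further pre-training step regularized by a frozen, already further pre-trained teacher."""
    with torch.no_grad():
        teacher_logits = teacher(batch)
    student_logits = student(batch)
    loss = pretrain_loss_fn(student_logits, batch) + alpha * F.mse_loss(student_logits, teacher_logits)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# teacher = copy.deepcopy(pretrained_model)  # step 1: further pre-train this copy on the target data
# student = copy.deepcopy(pretrained_model)  # step 2: re-run further pre-training with the teacher term
```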
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Transferable Student Performance Modeling for Intelligent Tutoring Systems [24.118429574890055]
We consider transfer learning techniques as a way to provide accurate performance predictions for new courses by leveraging log data from existing courses.
We evaluate the proposed techniques using student interaction sequence data from 5 different mathematics courses containing data from over 47,000 students in a real-world, large-scale ITS.
arXiv Detail & Related papers (2022-02-08T16:36:27Z)
- Context-aware Non-linear and Neural Attentive Knowledge-based Models for Grade Prediction [12.592903558338444]
Grade prediction for future courses not yet taken by students is important as it can help them and their advisers during the process of course selection.
One of the successful approaches for accurately predicting a student's grades in future courses is Cumulative Knowledge-based Regression Models (CKRM).
CKRM learns shallow linear models that predict a student's grades as the similarity between his/her knowledge state and the target course.
We propose context-aware non-linear and neural attentive models that can potentially better estimate a student's knowledge state from his/her prior course information.
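The mechanism described above can be written down compactly. The hedged sketch below treats a student's knowledge state as a grade-weighted average of embeddings of previously taken courses and the predicted grade as the dot-product similarity with the target course embedding; how the embeddings are learned, and the non-linear and neural attentive extensions proposed by the paper, are omitted.

```python
import numpy as np

def predict_grade(prior_course_embs, prior_grades, target_course_emb, bias=0.0):
    """CKRM-style prediction: similarity between a knowledge state and the target course.

    prior_course_embs: (n, d) embeddings of courses already taken;
    prior_grades: (n,) grades weighting how much knowledge each prior course contributed.
    """
    knowledge_state = (prior_grades[:, None] * prior_course_embs).mean(axis=0)  # (d,)
    return float(knowledge_state @ target_course_emb + bias)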
arXiv Detail & Related papers (2020-03-09T20:20:48Z)
- Academic Performance Estimation with Attention-based Graph Convolutional Networks [17.985752744098267]
Given a student's past data, the task of student performance prediction is to predict the student's grades in future courses.
Traditional methods for student performance prediction usually neglect the underlying relationships between multiple courses.
We propose a novel attention-based graph convolutional network model for student performance prediction.
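A minimal graph-attention-style layer (GAT-like) is sketched below, only to illustrate how attention over related course nodes could feed a performance predictor; it is not the authors' architecture, and the adjacency construction and aggregation details are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CourseAttentionLayer(nn.Module):
    """Aggregate related-course embeddings with learned attention weights (GAT-style sketch)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, node_feats, adj):
        # node_feats: (n, in_dim); adj: (n, n) binary adjacency, assumed to include self-loops
        h = self.proj(node_feats)                               # (n, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1))     # (n, n) pairwise attention logits
        scores = scores.masked_fill(adj == 0, float("-inf"))    # attend only to related courses
        alpha = torch.softmax(scores, dim=-1)
        return F.elu(alpha @ h)                                 # (n, out_dim) aggregated embeddings
```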
arXiv Detail & Related papers (2019-12-26T23:11:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.