Why Do Students Drop Out? University Dropout Prediction and Associated Factor Analysis Using Machine Learning Techniques
- URL: http://arxiv.org/abs/2310.10987v1
- Date: Tue, 17 Oct 2023 04:20:00 GMT
- Title: Why Do Students Drop Out? University Dropout Prediction and Associated Factor Analysis Using Machine Learning Techniques
- Authors: Sean Kim and Eliot Yoo and Samuel Kim
- Abstract summary: This study examined university dropout prediction using academic, demographic, socioeconomic, and macroeconomic data types.
The data type most influential to model performance was found to be academic data.
Preliminary results indicate that a correlation does exist between data types and dropout status.
- Score: 0.5042480200195721
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graduation and dropout rates have always been a serious consideration for educational institutions and students. High dropout rates negatively impact both the lives of individual students and the institutions themselves. To address this problem, this study examined university dropout prediction using academic, demographic, socioeconomic, and macroeconomic data types. Additionally, we performed associated factor analysis to determine which type of data is most influential on the performance of machine learning models in predicting graduation and dropout status. These features were used to train four binary classifiers to determine whether students would graduate or drop out. The overall performance of the classifiers in predicting dropout status had an average ROC-AUC score of 0.935. The data type most influential to model performance was found to be academic data: the average ROC-AUC score dropped from 0.935 to 0.811 when all academic-related features were excluded from the data set. Preliminary results indicate that a correlation does exist between data types and dropout status.
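A minimal sketch of the ablation protocol described above, assuming scikit-learn: train a small set of binary classifiers with and without an "academic" feature block and compare the mean ROC-AUC. The synthetic data, the feature grouping, and the particular choice of four classifiers are illustrative assumptions, not the paper's actual setup.

```python
# Hedged sketch: mean ROC-AUC with and without the "academic" feature block.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a student dataset: columns 0-9 play the role of the
# academic features; the rest stand in for demographic, socioeconomic, and
# macroeconomic features. (Hypothetical grouping, for illustration only.)
X, y = make_classification(n_samples=2000, n_features=25, n_informative=12,
                           random_state=0)
all_cols = np.arange(X.shape[1])
academic_cols = np.arange(10)
non_academic_cols = np.setdiff1d(all_cols, academic_cols)

# Four example binary classifiers (the abstract does not name the paper's
# exact models, so these are assumptions).
classifiers = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(random_state=0),
    GradientBoostingClassifier(random_state=0),
    DecisionTreeClassifier(random_state=0),
]

def mean_auc(cols):
    """Average 5-fold cross-validated ROC-AUC over all classifiers."""
    scores = [cross_val_score(clf, X[:, cols], y, scoring="roc_auc", cv=5).mean()
              for clf in classifiers]
    return float(np.mean(scores))

print("all features     :", round(mean_auc(all_cols), 3))
print("academic removed :", round(mean_auc(non_academic_cols), 3))
```

On synthetic data the two numbers will not reproduce the paper's 0.935 and 0.811; the sketch only shows the shape of the comparison.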
Related papers
- Improving On-Time Undergraduate Graduation Rate For Undergraduate Students Using Predictive Analytics [0.0]
The on-time graduation rate among universities in Puerto Rico is significantly lower than in the mainland United States.
This project aims to develop a predictive model that accurately detects students early in their academic pursuit at risk of not graduating on time.
arXiv Detail & Related papers (2024-05-02T22:33:42Z)
- Temporal and Between-Group Variability in College Dropout Prediction [0.0]
This study provides a systematic evaluation of contributing factors and predictive performance of machine learning models.
We find that, in a Random Forest model, dropout prediction at the end of the second year has a 20% higher AUC than prediction at the time of enrollment.
Regarding variability across student groups, college GPA has more predictive value for students from traditionally disadvantaged backgrounds than their peers.
arXiv Detail & Related papers (2024-01-12T10:43:55Z)
- Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, the Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to reweight the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Is Your Model "MADD"? A Novel Metric to Evaluate Algorithmic Fairness for Predictive Student Models [0.0]
We propose a novel metric, the Model Absolute Density Distance (MADD), to analyze models' discriminatory behaviors.
We evaluate our approach on the common task of predicting student success in online courses, using several common predictive classification models.
arXiv Detail & Related papers (2023-05-24T16:55:49Z)
- Evaluating Splitting Approaches in the Context of Student Dropout Prediction [0.0]
We study strategies for splitting and using academic data in order to create training and testing sets.
The study indicates that a temporal splitting combined with a time-based selection of the students' incremental academic histories is the best strategy for the problem in question (a schematic version of this split is sketched after this list).
arXiv Detail & Related papers (2023-05-15T12:30:11Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains, such as medicine, information retrieval, cybersecurity, and social media, the datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies for dealing with this problem by balancing the number of examples of each class (a minimal resampling sketch follows this list).
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Predicting Early Dropout: Calibration and Algorithmic Fairness Considerations [2.7048165023994057]
We develop a machine learning method to predict the risks of university dropout and underperformance.
We analyze whether this method leads to discriminatory outcomes for some sensitive groups in terms of prediction accuracy (AUC) and error rates (Generalized False Positive Rate, GFPR, and Generalized False Negative Rate, GFNR); a per-group audit is sketched after this list.
arXiv Detail & Related papers (2021-03-16T13:42:16Z)
- Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization [62.8384110757689]
Overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs).
The advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
arXiv Detail & Related papers (2020-10-11T13:19:58Z)
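A schematic version of the temporal splitting strategy from the entry "Evaluating Splitting Approaches in the Context of Student Dropout Prediction" above, assuming pandas. The column names and cutoff values are hypothetical, and the cited paper's exact protocol may differ.

```python
# Hedged sketch: temporal train/test split plus time-based selection of
# incremental academic histories.
import pandas as pd

records = pd.DataFrame({
    "student_id":  [1, 1, 2, 2, 3, 3],
    "cohort_year": [2018, 2018, 2018, 2018, 2020, 2020],
    "term":        [1, 2, 1, 2, 1, 2],
    "gpa":         [3.1, 3.0, 2.2, 1.9, 3.6, 3.5],
    "dropped_out": [0, 0, 1, 1, 0, 0],
})

# Temporal split: earlier cohorts form the training set and later cohorts the
# test set, so the model never sees students who enrolled after its "present".
train = records[records["cohort_year"] <= 2019]
test = records[records["cohort_year"] > 2019]

# Time-based selection: keep only the first k terms of each student's record,
# mimicking the information actually available at prediction time.
k = 1
train_k = train[train["term"] <= k]
test_k = test[test["term"] <= k]
print(train_k, test_k, sep="\n\n")
```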
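A minimal sketch of the oversampling and undersampling strategies from the resampling entry above, using random resampling via sklearn.utils.resample; the cited paper evaluates a much wider range of strategies against dataset properties.

```python
# Hedged sketch: balance an imbalanced binary dataset by random over- and
# undersampling.
import numpy as np
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = (rng.rand(1000) < 0.05).astype(int)   # roughly 5% minority class

X_minority, X_majority = X[y == 1], X[y == 0]

# Oversampling: draw minority examples with replacement until balanced.
X_minority_up = resample(X_minority, replace=True,
                         n_samples=len(X_majority), random_state=0)

# Undersampling: draw a majority subset the size of the minority class.
X_majority_down = resample(X_majority, replace=False,
                           n_samples=len(X_minority), random_state=0)

X_balanced_over = np.vstack([X_majority, X_minority_up])
X_balanced_under = np.vstack([X_majority_down, X_minority])
print(X_balanced_over.shape, X_balanced_under.shape)
```

Oversampling duplicates minority examples and can encourage overfitting, while undersampling discards majority-class information; which trade-off wins depends on the dataset's properties, which is the question that entry studies.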
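A minimal sketch of the per-group audit described in "Predicting Early Dropout: Calibration and Algorithmic Fairness Considerations" above: compare AUC, GFPR, and GFNR across a sensitive attribute. The definitions used here, GFPR as the mean predicted score over true negatives and GFNR as the mean of one minus the score over true positives, follow the common usage of generalized error rates; the scores and groups are synthetic placeholders.

```python
# Hedged sketch: per-group AUC and generalized error rates.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, 500)                       # true dropout labels
y_score = np.clip(0.3 * y_true + 0.7 * rng.rand(500), 0.0, 1.0)
group = rng.choice(["A", "B"], 500)                   # hypothetical attribute

for g in np.unique(group):
    m = group == g
    auc = roc_auc_score(y_true[m], y_score[m])
    gfpr = y_score[m][y_true[m] == 0].mean()          # mean score on negatives
    gfnr = (1.0 - y_score[m][y_true[m] == 1]).mean()  # mean miss on positives
    print(f"group {g}: AUC={auc:.3f} GFPR={gfpr:.3f} GFNR={gfnr:.3f}")
```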
This list is automatically generated from the titles and abstracts of the papers on this site.