Systematic Ensemble Model Selection Approach for Educational Data Mining
- URL: http://arxiv.org/abs/2005.06647v1
- Date: Wed, 13 May 2020 22:25:58 GMT
- Title: Systematic Ensemble Model Selection Approach for Educational Data Mining
- Authors: MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif, Abdallah Shami
- Abstract summary: This work explores and analyzes two different datasets at two separate stages of course delivery.
It proposes a systematic approach based on Gini index and p-value to select a suitable ensemble learner from a combination of six potential machine learning algorithms.
Experimental results show that the proposed ensemble models achieve high accuracy and low false positive rate at all stages for both datasets.
- Score: 8.26773636337474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A plethora of research has focused on predicting
students' performance in order to support their development. Many institutions
aim to improve student performance and education quality, which
can be achieved by using data mining techniques to analyze and predict
students' performance and to determine possible factors that may affect their
final marks. To address this issue, this work starts by thoroughly exploring
and analyzing two different datasets at two separate stages of course delivery
(20 percent and 50 percent respectively) using multiple graphical, statistical,
and quantitative techniques. The feature analysis provides insights into the
nature of the different features considered and helps in the choice of the
machine learning algorithms and their parameters. Furthermore, this work
proposes a systematic approach based on Gini index and p-value to select a
suitable ensemble learner from a combination of six potential machine learning
algorithms. Experimental results show that the proposed ensemble models achieve
high accuracy and low false positive rate at all stages for both datasets.
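The abstract does not spell out how the selection procedure works, so the following is only a minimal sketch under stated assumptions. It enumerates every ensemble that can be formed from a pool of base learners (57 candidates of size two or more for six learners) and ranks them by a Gini index taken here as 2 * AUC - 1. That definition, the score-averaging rule, and all function names are assumptions for illustration, not the authors' method. The p-value check mentioned in the abstract is left out.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini_index(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Assumed definition for this sketch: Gini = 2 * AUC - 1."""
    return 2.0 * auc(y_true, scores) - 1.0


def candidate_ensembles(names: Sequence[str], min_size: int = 2) -> List[Tuple[str, ...]]:
    """All subsets of the base learners with at least min_size members."""
    return [c for k in range(min_size, len(names) + 1) for c in combinations(names, k)]


def rank_ensembles(
    y_true: Sequence[int],
    base_scores: Dict[str, Sequence[float]],
    min_size: int = 2,
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score each candidate by the Gini index of its averaged predictions, best first."""
    ranked = []
    for members in candidate_ensembles(sorted(base_scores), min_size):
        # Hypothetical combination rule: plain average of member scores.
        avg = [
            sum(base_scores[m][i] for m in members) / len(members)
            for i in range(len(y_true))
        ]
        ranked.append((gini_index(y_true, avg), members))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked
```

In use, `base_scores` would map each base learner's name to its predicted scores on a held-out set, and the top-ranked entries of `rank_ensembles` would be the candidates passed on to a significance test.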
Related papers
- Improving prediction of students' performance in intelligent tutoring systems using attribute selection and ensembles of different multimodal data sources [0.0]
The aim of this study was to predict university students' learning performance using different sources of data from an Intelligent Tutoring System.
We collected and preprocessed data from 40 students from different multimodal sources.
arXiv Detail & Related papers (2024-02-10T09:31:39Z)
- Few-Shot Learning on Graphs: from Meta-learning to Pre-training and Prompting [56.25730255038747]
This survey endeavors to synthesize recent developments, provide comparative insights, and identify future directions.
We systematically categorize existing studies into three major families: meta-learning approaches, pre-training approaches, and hybrid approaches.
We analyze the relationships among these methods and compare their strengths and limitations.
arXiv Detail & Related papers (2024-02-02T14:32:42Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- Analyzing the Capabilities of Nature-inspired Feature Selection Algorithms in Predicting Student Performance [0.0]
In this paper, an analysis was conducted to determine the relative performance of a suite of nature-inspired algorithms in the feature-selection portion of ensemble algorithms used to predict student performance.
It was found that leveraging an ensemble approach using nature-inspired algorithms for feature selection and traditional ML algorithms for classification significantly increased predictive accuracy while also reducing feature set size by up to 65 percent.
arXiv Detail & Related papers (2023-08-15T21:18:52Z)
- An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
- A Study of Graph-Based Approaches for Semi-Supervised Time Series Classification [0.0]
Two main aspects are involved in this task: A suitable distance measure to evaluate the similarities between time series, and a learning method to make predictions based on these distances.
We describe four different distance measures, including (Soft) DTW and Matrix Profile, as well as four successful semi-supervised learning methods, including the graph Allen-Cahn method and a Graph Convolutional Neural Network.
Our findings show that all measures and methods vary strongly in accuracy between data sets and that there is no clear best combination to employ in all cases.
arXiv Detail & Related papers (2021-04-16T14:57:41Z)
- Computational Models for Academic Performance Estimation [21.31653695065347]
This paper presents an in-depth analysis of deep learning and machine learning approaches for the formulation of an automated students' performance estimation system.
Our main contributions are (a) a large dataset with fifteen courses (shared publicly for academic research) (b) statistical analysis and ablations on the estimation problem for this dataset.
Unlike previous approaches that rely on feature engineering or logical function deduction, our approach is fully data-driven and thus highly generic with better performance across different prediction tasks.
arXiv Detail & Related papers (2020-09-06T07:31:37Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- Multi-split Optimized Bagging Ensemble Model Selection for Multi-class Educational Data Mining [8.26773636337474]
This work analyzes two different undergraduate datasets at two different universities.
It aims to predict the students' performance at two stages of course delivery (20% and 50% respectively).
arXiv Detail & Related papers (2020-06-09T03:22:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.