Multi-split Optimized Bagging Ensemble Model Selection for Multi-class
Educational Data Mining
- URL: http://arxiv.org/abs/2006.05031v1
- Date: Tue, 9 Jun 2020 03:22:33 GMT
- Authors: MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif, Abdallah
Shami
- Abstract summary: This work analyzes two different undergraduate datasets at two different universities.
It aims to predict the students' performance at two stages of course delivery (20% and 50%, respectively).
- Score: 8.26773636337474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting students' academic performance has been a research area of
interest in recent years with many institutions focusing on improving the
students' performance and the education quality. The analysis and prediction of
students' performance can be achieved using various data mining techniques.
Moreover, such techniques allow instructors to determine possible factors that
may affect the students' final marks. To that end, this work analyzes two
different undergraduate datasets at two different universities. Furthermore,
this work aims to predict the students' performance at two stages of course
delivery (20% and 50% respectively). This analysis allows for properly choosing
the appropriate machine learning algorithms to use as well as optimizing the
algorithms' parameters. Furthermore, this work adopts a systematic multi-split
approach based on Gini index and p-value. This is done by optimizing a suitable
bagging ensemble learner that is built from any combination of six potential
base machine learning algorithms. It is shown through experimental results that
the posited bagging ensemble models achieve high accuracy for the target group
for both datasets.
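As a rough illustration of the bagging idea mentioned in the abstract, the sketch below trains several copies of a base learner on bootstrap resamples and combines them by majority vote. It is a minimal sketch under stated assumptions, not the authors' pipeline: the nearest-centroid base learner is a stand-in chosen for brevity (it is not claimed to be one of the paper's six base algorithms), the function names are hypothetical, the toy marks and the class labels "Good", "Fair", and "Weak" are invented for the example, and the Gini index and p-value selection step is not modeled.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Base learner (stand-in): store the mean feature vector of each class."""
    sums = {}
    counts = Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: dist2(model[label]))


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train one base learner per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Aggregate the base learners' predictions by majority vote."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Invented toy data: [quiz mark, assignment mark] per student, three classes.
X = [[90, 85], [88, 92], [95, 90],
     [70, 65], [68, 72], [72, 70],
     [40, 35], [35, 45], [42, 38]]
y = ["Good"] * 3 + ["Fair"] * 3 + ["Weak"] * 3

models = fit_bagging(X, y)
print(predict_bagging(models, [91, 88]))
print(predict_bagging(models, [38, 40]))
```

Because each base learner sees a different resample, a single unlucky sample (for example, one that misses a class entirely) is outvoted by the others, which is the variance-reduction effect bagging relies on.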
Related papers
- Improving prediction of students' performance in intelligent tutoring systems using attribute selection and ensembles of different multimodal data sources [0.0]
The aim of this study was to predict university students' learning performance using different sources of data from an Intelligent Tutoring System.
We collected and preprocessed data from 40 students from different multimodal sources.
arXiv Detail & Related papers (2024-02-10T09:31:39Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Enhancing Multi-Objective Optimization through Machine Learning-Supported Multiphysics Simulation [1.6685829157403116]
This paper presents a methodological framework for training, self-optimising, and self-organising surrogate models.
We show that surrogate models can be trained on relatively small amounts of data to approximate the underlying simulations accurately.
arXiv Detail & Related papers (2023-09-22T20:52:50Z)
- Analyzing the Capabilities of Nature-inspired Feature Selection Algorithms in Predicting Student Performance [0.0]
In this paper, an analysis was conducted to determine the relative performance of a suite of nature-inspired algorithms in the feature-selection portion of ensemble algorithms used to predict student performance.
It was found that leveraging an ensemble approach using nature-inspired algorithms for feature selection and traditional ML algorithms for classification significantly increased predictive accuracy while also reducing feature set size by up to 65 percent.
arXiv Detail & Related papers (2023-08-15T21:18:52Z)
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting, so they are incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradients.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Generalization in portfolio-based algorithm selection [97.74604695303285]
We provide the first provable guarantees for portfolio-based algorithm selection.
We show that if the portfolio is large, overfitting is inevitable, even with an extremely simple algorithm selector.
arXiv Detail & Related papers (2020-12-24T16:33:17Z)
- Computational Models for Academic Performance Estimation [21.31653695065347]
This paper presents an in-depth analysis of deep learning and machine learning approaches for the formulation of an automated students' performance estimation system.
Our main contributions are (a) a large dataset with fifteen courses (shared publicly for academic research) and (b) statistical analysis and ablations on the estimation problem for this dataset.
Unlike previous approaches that rely on feature engineering or logical function deduction, our approach is fully data-driven and thus highly generic with better performance across different prediction tasks.
arXiv Detail & Related papers (2020-09-06T07:31:37Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- Systematic Ensemble Model Selection Approach for Educational Data Mining [8.26773636337474]
This work explores and analyzes two different datasets at two separate stages of course delivery.
It proposes a systematic approach based on Gini index and p-value to select a suitable ensemble learner from a combination of six potential machine learning algorithms.
Experimental results show that the proposed ensemble models achieve high accuracy and low false positive rate at all stages for both datasets.
arXiv Detail & Related papers (2020-05-13T22:25:58Z)
- Three Approaches for Personalization with Applications to Federated Learning [68.19709953755238]
We present a systematic learning-theoretic study of personalization.
We provide learning-theoretic guarantees and efficient algorithms for which we also demonstrate the performance.
All of our algorithms are model-agnostic and work for any hypothesis class.
arXiv Detail & Related papers (2020-02-25T01:36:43Z)