Cluster Model for parsimonious selection of variables and enhancing Students Employability Prediction
- URL: http://arxiv.org/abs/2407.16884v1
- Date: Wed, 5 Jun 2024 06:06:46 GMT
- Title: Cluster Model for parsimonious selection of variables and enhancing Students Employability Prediction
- Authors: Pooja Thakar, Anil Mehta, Manisha,
- Abstract summary: Data in education is generally very large, multidimensional and unbalanced in nature.
In this paper, Engineering and MCA (Masters in Computer Applications) students data is collected from various universities and institutes pan India.
A cluster based model is presented in this paper, which, when applied at preprocessing stage helps in parsimonious selection of variables.
- Score: 1.4610685586329806
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Educational Data Mining (EDM) is a promising field, where data mining is widely used for predicting students performance. One of the most prevalent and recent challenge that higher education faces today is making students skillfully employable. Institutions possess large volume of data; still they are unable to reveal knowledge and guide their students. Data in education is generally very large, multidimensional and unbalanced in nature. Process of extracting knowledge from such data has its own set of problems and is a very complicated task. In this paper, Engineering and MCA (Masters in Computer Applications) students data is collected from various universities and institutes pan India. The dataset is large, unbalanced and multidimensional in nature. A cluster based model is presented in this paper, which, when applied at preprocessing stage helps in parsimonious selection of variables and improves the performance of predictive algorithms. Hence, facilitate in better prediction of Students Employability.
Related papers
- DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback [62.235925602004535]
DataEnvGym is a testbed of teacher environments for data generation agents.
It frames data generation as a sequential decision-making task, involving an agent and a data generation engine.
Students are iteratively trained and evaluated on generated data, and their feedback is reported to the agent after each iteration.
arXiv Detail & Related papers (2024-10-08T17:20:37Z) - How big is Big Data? [0.18472148461613155]
We assess what it big means in the context of typical materials-science machine-learning problems.
We ask how models generalize to similar datasets and how high-quality datasets can be gathered from heterogenous sources.
We find that big data present unique challenges along very different aspects that should serve to motivate further work.
arXiv Detail & Related papers (2024-05-18T22:13:55Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - From Random to Informed Data Selection: A Diversity-Based Approach to
Optimize Human Annotation and Few-Shot Learning [38.30983556062276]
A major challenge in Natural Language Processing is obtaining annotated data for supervised learning.
Crowdsourcing introduces issues related to the annotator's experience, consistency, and biases.
This paper contributes an automatic and informed data selection architecture to build a small dataset for few-shot learning.
arXiv Detail & Related papers (2024-01-24T04:57:32Z) - Multi-granulariy Time-based Transformer for Knowledge Tracing [9.788039182463768]
We leverage students historical data, including their past test scores, to create a personalized model for each student.
We then use these models to predict their future performance on a given test.
arXiv Detail & Related papers (2023-04-11T14:46:38Z) - A Survey of Learning on Small Data: Generalization, Optimization, and
Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - Can Population-based Engagement Improve Personalisation? A Novel Dataset
and Experiments [21.12546768556595]
VLE is a novel dataset that consists of content and video based features extracted from publicly available scientific video lectures.
Our experimental results indicate that the newly proposed VLE dataset leads to building context-agnostic engagement prediction models.
Experiments in combining the built model with a personalising algorithm show promising improvements in addressing the cold-start problem encountered in educational recommenders.
arXiv Detail & Related papers (2022-06-22T15:53:24Z) - Process-BERT: A Framework for Representation Learning on Educational
Process Data [68.8204255655161]
We propose a framework for learning representations of educational process data.
Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data.
We apply our framework to the 2019 nation's report card data mining competition dataset.
arXiv Detail & Related papers (2022-04-28T16:07:28Z) - Zero-shot meta-learning for small-scale data from human subjects [10.320654885121346]
We develop a framework to rapidly adapt to a new prediction task with limited training data for out-of-sample test data.
Our model learns the latent treatment effects of each intervention and, by design, can naturally handle multi-task predictions.
Our model has implications for improved generalization of small-size human studies to the wider population.
arXiv Detail & Related papers (2022-03-29T17:42:04Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying
Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z) - Differentially Private Deep Learning with Smooth Sensitivity [144.31324628007403]
We study privacy concerns through the lens of differential privacy.
In this framework, privacy guarantees are generally obtained by perturbing models in such a way that specifics of data used to train the model are made ambiguous.
One of the most important techniques used in previous works involves an ensemble of teacher models, which return information to a student based on a noisy voting procedure.
In this work, we propose a novel voting mechanism with smooth sensitivity, which we call Immutable Noisy ArgMax, that, under certain conditions, can bear very large random noising from the teacher without affecting the useful information transferred to the student
arXiv Detail & Related papers (2020-03-01T15:38:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.