Related papers: Impact, Causation and Prediction of Socio-Academic and Economic Factors in Exam-centric Student Evaluation Measures using Machine Learning and Causal Analysis


URL: http://arxiv.org/abs/2506.12030v1
Date: Thu, 22 May 2025 17:41:05 GMT
Title: Impact, Causation and Prediction of Socio-Academic and Economic Factors in Exam-centric Student Evaluation Measures using Machine Learning and Causal Analysis
Authors: Md. Biplob Hosen, Sabbir Ahmed, Bushra Akter, Mehrin Anannya,
Abstract summary: Understanding socio-academic and economic factors influencing students' performance is crucial for effective educational interventions. This study employs several machine learning techniques and causal analysis to predict and elucidate the impacts of these factors on academic performance.
Score: 1.124958340749622
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding socio-academic and economic factors influencing students' performance is crucial for effective educational interventions. This study employs several machine learning techniques and causal analysis to predict and elucidate the impacts of these factors on academic performance. We constructed a hypothetical causal graph and collected data from 1,050 student profiles. Following meticulous data cleaning and visualization, we analyze linear relationships through correlation and variable plots, and perform causal analysis on the hypothetical graph. Regression and classification models are applied for prediction, and unsupervised causality analysis using PC, GES, ICA-LiNGAM, and GRASP algorithms is conducted. Our regression analysis shows that Ridge Regression achieves a Mean Absolute Error (MAE) of 0.12 and a Mean Squared Error (MSE) of 0.024, indicating robustness, while classification models like Random Forest achieve nearly perfect F1-scores. The causal analysis shows significant direct and indirect effects of factors such as class attendance, study hours, and group study on CGPA. These insights are validated through unsupervised causality analysis. By integrating the best regression model into a web application, we are developing a practical tool for students and educators to enhance academic outcomes based on empirical evidence.
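The abstract reports Ridge Regression results as MAE and MSE. To illustrate what those numbers measure, here is a minimal pure-Python sketch of closed-form ridge regression scored with both metrics. It is not the authors' pipeline, which the abstract does not describe. The two features (attendance rate, daily study hours), the synthetic CGPA values, and the alpha settings are all hypothetical, as are the helper names `fit_ridge`, `predict`, `mae`, and `mse`.

```python
"""Closed-form ridge regression scored with MAE and MSE (pure-Python sketch).

The toy data below is hypothetical; it is NOT the paper's 1,050-profile dataset.
"""
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(X: Sequence[Sequence[float]], y: Sequence[float], alpha: float) -> List[float]:
    """Return [intercept, w1, ..., wp] minimizing ||y - b - Xw||^2 + alpha * ||w||^2."""
    rows = [[1.0] + list(r) for r in X]  # prepend a bias column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # start at 1 so the intercept is not penalized
        xtx[i][i] += alpha
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply intercept plus weighted features to each row."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


if __name__ == "__main__":
    # Hypothetical features per student: (attendance rate, daily study hours).
    X = [[0.6, 1.0], [0.7, 2.0], [0.8, 2.5], [0.9, 3.0], [0.95, 4.0], [0.5, 0.5]]
    y = [2.0 + 1.0 * a + 0.2 * h for a, h in X]  # synthetic CGPA, no noise
    for alpha in (0.0, 1.0):
        w = fit_ridge(X, y, alpha)
        y_hat = predict(w, X)
        print(f"alpha={alpha}: MAE={mae(y, y_hat):.4f}, MSE={mse(y, y_hat):.6f}")
```

With alpha = 0 the sketch reduces to ordinary least squares and recovers the generating coefficients on this noise-free toy data. A larger alpha shrinks the slope coefficients, accepting some extra training error in exchange for stability. That trade-off is the usual reason to prefer ridge when features are correlated.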

Related papers

Explainable AI and Machine Learning for Exam-based Student Evaluation: Causal and Predictive Analysis of Socio-academic and Economic Factors [1.2163458046014015]
Academic performance depends on a multivariable nexus of socio-academic and financial factors. This study investigates these influences to develop effective strategies for optimizing students' CGPA.
arXiv Detail & Related papers (2025-08-01T17:09:49Z)
Do-PFN: In-Context Learning for Causal Effect Estimation [75.62771416172109]
We show that prior-data fitted networks (PFNs) can be pre-trained on synthetic data to predict outcomes. Our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph.
arXiv Detail & Related papers (2025-06-06T12:43:57Z)
Data Fusion for Partial Identification of Causal Effects [62.56890808004615]
We propose a novel partial identification framework that enables researchers to answer key questions such as: Is the causal effect positive or negative? How severe must assumption violations be to overturn this conclusion? We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
arXiv Detail & Related papers (2025-05-30T07:13:01Z)
Interpretable Credit Default Prediction with Ensemble Learning and SHAP [3.948008559977866]
This study focuses on the problem of credit default prediction, builds a modeling framework based on machine learning, and conducts comparative experiments on a variety of mainstream classification algorithms. The results show that ensemble learning methods have clear advantages in predictive performance, especially in handling complex nonlinear relationships between features and data imbalance. The external credit score variable plays a dominant role in model decision making, which helps improve the model's interpretability and practical application value.
arXiv Detail & Related papers (2025-05-27T07:23:22Z)
Machine learning algorithms to predict stroke in China based on causal inference of time series analysis [1.7646715816998508]
This study proposes a stroke risk prediction method that combines dynamic causal inference with machine learning models. The research results indicate that dynamic causal inference features have important value in predicting stroke risk.
arXiv Detail & Related papers (2025-03-10T14:45:43Z)
Class-Dependent Perturbation Effects in Evaluating Time Series Attributions [5.136283512042341]
We show previously overlooked class-dependent effects in feature attribution metrics. Our analysis suggests that perturbation-based evaluation may reflect specific model behaviors rather than intrinsic attribution quality. We propose an evaluation framework with a class-aware penalty term to help assess and account for these effects.
arXiv Detail & Related papers (2025-02-24T10:22:03Z)
Analyzing Domestic Violence through Exploratory Data Analysis and Explainable Ensemble Learning Insights [0.5825410941577593]
This study explores male domestic violence (MDV) for the first time, highlighting the factors that influence it. We collected data from nine major cities in Bangladesh and conducted exploratory data analysis (EDA) to understand the underlying dynamics. EDA revealed patterns such as the high prevalence of verbal abuse, the influence of financial dependency, and the role of familial and socio-economic factors in MDV.
arXiv Detail & Related papers (2024-03-22T19:53:21Z)
Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes [72.13373216644021]
We study the societal impact of machine learning by considering the collection of models that are deployed in a given context. We find deployed machine learning is prone to systemic failure, meaning some users are exclusively misclassified by all models available. These examples demonstrate ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
arXiv Detail & Related papers (2023-07-12T01:11:52Z)
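The ecosystem-level entry above defines systemic failure as a user being misclassified by every available model. A minimal sketch of that rate follows. The comparison baseline, the product of per-model error rates (what you would expect if models erred independently), is one natural reference point chosen here for illustration and is not claimed to be the paper's exact metric. The labels, model outputs, and function names are hypothetical.

```python
"""Observed vs. independence-baseline rate of 'systemic failure' (sketch).

Systemic failure here means a user is misclassified by every model in the
ecosystem. All data below is hypothetical.
"""
from math import prod
from typing import List, Sequence


def error_flags(labels: Sequence[int], preds: Sequence[int]) -> List[bool]:
    """True where a model's prediction disagrees with the true label."""
    if len(preds) != len(labels):
        raise ValueError("each model must predict every user")
    return [p != t for p, t in zip(preds, labels)]


def systemic_failure_rate(labels: Sequence[int], predictions: Sequence[Sequence[int]]) -> float:
    """Fraction of users that every model gets wrong."""
    if not predictions or not labels:
        raise ValueError("need at least one model and one user")
    flags = [error_flags(labels, p) for p in predictions]
    failed_all = sum(all(f[i] for f in flags) for i in range(len(labels)))
    return failed_all / len(labels)


def independent_baseline(labels: Sequence[int], predictions: Sequence[Sequence[int]]) -> float:
    """Rate expected if each model erred independently: product of error rates."""
    rates = [sum(error_flags(labels, p)) / len(labels) for p in predictions]
    return prod(rates)


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0]
    model_a = [0, 0, 1, 0, 0]  # wrong on users 0 and 3
    model_b = [0, 1, 1, 0, 0]  # wrong on users 0, 1 and 3
    models = [model_a, model_b]
    print("observed:", systemic_failure_rate(labels, models))
    print("independent baseline:", independent_baseline(labels, models))
```

In this toy case, users 0 and 3 fail under both models, so the observed rate is 2/5 = 0.4. Independence would predict 0.4 × 0.6 = 0.24. An observed rate above the baseline is the kind of outcome homogenization the entry describes.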
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator. This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models. We provide a language for describing how training data influences predictions, through a causal framework. Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z)
Auditing Fairness and Imputation Impact in Predictive Analytics for Higher Education [0.0]
Two major barriers to the adoption of predictive analytics in higher education are cited: the lack of democratization in deployment and the potential to exacerbate inequalities.
arXiv Detail & Related papers (2021-09-13T05:08:40Z)
Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions. We make robust and efficient counterfactual predictions for both individual and average treatment effects. The algorithm shows competitive performance with the state-of-the-art on real world and synthetic data.
arXiv Detail & Related papers (2020-10-15T16:39:26Z)
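Double robustness is usually explained through the augmented inverse-propensity-weighted (AIPW) estimator of the average treatment effect. Under the standard assumptions of no unmeasured confounding and overlap, it remains consistent if either the outcome model or the propensity model is correctly specified. The sketch below implements that textbook estimator, not the representation-learning method of the listed paper. The inputs `mu0` and `mu1` (outcome-model predictions) and `e` (propensity scores) are assumed to come from models fitted elsewhere, and the clipping threshold is a hypothetical safeguard.

```python
"""Augmented inverse-propensity-weighted (AIPW) ATE estimator (sketch).

This is the textbook doubly robust estimator, not the listed paper's method.
mu0, mu1 (outcome-model predictions) and e (propensity scores) are assumed
to come from models fitted elsewhere.
"""
from typing import Sequence


def aipw_ate(
    t: Sequence[int],
    y: Sequence[float],
    mu0: Sequence[float],
    mu1: Sequence[float],
    e: Sequence[float],
    clip: float = 1e-3,
) -> float:
    """Estimate the average treatment effect from treatment t and outcome y."""
    n = len(t)
    if n == 0 or any(len(s) != n for s in (y, mu0, mu1, e)):
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for ti, yi, m0, m1, ei in zip(t, y, mu0, mu1, e):
        ei = min(max(ei, clip), 1.0 - clip)  # keep propensities away from 0 and 1
        total += (
            (m1 - m0)                              # outcome-model contrast
            + ti * (yi - m1) / ei                  # treated-residual correction
            - (1 - ti) * (yi - m0) / (1.0 - ei)    # control-residual correction
        )
    return total / n


if __name__ == "__main__":
    t = [1, 0, 1, 0]
    y = [3.0, 1.0, 5.0, 1.0]
    # Outcome model deliberately wrong (predicts 0); propensities correct (0.5).
    print(aipw_ate(t, y, mu0=[0.0] * 4, mu1=[0.0] * 4, e=[0.5] * 4))
```

In the usage example, the outcome model is useless, but the correct propensities let the correction terms recover the difference in means between the treated and control groups (4 − 1 = 3). Conversely, when the outcome predictions match the observed outcomes, every residual term vanishes and the propensities no longer matter.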

This list is automatically generated from the titles and abstracts of the papers on this site.