A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata
- URL: http://arxiv.org/abs/2510.22266v1
- Date: Sat, 25 Oct 2025 12:15:30 GMT
- Title: A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata
- Authors: Rodrigo Tertulino, Ricardo Almeida,
- Abstract summary: This study introduces a multi-level machine learning approach to classify the proficiency of 9th-grade and high school students.<n>Our model integrates four data sources: student socioeconomic characteristics, teacher professional profiles, school indicators, and director management profiles.
- Score: 0.07161783472741748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying the factors that influence student performance in basic education is a central challenge for formulating effective public policies in Brazil. This study introduces a multi-level machine learning approach to classify the proficiency of 9th-grade and high school students using microdata from the System of Assessment of Basic Education (SAEB). Our model uniquely integrates four data sources: student socioeconomic characteristics, teacher professional profiles, school indicators, and director management profiles. A comparative analysis of four ensemble algorithms confirmed the superiority of a Random Forest model, which achieved 90.2% accuracy and an Area Under the Curve (AUC) of 96.7%. To move beyond prediction, we applied Explainable AI (XAI) using SHAP, which revealed that the school's average socioeconomic level is the most dominant predictor, demonstrating that systemic factors have a greater impact than individual characteristics in isolation. The primary conclusion is that academic performance is a systemic phenomenon deeply tied to the school's ecosystem. This study provides a data-driven, interpretable tool to inform policies aimed at promoting educational equity by addressing disparities between schools.
Related papers
- Reliable and Reproducible Demographic Inference for Fairness in Face Analysis [63.46525489354455]
We propose a fully reproducible DAI pipeline that replaces conventional end-to-end training with a modular transfer learning approach.<n>We audit this pipeline across three dimensions: accuracy, fairness, and a newly introduced notion of robustness, defined via intra-identity consistency.<n>Our results show that the proposed method outperforms strong baselines, particularly on ethnicity, which is the more challenging attribute.
arXiv Detail & Related papers (2025-10-23T12:22:02Z) - AI-Driven Personalized Learning: Predicting Academic Per-formance Through Leadership Personality Traits [0.0]
The study explores the potential of AI technologies in personalized learning.<n>It suggests the prediction of academic success through leadership personality traits and machine learning modelling.
arXiv Detail & Related papers (2025-10-22T18:47:30Z) - Explainable AI and Machine Learning for Exam-based Student Evaluation: Causal and Predictive Analysis of Socio-academic and Economic Factors [1.2163458046014015]
Academic performance depends on a multivariable nexus of socio-academic and financial factors.<n>This study investigates these influences to develop effective strategies for optimizing students' CGPA.
arXiv Detail & Related papers (2025-08-01T17:09:49Z) - AI-based identification and support of at-risk students: A case study of the Moroccan education system [5.199084419479099]
Student dropout is a global issue influenced by personal, familial, and academic factors.<n>This paper introduces an AI-driven predictive modeling approach to identify students at risk of dropping out.
arXiv Detail & Related papers (2025-04-09T13:30:35Z) - Understanding the Disparities in Mathematics Performance: An Interpretability-Based Examination [0.5266869303483376]
This study aims to unravel the complex factors contributing to educational disparities in Mathematics performance.<n>Students from lower socioeconomic backgrounds possess fewer books and demonstrate lower performance in Mathematics.<n>Gender also emerged as a determinant, with females contributing differently to performance levels across the spectrum.
arXiv Detail & Related papers (2025-01-29T00:44:01Z) - DASKT: A Dynamic Affect Simulation Method for Knowledge Tracing [51.665582274736785]
Knowledge Tracing (KT) predicts future performance by students' historical computation, and understanding students' affective states can enhance the effectiveness of KT.<n>We propose Affect Dynamic Knowledge Tracing (DASKT) to explore the impact of various student affective states on their knowledge states.<n>Our research highlights a promising avenue for future studies, focusing on achieving high interpretability and accuracy.
arXiv Detail & Related papers (2025-01-18T10:02:10Z) - Machine Learning Predicts Upper Secondary Education Dropout as Early as the End of Primary School [0.0]
This study expanded the modeling horizon by utilizing a 13-year longitudinal dataset, encompassing data from kindergarten to Grade 9.
Our methodology incorporated a comprehensive range of parameters, including students' academic and cognitive skills, motivation, behavior, well-being, and officially recorded dropout data.
The machine learning models developed in this study demonstrated notable classification ability, achieving a mean area under the curve (AUC) of 0.61 with data up to Grade 6 and an improved AUC of 0.65 with data up to Grade 9.
arXiv Detail & Related papers (2024-03-01T13:18:08Z) - Sensitivity, Performance, Robustness: Deconstructing the Effect of
Sociodemographic Prompting [64.80538055623842]
sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z) - Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes [72.13373216644021]
We study the societal impact of machine learning by considering the collection of models that are deployed in a given context.
We find deployed machine learning is prone to systemic failure, meaning some users are exclusively misclassified by all models available.
These examples demonstrate ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
arXiv Detail & Related papers (2023-07-12T01:11:52Z) - Through the Data Management Lens: Experimental Analysis and Evaluation
of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.