Explainable AI for Predicting and Understanding Mathematics Achievement: A Cross-National Analysis of PISA 2018
- URL: http://arxiv.org/abs/2508.16747v1
- Date: Fri, 22 Aug 2025 19:02:15 GMT
- Title: Explainable AI for Predicting and Understanding Mathematics Achievement: A Cross-National Analysis of PISA 2018
- Authors: Liu Liu, Rui Dai
- Abstract summary: This study applies explainable artificial intelligence (XAI) techniques to PISA 2018 data to predict math achievement. We tested four models: Multiple Linear Regression (MLR), Random Forest (RF), CATBoost, and Artificial Neural Networks (ANN). Key predictors included socio-economic status, study time, teacher motivation, and students' attitudes toward mathematics.
- Score: 6.208182583084874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the factors that shape students' mathematics performance is vital for designing effective educational policies. This study applies explainable artificial intelligence (XAI) techniques to PISA 2018 data to predict math achievement and identify key predictors across ten countries (67,329 students). We tested four models: Multiple Linear Regression (MLR), Random Forest (RF), CATBoost, and Artificial Neural Networks (ANN), using student, family, and school variables. Models were trained on 70% of the data (with 5-fold cross-validation) and tested on 30%, stratified by country. Performance was assessed with R^2 and Mean Absolute Error (MAE). To ensure interpretability, we used feature importance, SHAP values, and decision tree visualizations. Non-linear models, especially RF and ANN, outperformed MLR, with RF balancing accuracy and generalizability. Key predictors included socio-economic status, study time, teacher motivation, and students' attitudes toward mathematics, though their impact varied across countries. Visual diagnostics such as scatterplots of predicted vs actual scores showed RF and CATBoost aligned closely with actual performance. Findings highlight the non-linear and context-dependent nature of achievement and the value of XAI in educational research. This study uncovers cross-national patterns, informs equity-focused reforms, and supports the development of personalized learning strategies.
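The evaluation protocol described in the abstract (a 70/30 split stratified by country, scored with R^2 and MAE) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the record layout, and the synthetic scores are all assumptions made for illustration, and the 5-fold cross-validation and the actual models are omitted.

```python
import random
from collections import defaultdict


def stratified_split(records, key, test_frac=0.3, seed=0):
    """Split records so that each group (e.g. country) puts ~test_frac in the test set."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    train, test = [], []
    for _, members in sorted(groups.items()):
        members = members[:]  # copy so the caller's list is not reordered
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic records (hypothetical values, not PISA data).
records = [
    {"country": c, "score": 400 + 10 * i}
    for c in ("A", "B")
    for i in range(10)
]
train, test = stratified_split(records, key="country")

# Trivial baseline: predict the training mean for every test student.
baseline = sum(r["score"] for r in train) / len(train)
y_true = [r["score"] for r in test]
y_pred = [baseline] * len(test)
print("baseline MAE:", mae(y_true, y_pred))
print("baseline R^2:", r2(y_true, y_pred))
```

A constant baseline like this scores an R^2 at or below zero on held-out data, which gives a reference point for reading the R^2 values reported for the four models.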
Related papers
- Explainable AI and Machine Learning for Exam-based Student Evaluation: Causal and Predictive Analysis of Socio-academic and Economic Factors [1.2163458046014015]
Academic performance depends on a multivariable nexus of socio-academic and financial factors. This study investigates these influences to develop effective strategies for optimizing students' CGPA.
arXiv Detail & Related papers (2025-08-01T17:09:49Z) - AI-based identification and support of at-risk students: A case study of the Moroccan education system [5.199084419479099]
Student dropout is a global issue influenced by personal, familial, and academic factors. This paper introduces an AI-driven predictive modeling approach to identify students at risk of dropping out.
arXiv Detail & Related papers (2025-04-09T13:30:35Z) - Understanding the Disparities in Mathematics Performance: An Interpretability-Based Examination [0.5266869303483376]
This study aims to unravel the complex factors contributing to educational disparities in Mathematics performance. Students from lower socioeconomic backgrounds possess fewer books and demonstrate lower performance in Mathematics. Gender also emerged as a determinant, with females contributing differently to performance levels across the spectrum.
arXiv Detail & Related papers (2025-01-29T00:44:01Z) - DASKT: A Dynamic Affect Simulation Method for Knowledge Tracing [51.665582274736785]
Knowledge Tracing (KT) predicts future performance by students' historical computation, and understanding students' affective states can enhance the effectiveness of KT. We propose Affect Dynamic Knowledge Tracing (DASKT) to explore the impact of various student affective states on their knowledge states. Our research highlights a promising avenue for future studies, focusing on achieving high interpretability and accuracy.
arXiv Detail & Related papers (2025-01-18T10:02:10Z) - Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z) - ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z) - Machine Learning Approach for Predicting Students Academic Performance and Study Strategies based on their Motivation [0.0]
This research aims to develop machine learning models for predicting students' academic performance and study strategies.
Key learning attributes (intrinsic, extrinsic, autonomy, relatedness, competence, and self-esteem) essential for students' learning process were used in building the models.
arXiv Detail & Related papers (2022-10-15T04:09:05Z) - A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - Interpretable Knowledge Tracing: Simple and Efficient Student Modeling with Causal Relations [21.74631969428855]
Interpretable Knowledge Tracing (IKT) is a simple model that relies on three meaningful latent features.
IKT's prediction of future student performance is made using a Tree-Augmented Naive Bayes (TAN).
IKT has great potential for providing adaptive and personalized instructions with causal reasoning in real-world educational systems.
arXiv Detail & Related papers (2021-12-15T19:05:48Z) - Personalized Education in the AI Era: What to Expect Next? [76.37000521334585]
The objective of personalized learning is to design an effective knowledge acquisition track that matches the learner's strengths and bypasses her weaknesses to meet her desired goal.
In recent years, the boost of artificial intelligence (AI) and machine learning (ML) has unfolded novel perspectives to enhance personalized education.
arXiv Detail & Related papers (2021-01-19T12:23:32Z) - Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.