Centralized vs. Federated Learning for Educational Data Mining: A Comparative Study on Student Performance Prediction with SAEB Microdata
- URL: http://arxiv.org/abs/2509.00086v1
- Date: Wed, 27 Aug 2025 10:30:25 GMT
- Title: Centralized vs. Federated Learning for Educational Data Mining: A Comparative Study on Student Performance Prediction with SAEB Microdata
- Authors: Rodrigo Tertulino,
- Abstract summary: The present study evaluates the feasibility and effectiveness of Federated Learning, specifically the FedProx algorithm, to predict student performance.<n>The analysis, conducted on a universe of over two million student records, revealed that the centralized model achieved an accuracy of 63.96%.<n>Remarkably, the federated model reached a peak accuracy of 61.23%, demonstrating a marginal performance loss in exchange for a robust privacy guarantee.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The application of data mining and artificial intelligence in education offers unprecedented potential for personalizing learning and early identification of at-risk students. However, the practical use of these techniques faces a significant barrier in privacy legislation, such as Brazil's General Data Protection Law (LGPD), which restricts the centralization of sensitive student data. To resolve this challenge, privacy-preserving computational approaches are required. The present study evaluates the feasibility and effectiveness of Federated Learning, specifically the FedProx algorithm, to predict student performance using microdata from the Brazilian Basic Education Assessment System (SAEB). A Deep Neural Network (DNN) model was trained in a federated manner, simulating a scenario with 50 schools, and its performance was rigorously benchmarked against a centralized eXtreme Gradient Boosting (XGBoost) model. The analysis, conducted on a universe of over two million student records, revealed that the centralized model achieved an accuracy of 63.96%. Remarkably, the federated model reached a peak accuracy of 61.23%, demonstrating a marginal performance loss in exchange for a robust privacy guarantee. The results indicate that Federated Learning is a viable and effective solution for building collaborative predictive models in the Brazilian educational context, in alignment with the requirements of the LGPD.
Related papers
- Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization [72.20212909644017]
Deliberate Practice Policy Optimization (DPPO) is a metacognitive Metaloop'' training framework.<n>DPPO alternates between supervised fine-tuning (competence expansion) and reinforcement learning (skill refinement)<n> Empirically, training a vision-language embodied model with DPPO, referred to as Pelican-VL 1.0, yields a 20.3% performance improvement over the base model.<n>We are open-sourcing both the models and code, providing the first systematic framework that alleviates the data and resource bottleneck.
arXiv Detail & Related papers (2025-11-20T17:58:04Z) - Privacy-Preserving Personalization in Education: A Federated Recommender System for Student Performance Prediction [0.07161783472741748]
Digitalization of education presents opportunities for data-driven personalization, but it also introduces challenges to student data privacy.<n>A novel privacy-preserving recommender system is proposed and evaluated to address this critical issue using Federated Learning (FL)
arXiv Detail & Related papers (2025-09-03T11:28:57Z) - Evaluating Federated Learning for At-Risk Student Prediction: A Comparative Analysis of Model Complexity and Data Balancing [0.0]
High dropout and failure rates in distance education pose a significant challenge for academic institutions.<n>This study develops and evaluates a machine learning model based on early academic performance and digital engagement patterns to predict student risk at a UK university.
arXiv Detail & Related papers (2025-08-23T19:58:16Z) - Ranking-Based At-Risk Student Prediction Using Federated Learning and Differential Features [4.21051987964486]
This study proposes a method that combines federated learning and differential features to address privacy concerns.<n>To evaluate the proposed method, a model for predicting at-risk students was trained using data from 1,136 students across 12 courses conducted over 4 years.<n>The trained models were also applicable for early prediction, achieving high performance in detecting at-risk students in earlier stages of the semester.
arXiv Detail & Related papers (2025-05-14T11:12:30Z) - RiM: Record, Improve and Maintain Physical Well-being using Federated Learning [1.1510009152620668]
We introduce RiM: Record, Improve, and Maintain, a mobile application which incorporates a novel personalized machine learning framework.<n>Our approach involves pre-training a multilayer perceptron (MLP) model on a large-scale simulated dataset to generate personalized recommendations.<n>We employ federated learning to fine-tune the model using data from IISER Bhopal students, thereby ensuring its applicability in real-world scenarios.
arXiv Detail & Related papers (2025-05-09T19:08:39Z) - Privacy-Preserved Automated Scoring using Federated Learning for Educational Research [1.2556373621040728]
We propose a federated learning (FL) framework for automated scoring of educational assessments.<n>We benchmark our model against two state-of-the-art FL methods and a centralized learning baseline.<n>Results show that our model achieves the highest accuracy (94.5%) among FL approaches.
arXiv Detail & Related papers (2025-03-12T19:06:25Z) - Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [92.99416966226724]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms.<n>We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels.<n>Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z) - Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose PseudoProbability Unlearning (PPU), a novel method that enables models to forget data to adhere to privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution, by consolidating collaborative training across multiple data owners.
FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z) - "You Can't Fix What You Can't Measure": Privately Measuring Demographic
Performance Disparities in Federated Learning [78.70083858195906]
We propose differentially private mechanisms to measure differences in performance across groups while protecting the privacy of group membership.
Our results show that, contrary to what prior work suggested, protecting privacy is not necessarily in conflict with identifying performance disparities of federated models.
arXiv Detail & Related papers (2022-06-24T09:46:43Z) - SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles [50.90773979394264]
This paper studies a model that protects the privacy of individuals' sensitive information while also allowing it to learn non-discriminatory predictors.
A key characteristic of the proposed model is to enable the adoption of off-the-selves and non-private fair models to create a privacy-preserving and fair model.
arXiv Detail & Related papers (2022-04-11T14:42:54Z) - Decentralized Federated Learning Preserves Model and Data Privacy [77.454688257702]
We propose a fully decentralized approach, which allows to share knowledge between trained models.
Students are trained on the output of their teachers via synthetically generated input data.
The results show that an untrained student model, trained on the teachers output reaches comparable F1-scores as the teacher.
arXiv Detail & Related papers (2021-02-01T14:38:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.