Evaluating Federated Learning for At-Risk Student Prediction: A Comparative Analysis of Model Complexity and Data Balancing
- URL: http://arxiv.org/abs/2508.18316v1
- Date: Sat, 23 Aug 2025 19:58:16 GMT
- Title: Evaluating Federated Learning for At-Risk Student Prediction: A Comparative Analysis of Model Complexity and Data Balancing
- Authors: Rodrigo Tertulino,
- Abstract summary: High dropout and failure rates in distance education pose a significant challenge for academic institutions.<n>This study develops and evaluates a machine learning model based on early academic performance and digital engagement patterns to predict student risk at a UK university.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High dropout and failure rates in distance education pose a significant challenge for academic institutions, making the proactive identification of at-risk students crucial for providing timely support. This study develops and evaluates a machine learning model based on early academic performance and digital engagement patterns from the large-scale OULAD dataset to predict student risk at a UK university. To address the practical challenges of data privacy and institutional silos that often hinder such initiatives, we implement the model using a Federated Learning (FL) framework. We compare model complexity (Logistic Regression vs. a Deep Neural Network) and data balancing. The final federated model demonstrates strong predictive capability, achieving an ROC AUC score of approximately 85% in identifying at-risk students. Our findings show that this federated approach provides a practical and scalable solution for institutions to build effective early-warning systems, enabling proactive student support while inherently respecting data privacy.
Related papers
- Centralized vs. Federated Learning for Educational Data Mining: A Comparative Study on Student Performance Prediction with SAEB Microdata [0.0]
The present study evaluates the feasibility and effectiveness of Federated Learning, specifically the FedProx algorithm, to predict student performance.<n>The analysis, conducted on a universe of over two million student records, revealed that the centralized model achieved an accuracy of 63.96%.<n>Remarkably, the federated model reached a peak accuracy of 61.23%, demonstrating a marginal performance loss in exchange for a robust privacy guarantee.
arXiv Detail & Related papers (2025-08-27T10:30:25Z) - Modelling higher education dropouts using sparse and interpretable post-clustering logistic regression [0.8437187555622164]
Higher education dropout constitutes a critical challenge for tertiary education systems worldwide.<n>The model introduced in this paper is a specialized form of logistic regression, specifically adapted to the context of university dropout analysis.
arXiv Detail & Related papers (2025-05-12T14:05:23Z) - Fairness and Robustness in Machine Unlearning [20.758637391023345]
We focus on fairness and robustness in machine unlearning algorithms.<n>Experiments demonstrate the vulnerability of current state-of-the-art approximated unlearning algorithms to adversarial attacks.<n>We demonstrate that unlearning in the intermediate and last layers is sufficient and cost-effective for time and memory complexity.
arXiv Detail & Related papers (2025-04-18T10:31:44Z) - AI-based identification and support of at-risk students: A case study of the Moroccan education system [5.199084419479099]
Student dropout is a global issue influenced by personal, familial, and academic factors.<n>This paper introduces an AI-driven predictive modeling approach to identify students at risk of dropping out.
arXiv Detail & Related papers (2025-04-09T13:30:35Z) - Privacy-Preserved Automated Scoring using Federated Learning for Educational Research [1.2556373621040728]
We propose a federated learning (FL) framework for automated scoring of educational assessments.<n>We benchmark our model against two state-of-the-art FL methods and a centralized learning baseline.<n>Results show that our model achieves the highest accuracy (94.5%) among FL approaches.
arXiv Detail & Related papers (2025-03-12T19:06:25Z) - Ten Challenging Problems in Federated Foundation Models [55.343738234307544]
Federated Foundation Models (FedFMs) represent a distributed learning paradigm that fuses general competences of foundation models as well as privacy-preserving capabilities of federated learning.<n>This paper provides a comprehensive summary of the ten challenging problems inherent in FedFMs, encompassing foundational theory, utilization of private data, continual learning, unlearning, Non-IID and graph data, bidirectional knowledge transfer, incentive mechanism design, game mechanism design, model watermarking, and efficiency.
arXiv Detail & Related papers (2025-02-14T04:01:15Z) - Feasible Learning [78.6167929413604]
We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample.<n>Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
arXiv Detail & Related papers (2025-01-24T20:39:38Z) - Early Detection of At-Risk Students Using Machine Learning [0.0]
We aim to tackle the persistent challenges of higher education retention and student dropout rates by screening for at-risk students.<n>This work considers several machine learning models, including Support Vector Machines (SVM), Naive Bayes, K-nearest neighbors (KNN), Decision Trees, Logistic Regression, and Random Forest.<n>Our analysis indicates that all algorithms generate an acceptable outcome for at-risk student predictions, while Naive Bayes performs best overall.
arXiv Detail & Related papers (2024-12-12T17:33:06Z) - Fed-Credit: Robust Federated Learning with Credibility Management [18.349127735378048]
Federated Learning (FL) is an emerging machine learning approach enabling model training on decentralized devices or data sources.
We propose a robust FL approach based on the credibility management scheme, called Fed-Credit.
The results exhibit superior accuracy and resilience against adversarial attacks, all while maintaining comparatively low computational complexity.
arXiv Detail & Related papers (2024-05-20T03:35:13Z) - Towards Robust Federated Learning via Logits Calibration on Non-IID Data [49.286558007937856]
Federated learning (FL) is a privacy-preserving distributed management framework based on collaborative model training of distributed devices in edge networks.
Recent studies have shown that FL is vulnerable to adversarial examples, leading to a significant drop in its performance.
In this work, we adopt the adversarial training (AT) framework to improve the robustness of FL models against adversarial example (AE) attacks.
arXiv Detail & Related papers (2024-03-05T09:18:29Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model,
Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z) - Stabilizing and Improving Federated Learning with Non-IID Data and
Client Dropout [15.569507252445144]
Label distribution skew induced data heterogeniety has been shown to be a significant obstacle that limits the model performance in federated learning.
We propose a simple yet effective framework by introducing a prior-calibrated softmax function for computing the cross-entropy loss.
The improved model performance over existing baselines in the presence of non-IID data and client dropout is demonstrated.
arXiv Detail & Related papers (2023-03-11T05:17:59Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - On the Robustness of Random Forest Against Untargeted Data Poisoning: An
Ensemble-Based Approach [42.81632484264218]
In machine learning models, perturbations of fractions of the training set (poisoning) can seriously undermine the model accuracy.
This paper aims to implement a novel hash-based ensemble approach that protects random forest against untargeted, random poisoning attacks.
arXiv Detail & Related papers (2022-09-28T11:41:38Z) - RoFL: Attestable Robustness for Secure Federated Learning [59.63865074749391]
Federated Learning allows a large number of clients to train a joint model without the need to share their private data.
To ensure the confidentiality of the client updates, Federated Learning systems employ secure aggregation.
We present RoFL, a secure Federated Learning system that improves robustness against malicious clients.
arXiv Detail & Related papers (2021-07-07T15:42:49Z) - Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z) - Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data.
We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks.
We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.