LLM-Augmented and Fair Machine Learning Framework for University Admission Prediction
- URL: http://arxiv.org/abs/2509.22560v1
- Date: Fri, 26 Sep 2025 16:40:44 GMT
- Title: LLM-Augmented and Fair Machine Learning Framework for University Admission Prediction
- Authors: Mohammad Abbadi, Yassine Himeur, Shadi Atalla, Dahlia Mansoor, Wathiq Mansoor
- Abstract summary: This work presents a comprehensive framework that fuses machine learning, deep learning, and large language model techniques. Drawing on more than 2,000 student records, the study benchmarks logistic regression, Naive Bayes, random forests, deep neural networks, and a stacked ensemble. The framework is interpretable, fairness-aware, and deployable.
- Score: 3.54340329539693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Universities face surging applications and heightened expectations for fairness, making accurate admission prediction increasingly vital. This work presents a comprehensive framework that fuses machine learning, deep learning, and large language model techniques to combine structured academic and demographic variables with unstructured text signals. Drawing on more than 2,000 student records, the study benchmarks logistic regression, Naive Bayes, random forests, deep neural networks, and a stacked ensemble. Logistic regression offers a strong, interpretable baseline at 89.5% accuracy, while the stacked ensemble achieves the best performance at 91.0%, with Naive Bayes and random forests close behind. To probe text integration, GPT-4-simulated evaluations of personal statements are added as features, yielding modest gains but demonstrating feasibility for authentic essays and recommendation letters. Transparency is ensured through feature-importance visualizations and fairness audits. The audits reveal a 9-percentage-point gender gap (67% male vs. 76% female) and an 11% gap by parental education, underscoring the need for continued monitoring. The framework is interpretable, fairness-aware, and deployable.
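The gender gap quoted in the abstract can be illustrated with a small group-rate audit. The following is a minimal sketch, not the authors' audit code: it assumes the reported percentages are per-group rates of positive predictions (the abstract does not say which metric they refer to), and the function names `selection_rates` and `parity_gap` and the toy data are hypothetical.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Return the fraction of positive predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += int(pred)
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(rates):
    """Largest difference in selection rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: 100 applicants per group, constructed so the
# per-group rates match the 67% and 76% figures quoted in the abstract.
groups = ["male"] * 100 + ["female"] * 100
predictions = [1] * 67 + [0] * 33 + [1] * 76 + [0] * 24

rates = selection_rates(groups, predictions)
gap = parity_gap(rates)
print(rates, round(gap, 2))
```

Under these assumptions the gap is the difference between the two rates, which is why it is more precisely described in percentage points than as a relative percentage.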
Related papers
- Reliable and Reproducible Demographic Inference for Fairness in Face Analysis [63.46525489354455]
We propose a fully reproducible DAI pipeline that replaces conventional end-to-end training with a modular transfer learning approach. We audit this pipeline across three dimensions: accuracy, fairness, and a newly introduced notion of robustness, defined via intra-identity consistency. Our results show that the proposed method outperforms strong baselines, particularly on ethnicity, which is the more challenging attribute.
arXiv Detail & Related papers (2025-10-23T12:22:02Z) - VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation [6.576811224645293]
Graph Neural Networks (GNNs) can learn structural and logical code relationships in a data-driven manner. GNNs often learn 'spurious' correlations from superficial code similarities. We propose a unified framework for robust and interpretable vulnerability detection, called VISION.
arXiv Detail & Related papers (2025-08-26T11:20:39Z) - Beyond classical and contemporary models: a transformative AI framework for student dropout prediction in distance learning using RAG, Prompt engineering, and Cross-modal fusion [0.4369550829556578]
This paper introduces a transformative AI framework that redefines dropout prediction. The framework achieves 89% accuracy and an F1-score of 0.88, outperforming conventional models by 7% and reducing false negatives by 21%.
arXiv Detail & Related papers (2025-07-04T21:41:43Z) - Demographic Attributes Prediction from Speech Using WavLM Embeddings [25.00298717665857]
This paper introduces a general classifier based on WavLM features to infer demographic characteristics, such as age, gender, native language, education, and country, from speech. The proposed framework identifies key acoustic and linguistic features associated with demographic attributes, achieving a Mean Absolute Error (MAE) of 4.94 for age prediction and over 99.81% accuracy for gender classification.
arXiv Detail & Related papers (2025-02-17T16:43:47Z) - VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning [59.68917139718813]
We show that a strong off-the-shelf frozen pretrained visual encoder can achieve state-of-the-art (SoTA) performance in forecasting and procedural planning.
By conditioning on frozen clip-level embeddings from observed steps to predict the actions of unseen steps, our prediction model is able to learn robust representations for forecasting.
arXiv Detail & Related papers (2024-10-04T14:52:09Z) - Learning to Love Edge Cases in Formative Math Assessment: Using the AMMORE Dataset and Chain-of-Thought Prompting to Improve Grading Accuracy [0.0]
This paper introduces AMMORE, a new dataset of 53,000 math open-response question-answer pairs from Rori.
We conduct two experiments to evaluate the use of large language models (LLMs) for grading challenging student answers.
arXiv Detail & Related papers (2024-09-26T14:51:40Z) - Phrasing for UX: Enhancing Information Engagement through Computational Linguistics and Creative Analytics [0.0]
This study explores the relationship between textual features and Information Engagement (IE) on digital platforms.
It highlights the impact of computational linguistics and analytics on user interaction.
The READ model is introduced to quantify key predictors like representativeness, ease of use, affect, and distribution.
arXiv Detail & Related papers (2024-08-23T00:33:47Z) - BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z) - Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching [49.730741713652435]
In this paper, we propose a method that can effectively transfer the representations of a large pre-trained multimodal model into a small target model.
For unsupervised transfer, we introduce cross-modal similarity matching (CSM) that enables a student model to learn the representations of a teacher model.
To better encode the text prompts, we design context-based prompt augmentation (CPA) that can alleviate the lexical ambiguity of input text prompts.
arXiv Detail & Related papers (2023-01-07T17:24:11Z) - Learning to Decompose Visual Features with Latent Textual Prompts [140.2117637223449]
We propose Decomposed Feature Prompting (DeFo) to improve vision-language models.
Our empirical study shows DeFo's significance in improving the vision-language models.
arXiv Detail & Related papers (2022-10-09T15:40:13Z) - Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based ALs are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z) - A Simple Framework for Contrastive Learning of Visual Representations [116.37752766922407]
This paper presents SimCLR: a simple framework for contrastive learning of visual representations.
We show that composition of data augmentations plays a critical role in defining effective predictive tasks.
We are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet.
arXiv Detail & Related papers (2020-02-13T18:50:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.