Predicting Early Dropout: Calibration and Algorithmic Fairness
Considerations
- URL: http://arxiv.org/abs/2103.09068v1
- Date: Tue, 16 Mar 2021 13:42:16 GMT
- Title: Predicting Early Dropout: Calibration and Algorithmic Fairness
Considerations
- Authors: Marzieh Karimi-Haghighi, Carlos Castillo, Davinia Hernandez-Leo,
Veronica Moreno Oliver
- Abstract summary: We develop a machine learning method to predict the risks of university dropout and underperformance.
We analyze whether this method leads to discriminatory outcomes for some sensitive groups in terms of prediction accuracy (AUC) and error rates (Generalized False Positive Rate, GFPR, and Generalized False Negative Rate, GFNR).
- Score: 2.7048165023994057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, the problem of predicting dropout risk in undergraduate studies
is addressed from a perspective of algorithmic fairness. We develop a machine
learning method to predict the risks of university dropout and
underperformance. The objective is to understand if such a system can identify
students at risk while avoiding potential discriminatory biases. When modeling
both risks, we obtain prediction models with an Area Under the ROC Curve (AUC)
of 0.77-0.78 based on the data available at the enrollment time, before the
first year of studies starts. This data includes the students' demographics,
the high school they attended, and their admission (average) grade. Our models
are calibrated: they produce estimated probabilities for each risk, not mere
scores. We analyze whether this method leads to discriminatory outcomes for some
sensitive groups in terms of prediction accuracy (AUC) and error rates
(Generalized False Positive Rate, GFPR, and Generalized False Negative Rate,
GFNR). The models exhibit some equity in terms of AUC and GFNR across groups.
The similar GFNR means a similar probability of failing to detect risk for
students who drop out. The disparities in GFPR are addressed through a
mitigation process that does not affect the calibration of the model.
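The generalized error rates used above have simple closed forms over calibrated probabilities: GFPR is the mean predicted risk among students who did not drop out, and GFNR is the mean predicted "safe" probability among those who did. A minimal sketch per sensitive group (the scores, outcomes, and group labels are illustrative, not from the paper's data):

```python
import numpy as np

def gfpr_gfnr(p, y):
    """Generalized error rates for calibrated risk scores.
    GFPR = mean predicted risk among true negatives (non-dropouts);
    GFNR = mean predicted non-risk among true positives (dropouts)."""
    p, y = np.asarray(p, float), np.asarray(y, int)
    gfpr = p[y == 0].mean()        # expected false-positive mass
    gfnr = (1 - p)[y == 1].mean()  # expected missed-risk mass
    return gfpr, gfnr

# Toy comparison of two sensitive groups A and B
p = np.array([0.9, 0.2, 0.7, 0.1, 0.8, 0.3])  # calibrated dropout risks
y = np.array([1,   0,   1,   0,   1,   0])    # 1 = dropped out
g = np.array(["A", "A", "A", "B", "B", "B"])
rates = {grp: gfpr_gfnr(p[g == grp], y[g == grp]) for grp in ("A", "B")}
```

Comparing `rates["A"]` against `rates["B"]` is the per-group disparity check the abstract describes; a mitigation step would adjust scores until the GFPR gap shrinks without breaking calibration.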
Related papers
- Mitigating Nonlinear Algorithmic Bias in Binary Classification [0.0]
This paper proposes the use of causal modeling to detect and mitigate bias that is nonlinear in the protected attribute.
We show that the probability of getting correctly classified as "low risk" is lowest among young people.
Based on the fitted causal model, the debiased probability estimates are computed, showing improved fairness with little impact on overall accuracy.
arXiv Detail & Related papers (2023-12-09T01:26:22Z) - Distribution-free risk assessment of regression-based machine learning
algorithms [6.507711025292814]
We focus on regression algorithms and the risk-assessment task of computing the probability of the true label lying inside an interval defined around the model's prediction.
We solve the risk-assessment problem using the conformal prediction approach, which provides prediction intervals that are guaranteed to contain the true label with a given probability.
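The split-conformal construction behind such guaranteed intervals can be sketched in a few lines. This is a generic illustration with synthetic data and an absolute-residual conformity score, not the specific algorithm from that paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = 2x + noise
x = rng.uniform(0, 1, 400)
y = 2 * x + rng.normal(0, 0.1, 400)

# Split the data: fit a simple model on one half, calibrate on the other
x_fit, y_fit = x[:200], y[:200]
x_cal, y_cal = x[200:], y[200:]
slope = (x_fit @ y_fit) / (x_fit @ x_fit)  # least squares through origin

# Conformity scores: absolute residuals on the held-out calibration set
scores = np.abs(y_cal - slope * x_cal)
alpha = 0.1  # target miscoverage
# Finite-sample-corrected quantile gives >= 1 - alpha coverage
level = np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores)
q = np.quantile(scores, level)

# Prediction interval for a new point: true label lies inside w.p. >= 0.9
x_new = 0.5
lo, hi = slope * x_new - q, slope * x_new + q
```

The coverage guarantee needs only exchangeability of calibration and test points, which is what makes the approach distribution-free.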
arXiv Detail & Related papers (2023-10-05T13:57:24Z) - Diagnosis Uncertain Models For Medical Risk Prediction [80.07192791931533]
We consider a patient risk model which has access to vital signs, lab values, and prior history but does not have access to a patient's diagnosis.
We show that such 'all-cause' risk models have good generalization across diagnoses but have a predictable failure mode.
We propose a fix for this problem by explicitly modeling the uncertainty in risk prediction coming from uncertainty in patient diagnoses.
arXiv Detail & Related papers (2023-06-29T23:36:04Z) - A Generalized Unbiased Risk Estimator for Learning with Augmented
Classes [70.20752731393938]
Given unlabeled data, an unbiased risk estimator (URE) can be derived, which can be minimized for LAC with theoretical guarantees.
We propose a generalized URE that can be equipped with arbitrary loss functions while maintaining the theoretical guarantees.
arXiv Detail & Related papers (2023-06-12T06:52:04Z) - Selecting Models based on the Risk of Damage Caused by Adversarial
Attacks [2.969705152497174]
Regulation, legal liabilities, and societal concerns challenge the adoption of AI in safety and security-critical applications.
One of the key concerns is that adversaries can cause harm by manipulating model predictions without being detected.
We propose a method to model and statistically estimate the probability of damage arising from adversarial attacks.
arXiv Detail & Related papers (2023-01-28T10:24:38Z) - Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
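Of the three diagnostics, the reliability diagram is the simplest to reproduce: bin the predicted probabilities and compare each bin's mean forecast to its observed event frequency. A minimal sketch with equal-width bins (the binning scheme and synthetic data are illustrative assumptions):

```python
import numpy as np

def reliability_curve(p, y, n_bins=10):
    """Per-bin (mean predicted probability, observed frequency) pairs.
    A well-calibrated forecaster puts every point on the diagonal."""
    p, y = np.asarray(p, float), np.asarray(y, int)
    edges = np.linspace(0, 1, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    curve = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            curve.append((p[mask].mean(), y[mask].mean()))
    return curve

# Perfectly calibrated synthetic forecasts: y ~ Bernoulli(p)
rng = np.random.default_rng(1)
p = rng.uniform(0, 1, 5000)
y = (rng.uniform(0, 1, 5000) < p).astype(int)
curve = reliability_curve(p, y)
```

Plotting `curve` against the diagonal gives the reliability diagram; the ROC curve and Murphy diagram in the triptych are built from the same `(p, y)` pairs but summarize discrimination and overall value instead.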
arXiv Detail & Related papers (2023-01-25T19:35:23Z) - Learning from a Biased Sample [3.546358664345473]
We propose a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions.
We empirically validate our proposed method in a case study on prediction of mental health scores from health survey data.
arXiv Detail & Related papers (2022-09-05T04:19:16Z) - A New Approach for Interpretability and Reliability in Clinical Risk
Prediction: Acute Coronary Syndrome Scenario [0.33927193323747895]
We intend to create a new risk assessment methodology that combines the best characteristics of both risk score and machine learning models.
The proposed approach achieved testing results identical to the standard LR, but offers superior interpretability and personalization.
The reliability estimation of individual predictions presented a great correlation with the misclassifications rate.
arXiv Detail & Related papers (2021-10-15T19:33:46Z) - Risk Minimization from Adaptively Collected Data: Guarantees for
Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.