Auditing Fairness and Imputation Impact in Predictive Analytics for
Higher Education
- URL: http://arxiv.org/abs/2109.07908v1
- Date: Mon, 13 Sep 2021 05:08:40 GMT
- Title: Auditing Fairness and Imputation Impact in Predictive Analytics for
Higher Education
- Authors: Hadis Anahideh, Nazanin Nezami, Denisa Gándara
- Abstract summary: There are two major barriers to the adoption of predictive analytics in higher education.
These barriers are the lack of democratization in deployment and the potential to exacerbate inequalities.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, colleges and universities use predictive analytics in a
variety of ways to increase student success rates. Despite the potential of
predictive analytics, two major barriers hinder their adoption in higher
education: (a) the lack of democratization in deployment, and (b) the
potential to exacerbate inequalities. Education researchers and policymakers
encounter numerous challenges in deploying predictive modeling in practice.
These challenges arise at different steps of modeling, including data
preparation, model development, and evaluation, and each of these steps can
introduce additional bias into the system if not performed appropriately. Most
large-scale and nationally representative education data sets suffer from a
significant number of incomplete responses from the research participants.
Missing values are a frequent latent cause of many data analysis challenges.
While many education-related studies have addressed the challenges of missing
data, little is known about the impact of handling missing values on the
fairness of predictive outcomes in practice.
In this paper, we first assess the disparities in predictive modeling
outcomes for college-student success, and then investigate the impact of
imputation techniques on model performance and fairness using a comprehensive
set of common metrics. Our analysis of a real large-scale education dataset
reveals key insights into the modeling disparity and how different imputation
techniques fundamentally compare to one another in terms of their impact on
the fairness of the student-success predictive outcome.
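The audit pipeline the abstract describes (impute missing responses, fit a model, then score fairness) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's code: the mean-imputation helper, the toy GPA-like scores, the group labels, and the threshold "model" are all made-up assumptions.

```python
# Illustrative sketch of an imputation-then-fairness audit. All data and the
# threshold "model" below are made-up examples, not the paper's methods.

def mean_impute(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def demographic_parity_diff(predictions, groups):
    """Absolute difference in positive-prediction rates between two groups."""
    rate = {}
    for g in set(groups):
        preds = [p for p, gg in zip(predictions, groups) if gg == g]
        rate[g] = sum(preds) / len(preds)
    a, b = sorted(rate)
    return abs(rate[a] - rate[b])

# Toy data: GPA-like scores with missing responses, and a binary group label.
scores = [3.2, None, 2.8, 3.9, None, 2.1]
groups = ["A", "A", "A", "B", "B", "B"]

imputed = mean_impute(scores)
# A stand-in "model": predict success when the (imputed) score exceeds 3.0.
preds = [1 if s > 3.0 else 0 for s in imputed]
gap = demographic_parity_diff(preds, groups)
```

Repeating this audit with different imputation strategies (and different fairness metrics) is the kind of comparison the paper carries out at scale.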
Related papers
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z)
- Analyzing Domestic Violence through Exploratory Data Analysis and
Explainable Ensemble Learning Insights [0.5825410941577593]
This study explores male domestic violence (MDV) for the first time, highlighting the factors that influence it.
We collected data from nine major cities in Bangladesh and conducted exploratory data analysis (EDA) to understand the underlying dynamics.
EDA revealed patterns such as the high prevalence of verbal abuse, the influence of financial dependency, and the role of familial and socio-economic factors in MDV.
arXiv Detail & Related papers (2024-03-22T19:53:21Z)
- Survey on Imbalanced Data, Representation Learning and SEP
Forecasting [0.9065034043031668]
Deep Learning methods have significantly advanced various data-driven tasks such as regression, classification, and forecasting.
Much of this progress has been predicated on the strong but often unrealistic assumption that training datasets are balanced with respect to the targets they contain.
This misalignment with real-world conditions, where data is frequently imbalanced, hampers the effectiveness of such models in practical applications.
We present deep learning works that step away from the balanced-data assumption, employing strategies like representation learning to better approximate real-world imbalances.
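One common remedy in this line of work is to reweight the loss by inverse class frequency, so that rare classes (such as actual events in a skewed forecasting task) contribute more to training. A minimal sketch, assuming a made-up 9:1 label distribution; the weighting scheme shown is a standard convention, not necessarily the one this survey covers:

```python
# Hedged sketch of inverse-frequency class weighting for imbalanced data.
# The label list is a made-up example.
from collections import Counter

def inverse_frequency_weights(labels):
    """weight[c] = N / (num_classes * count[c]), a common balancing scheme."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

labels = [0] * 9 + [1] * 1  # 9:1 imbalance, as in rare-event forecasting
weights = inverse_frequency_weights(labels)  # minority class gets weight 5.0
```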
arXiv Detail & Related papers (2023-10-11T15:38:53Z)
- Towards Better Modeling with Missing Data: A Contrastive Learning-based
Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling.
Current approaches are categorized into feature imputation and label prediction.
This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z)
- Is Your Model "MADD"? A Novel Metric to Evaluate Algorithmic Fairness
for Predictive Student Models [0.0]
We propose a novel metric, the Model Absolute Density Distance (MADD), to analyze models' discriminatory behaviors.
We evaluate our approach on the common task of predicting student success in online courses, using several common predictive classification models.
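One plausible reading of a density-distance metric of this kind: discretize each group's predicted probabilities into a shared histogram and sum the absolute per-bin differences. The bin count and toy scores below are assumptions, not the authors' exact definition, so treat this as a sketch of the general idea:

```python
# Hedged sketch of a density-distance fairness check in the spirit of MADD.
# Bin count and the toy probability scores are assumptions; see the paper
# for the metric's precise definition.

def density(scores, bins=10):
    """Normalized histogram of predicted probabilities in [0, 1]."""
    hist = [0] * bins
    for s in scores:
        idx = min(int(s * bins), bins - 1)
        hist[idx] += 1
    total = len(scores)
    return [h / total for h in hist]

def abs_density_distance(scores_a, scores_b, bins=10):
    """Sum of absolute per-bin differences between two group densities."""
    da, db = density(scores_a, bins), density(scores_b, bins)
    return sum(abs(x - y) for x, y in zip(da, db))

# Identical score distributions give 0; fully disjoint ones give the max, 2.
same = abs_density_distance([0.1, 0.9], [0.1, 0.9])
far = abs_density_distance([0.05, 0.15], [0.85, 0.95])
```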
arXiv Detail & Related papers (2023-05-24T16:55:49Z)
- Training Data Influence Analysis and Estimation: A
Survey [25.460140245596918]
We provide the first comprehensive survey of training data influence analysis and estimation.
We organize state-of-the-art influence analysis methods into a taxonomy.
We propose future research directions to make influence analysis more useful in practice.
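The core idea behind training-data influence can be illustrated with the simplest estimator there is: the leave-one-out influence of a training point on a mean is how far the estimate moves when that point is removed. The data below are made up, and real influence-estimation methods are far more sophisticated than this toy:

```python
# Toy illustration of leave-one-out influence analysis (not the survey's
# methods): measure how much a mean estimate shifts when each point is
# dropped. The data are a made-up example.

def loo_influence(data):
    """Return, for each point, full_mean - mean_without_that_point."""
    full = sum(data) / len(data)
    out = []
    for i in range(len(data)):
        rest = data[:i] + data[i + 1:]
        out.append(full - sum(rest) / len(rest))
    return out

data = [1.0, 2.0, 3.0, 10.0]
influences = loo_influence(data)  # the outlier 10.0 moves the mean the most
```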
arXiv Detail & Related papers (2022-12-09T00:32:46Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Data-Centric Epidemic Forecasting: A Survey [56.99209141838794]
This survey delves into various data-driven methodological and practical advancements.
We enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting.
We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems.
arXiv Detail & Related papers (2022-07-19T16:15:11Z)
- On the Robustness of Pretraining and Self-Supervision for a Deep
Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that models from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z)
- Through the Data Management Lens: Experimental Analysis and Evaluation
of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z)
- Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.