On the Impact of Random Seeds on the Fairness of Clinical Classifiers
- URL: http://arxiv.org/abs/2104.06338v1
- Date: Tue, 13 Apr 2021 16:30:39 GMT
- Title: On the Impact of Random Seeds on the Fairness of Clinical Classifiers
- Authors: Silvio Amir and Jan-Willem van de Meent and Byron C. Wallace
- Abstract summary: We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records.
We also find that the small sample sizes inherent to looking at intersections of minority groups and somewhat rare conditions limit our ability to accurately estimate disparities.
- Score: 27.71610203951057
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent work has shown that fine-tuning large networks is surprisingly
sensitive to changes in random seed(s). We explore the implications of this
phenomenon for model fairness across demographic groups in clinical prediction
tasks over electronic health records (EHR) in MIMIC-III -- the standard dataset
in clinical NLP research. Apparent subgroup performance varies substantially
for seeds that yield similar overall performance, although there is no evidence
of a trade-off between overall and subgroup performance. However, we also find
that the small sample sizes inherent to looking at intersections of minority
groups and somewhat rare conditions limit our ability to accurately estimate
disparities. Further, we find that jointly optimizing for high overall
performance and low disparities does not yield statistically significant
improvements. Our results suggest that fairness work using MIMIC-III should
carefully account for variations in apparent differences that may arise from
stochasticity and small sample sizes.
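As a rough illustration of the experiment the abstract describes, the sketch below retrains one model under several random seeds on a fixed split and compares the spread of overall AUC with the spread of the subgroup AUC gap. The synthetic data and the small scikit-learn MLP are stand-ins for the paper's EHR tasks, not its actual pipeline.

```python
# Minimal sketch: vary only the model's random seed, then compare the
# stability of overall AUC against the stability of the subgroup gap.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n, d = 4000, 20
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
group = rng.integers(0, 2, size=n)      # stand-in demographic attribute

X_tr, X_te, y_tr, y_te, _, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0)

overall, gaps = [], []
for seed in range(20):                  # only the model's seed varies
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                        random_state=seed).fit(X_tr, y_tr)
    p = clf.predict_proba(X_te)[:, 1]
    overall.append(roc_auc_score(y_te, p))
    aucs = [roc_auc_score(y_te[g_te == g], p[g_te == g]) for g in (0, 1)]
    gaps.append(aucs[0] - aucs[1])

print(f"overall AUC : {np.mean(overall):.3f} +/- {np.std(overall):.3f}")
print(f"subgroup gap: {np.mean(gaps):+.3f} +/- {np.std(gaps):.3f}")
```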
Related papers
- How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance [64.1656365676171]
Group imbalance has been a known problem in empirical risk minimization.
This paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance.
arXiv Detail & Related papers (2024-03-12T04:38:05Z)
- GroupMixNorm Layer for Learning Fair Models [4.324785083027206]
This research proposes a novel in-processing GroupMixNorm layer for mitigating bias in deep learning models.
The proposed method improves upon several fairness metrics with minimal impact on overall accuracy.
arXiv Detail & Related papers (2023-12-19T09:04:26Z)
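The summary above names the layer but not its formulation. Purely as an assumption, one plausible shape for a group-mixing normalization is to normalize activations with a random convex mixture of per-group batch statistics, so that downstream layers cannot rely on group-specific feature distributions; a minimal PyTorch sketch:

```python
import torch

def group_mix_norm(x, group, alpha=0.5, eps=1e-5):
    """x: (batch, features); group: (batch,) integer protected attribute.

    Hypothetical sketch: normalize with a random convex mixture of
    per-group batch statistics (an assumed reading of the abstract).
    """
    groups = group.unique()
    means = torch.stack([x[group == g].mean(dim=0) for g in groups])
    stds = torch.stack([x[group == g].std(dim=0) + eps for g in groups])
    # One random convex weight vector over groups, shared by the batch.
    w = torch.distributions.Dirichlet(
        torch.full((len(groups),), alpha)).sample()
    mix_mean = (w[:, None] * means).sum(dim=0)
    mix_std = (w[:, None] * stds).sum(dim=0)
    return (x - mix_mean) / mix_std

x = torch.randn(64, 16)            # a batch of intermediate features
g = torch.randint(0, 2, (64,))     # stand-in group labels
print(group_mix_norm(x, g).shape)  # torch.Size([64, 16])
```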
- Auditing ICU Readmission Rates in a Clinical Database: An Analysis of Risk Factors and Clinical Outcomes [0.0]
This study presents a machine learning pipeline for clinical data classification in the context of a 30-day readmission problem.
The fairness audit uncovers disparities in equal opportunity, predictive parity, false positive rate parity, and false negative rate parity criteria.
The study suggests the need for collaborative efforts among researchers, policymakers, and practitioners to address bias and fairness in artificial intelligence (AI) systems.
arXiv Detail & Related papers (2023-04-12T17:09:38Z)
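The four criteria named in the audit above reduce to comparing group-conditional rates (equal opportunity compares TPRs; predictive parity compares PPVs). A minimal sketch with illustrative names and random stand-in predictions:

```python
import numpy as np

def group_rates(y_true, y_pred, mask):
    y, p = y_true[mask], y_pred[mask]
    tpr = (p[y == 1] == 1).mean()   # equal opportunity compares TPRs
    fpr = (p[y == 0] == 1).mean()   # false positive rate parity
    fnr = 1.0 - tpr                 # false negative rate parity
    ppv = (y[p == 1] == 1).mean()   # predictive parity compares PPVs
    return {"TPR": tpr, "FPR": fpr, "FNR": fnr, "PPV": ppv}

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)  # stand-in labels (e.g. readmission)
y_pred = rng.integers(0, 2, 1000)  # stand-in binary predictions
group = rng.integers(0, 2, 1000)   # stand-in protected attribute

r0 = group_rates(y_true, y_pred, group == 0)
r1 = group_rates(y_true, y_pred, group == 1)
for k in r0:
    print(f"{k} gap between groups: {abs(r0[k] - r1[k]):.3f}")  # 0 = parity
```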
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out-of-Distribution Robustness [94.69774317059122]
We show that the effectiveness of the well-celebrated Mixup can be further improved if, instead of serving as the sole learning objective, it is used as an additional regularizer alongside the standard cross-entropy loss.
This simple change not only yields much improved accuracy but also substantially improves the quality of Mixup's predictive uncertainty estimates.
arXiv Detail & Related papers (2022-06-29T09:44:33Z)
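The loss structure described above (standard cross-entropy plus a Mixup term acting as a regularizer) can be sketched directly; the regularizer weight eta and the Beta concentration a below are assumptions, not the paper's tuned values:

```python
import torch
import torch.nn.functional as F

def regmixup_loss(model, x, y, eta=1.0, a=10.0):
    """Cross-entropy on the clean batch plus a Mixup CE regularizer.

    eta and Beta(a, a) are assumed hyperparameters; a large
    concentration keeps the mixing weight lam near 0.5.
    """
    clean = F.cross_entropy(model(x), y)          # standard CE objective
    lam = torch.distributions.Beta(a, a).sample().item()
    perm = torch.randperm(x.size(0))
    logits_mix = model(lam * x + (1 - lam) * x[perm])
    mix = lam * F.cross_entropy(logits_mix, y) + \
          (1 - lam) * F.cross_entropy(logits_mix, y[perm])
    return clean + eta * mix                      # Mixup as a regularizer

model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 3))
x, y = torch.randn(32, 8), torch.randint(0, 3, (32,))
print(regmixup_loss(model, x, y))
```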
- How does overparametrization affect performance on minority groups? [39.54853544590893]
In a setting in which the regression functions for the majority and minority groups differ, we show that overparameterization always improves minority-group performance.
arXiv Detail & Related papers (2022-06-07T18:00:52Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Self-Diagnosing GAN: Diagnosing Underrepresented Samples in Generative Adversarial Networks [5.754152248672317]
We propose a method to diagnose and emphasize underrepresented samples during the training of a Generative Adversarial Network (GAN).
The method builds on the observation that underrepresented samples exhibit a high average discrepancy, or high variability in discrepancy, over the course of training.
Our experimental results demonstrate that the proposed method improves GAN performance on various datasets.
arXiv Detail & Related papers (2021-02-24T02:31:50Z)
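The diagnosis step described above can be sketched independently of the GAN itself: record a per-sample discrepancy score at each checkpoint, then flag samples whose mean or variance is high and emphasize them (here via sampling weights). The scores below are simulated, and the log-ratio interpretation in the comment is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_checkpoints = 1000, 30
# scores[t, i]: discrepancy of sample i at checkpoint t; in the paper's
# spirit this could be a discriminator log-ratio, but here it is simulated.
scores = rng.normal(size=(n_checkpoints, n_samples))
scores[:, :100] += 2.0  # pretend the first 100 samples are underrepresented

mean_d = scores.mean(axis=0)
var_d = scores.var(axis=0)
flagged = (mean_d > np.quantile(mean_d, 0.9)) | \
          (var_d > np.quantile(var_d, 0.9))

# Emphasize flagged samples, e.g. by sampling them more often later on.
weights = np.where(flagged, 2.0, 1.0)
probs = weights / weights.sum()
print(f"flagged {flagged.sum()} samples; max sampling prob = {probs.max():.5f}")
```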
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)