Examining Imbalance Effects on Performance and Demographic Fairness of Clinical Language Models
- URL: http://arxiv.org/abs/2412.17803v2
- Date: Thu, 13 Feb 2025 21:28:17 GMT
- Title: Examining Imbalance Effects on Performance and Demographic Fairness of Clinical Language Models
- Authors: Precious Jones, Weisi Liu, I-Chan Huang, Xiaolei Huang
- Abstract summary: This study statistically probes the relationship between data imbalance and model performance in ICD code prediction.
We analyze imbalances in a standard benchmark dataset across gender, age, ethnicity, and social determinants of health, using state-of-the-art biomedical language models.
Our study shows that data imbalance significantly impacts model performance and fairness, but feature similarity to the majority class may be a more critical factor.
- Score: 4.390908825243365
- License:
- Abstract: Data imbalance is a fundamental challenge in applying language models to biomedical applications, particularly in ICD code prediction tasks where label and demographic distributions are uneven. While state-of-the-art language models have been increasingly adopted in biomedical tasks, few studies have systematically examined how data imbalance affects model performance and fairness across demographic groups. This study fills the gap by statistically probing the relationship between data imbalance and model performance in ICD code prediction. We analyze imbalances in a standard benchmark dataset across gender, age, ethnicity, and social determinants of health, using state-of-the-art biomedical language models. By deploying diverse performance metrics and statistical analyses, we explore the influence of data imbalance on performance variations and demographic fairness. Our study shows that data imbalance significantly impacts model performance and fairness, but feature similarity to the majority class may be a more critical factor. We believe this study provides valuable insights for developing more equitable and robust language models in healthcare applications.
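As a rough, non-authoritative sketch of how such a probe could be set up (this is not the authors' released code; the array layouts and helper names are assumptions), one might compute per-group performance, correlate it with group size as a simple imbalance measure, and compare each group's embedding centroid with the majority group's:

```python
# Illustrative sketch only: relate demographic imbalance to per-group ICD
# prediction performance and to feature similarity with the majority group.
# Assumes multi-hot label arrays, one demographic attribute per note, and
# document embeddings from a biomedical language model.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import f1_score

def per_group_f1(y_true, y_pred, groups):
    """Micro-F1 computed separately for each demographic group."""
    return {g: f1_score(y_true[groups == g], y_pred[groups == g], average="micro")
            for g in np.unique(groups)}

def imbalance_vs_performance(y_true, y_pred, groups):
    """Spearman correlation between group size and group-level micro-F1."""
    scores = per_group_f1(y_true, y_pred, groups)
    sizes = [np.sum(groups == g) for g in scores]
    return spearmanr(sizes, list(scores.values()))

def similarity_to_majority(embeddings, groups):
    """Cosine similarity of each group's embedding centroid to the majority group's."""
    majority = max(np.unique(groups), key=lambda g: np.sum(groups == g))
    maj = embeddings[groups == majority].mean(axis=0)
    sims = {}
    for g in np.unique(groups):
        c = embeddings[groups == g].mean(axis=0)
        sims[g] = float(c @ maj / (np.linalg.norm(c) * np.linalg.norm(maj)))
    return sims
```

Under the abstract's claim, a weak correlation between group size and F1 alongside a strong association with centroid similarity would point to feature similarity, rather than raw imbalance, as the dominant factor.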
Related papers
- Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases [2.9247093478805324]
Investigation reveals widespread sex- and age-related inequities in chronic disease datasets and their derived Machine Learning models.
The analysis of data from over 25,000 individuals with chronic diseases revealed mild sex-related disparities, favoring predictive accuracy for males, and significant age-related differences, with better accuracy for younger patients.
Notably, older patients showed inconsistent predictive accuracy across seven datasets, linked to higher data complexity and lower model performance.
arXiv Detail & Related papers (2024-12-27T07:31:14Z)
- Using Backbone Foundation Model for Evaluating Fairness in Chest Radiography Without Demographic Data [2.7436483977171333]
This study aims to investigate the effectiveness of using the backbone of Foundation Models as an embedding extractor.
We propose utilizing these groups in different stages of bias mitigation, including pre-processing, in-processing, and evaluation.
arXiv Detail & Related papers (2024-08-28T20:35:38Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
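As a minimal, hypothetical illustration of the technique (the paper's actual prompt templates may differ), a sociodemographic profile can be prepended to the task prompt:

```python
# Hypothetical template for sociodemographic prompting: prepend a persona
# description so a prompt-based model answers as someone with that profile might.
def build_sociodemographic_prompt(profile: dict, task_instruction: str, text: str) -> str:
    persona = ", ".join(f"{k}: {v}" for k, v in profile.items())
    return (
        f"Imagine you are a person with this profile: {persona}.\n"
        f"{task_instruction}\n"
        f"Text: {text}\n"
        f"Answer:"
    )

prompt = build_sociodemographic_prompt(
    {"age": "65", "gender": "female", "occupation": "retired nurse"},
    "Label the sentiment of the text as positive, negative, or neutral.",
    "The new clinic hours make it hard to pick up my prescriptions.",
)
# `prompt` can then be sent to any instruction-following language model.
```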
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes [72.13373216644021]
We study the societal impact of machine learning by considering the collection of models that are deployed in a given context.
We find deployed machine learning is prone to systemic failure, meaning some users are exclusively misclassified by all models available.
These examples demonstrate ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
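A small sketch of that ecosystem-level notion (illustrative code, not the paper's): an instance suffers systemic failure when every deployed model misclassifies it.

```python
import numpy as np

def systemic_failure_rate(y_true, model_preds):
    """Fraction of instances misclassified by every model in the ecosystem."""
    wrong = np.stack([preds != y_true for preds in model_preds])  # (n_models, n_instances)
    return float(wrong.all(axis=0).mean())

y_true = np.array([0, 1, 1, 0, 1])
preds_a = np.array([0, 0, 1, 0, 0])
preds_b = np.array([1, 0, 1, 0, 0])
print(systemic_failure_rate(y_true, [preds_a, preds_b]))  # 0.4: two users fail under both models
```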
arXiv Detail & Related papers (2023-07-12T01:11:52Z)
- Fairness in Machine Learning meets with Equity in Healthcare [6.842248432925292]
This study proposes an artificial intelligence framework for identifying and mitigating biases in data and models.
A case study is presented to demonstrate how systematic biases in data can lead to amplified biases in model predictions.
Future research aims to test and validate the proposed ML framework in real-world clinical settings to evaluate its impact on promoting health equity.
arXiv Detail & Related papers (2023-05-11T14:25:34Z)
- Connecting Fairness in Machine Learning with Public Health Equity [0.0]
Biases in data and model design can result in disparities for certain protected groups and amplify existing inequalities in healthcare.
This study summarizes seminal literature on ML fairness and presents a framework for identifying and mitigating biases in the data and model.
Case studies suggest how the framework can be used to prevent these biases and highlight the need for fair and equitable ML models in public health.
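For context only (this is not the proposed framework itself), two common group-fairness gaps that such a bias audit might report can be computed directly from binary predictions:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-prediction rate across demographic groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_gap(y_true, y_pred, group):
    """Largest difference in true-positive rate (recall) across demographic groups."""
    tprs = [y_pred[(group == g) & (y_true == 1)].mean() for g in np.unique(group)]
    return max(tprs) - min(tprs)
```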
arXiv Detail & Related papers (2023-04-08T10:21:49Z)
- Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis [3.5536769591744557]
Deep learning (DL) models have shown great success in many medical image analysis tasks.
However, deployment of the resulting models into real clinical contexts requires robustness and fairness across different sub-populations.
Recent studies have shown significant biases in DL models across demographic subgroups, indicating a lack of fairness in the models.
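One simple way to operationalize that check (a sketch assuming the model outputs per-prediction confidences, not the paper's exact protocol) is to compare expected calibration error across demographic subgroups:

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Binned ECE between predicted confidence and 0/1 correctness."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def ece_by_subgroup(conf, correct, groups):
    """Per-subgroup calibration; large gaps suggest unfair uncertainty estimates."""
    return {g: expected_calibration_error(conf[groups == g], correct[groups == g])
            for g in np.unique(groups)}
```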
arXiv Detail & Related papers (2023-03-06T16:01:30Z)
- Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z)
- Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models [50.537859423741644]
Training a model on an imbalanced dataset can introduce unique challenges to the learning problem.
We look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features.
arXiv Detail & Related papers (2022-04-04T09:38:38Z)
- Counterfactual Representation Learning with Balancing Weights [74.67296491574318]
Key to causal inference with observational data is achieving balance in predictive features associated with each treatment type.
Recent literature has explored representation learning to achieve this goal.
We develop an algorithm for flexible, scalable and accurate estimation of causal effects.
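As background only (not the paper's representation-learning algorithm), the simplest form of balancing weights is inverse propensity weighting, which reweights treated and control units toward covariate balance:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, treatment, outcome, clip=0.01):
    """Normalized IPW estimate of the average treatment effect from observational data."""
    e = LogisticRegression(max_iter=1000).fit(X, treatment).predict_proba(X)[:, 1]
    e = np.clip(e, clip, 1 - clip)                     # avoid extreme weights
    w1, w0 = treatment / e, (1 - treatment) / (1 - e)  # balancing weights
    return np.sum(w1 * outcome) / np.sum(w1) - np.sum(w0 * outcome) / np.sum(w0)
```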
arXiv Detail & Related papers (2020-10-23T19:06:03Z)
- Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions.
We make robust and efficient counterfactual predictions for both individual and average treatment effects.
The algorithm shows competitive performance with the state-of-the-art on real world and synthetic data.
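The classical estimator behind the "double robust" idea (shown here as background with illustrative model choices; the paper itself learns representations) combines an outcome model with propensity weighting, so the estimate stays consistent if either component is well specified:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ate(X, t, y, clip=0.01):
    """Augmented IPW (doubly robust) estimate of the average treatment effect."""
    e = np.clip(LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1],
                clip, 1 - clip)                                    # propensity model
    mu1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)  # outcome model, treated
    mu0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)  # outcome model, control
    psi = mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
    return float(psi.mean())
```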
arXiv Detail & Related papers (2020-10-15T16:39:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.