FairCLIP: Harnessing Fairness in Vision-Language Learning
- URL: http://arxiv.org/abs/2403.19949v2
- Date: Fri, 5 Apr 2024 20:08:16 GMT
- Title: FairCLIP: Harnessing Fairness in Vision-Language Learning
- Authors: Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang
- Abstract summary: We introduce the first fair vision-language medical dataset that provides detailed demographic attributes, ground-truth labels, and clinical notes.
As the first fair vision-language medical dataset of its kind, Harvard-FairVLMed holds the potential to catalyze the development of machine learning models that are both ethically aware and clinically effective.
- Score: 20.743027598445796
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair vision-language medical dataset Harvard-FairVLMed that provides detailed demographic attributes, ground-truth labels, and clinical notes to facilitate an in-depth examination of fairness within VL foundation models. Using Harvard-FairVLMed, we conduct a comprehensive fairness analysis of two widely-used VL models (CLIP and BLIP2), pre-trained on both natural and medical domains, across four different protected attributes. Our results highlight significant biases in all VL models, with Asian, Male, Non-Hispanic, and Spanish being the preferred subgroups across the protected attributes of race, gender, ethnicity, and language, respectively. In order to alleviate these biases, we propose FairCLIP, an optimal-transport-based approach that achieves a favorable trade-off between performance and fairness by reducing the Sinkhorn distance between the overall sample distribution and the distributions corresponding to each demographic group. As the first VL dataset of its kind, Harvard-FairVLMed holds the potential to catalyze advancements in the development of machine learning models that are both ethically aware and clinically effective. Our dataset and code are available at https://ophai.hms.harvard.edu/datasets/harvard-fairvlmed10k.
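The fairness regularizer described in the abstract can be prototyped compactly. Below is a minimal PyTorch sketch of a Sinkhorn-based penalty added to the standard CLIP contrastive loss; the function names (`sinkhorn_1d`, `fairclip_loss`), the choice of applying the penalty to per-sample image-text cosine similarities, and the default hyperparameters are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn.functional as F

def sinkhorn_1d(x, y, eps=0.1, n_iters=100):
    """Entropy-regularized optimal-transport (Sinkhorn) distance between
    two 1-D empirical distributions x and y with uniform weights."""
    C = (x[:, None] - y[None, :]) ** 2             # pairwise squared cost
    K = torch.exp(-C / eps)                        # Gibbs kernel
    a = torch.full((x.numel(),), 1.0 / x.numel(), device=x.device)
    b = torch.full((y.numel(),), 1.0 / y.numel(), device=y.device)
    u = torch.ones_like(a)
    for _ in range(n_iters):                       # Sinkhorn iterations
        v = b / (K.t() @ u + 1e-8)
        u = a / (K @ v + 1e-8)
    P = u[:, None] * K * v[None, :]                # transport plan
    return (P * C).sum()

def fairclip_loss(img_emb, txt_emb, groups, lambda_fair=1e-4, tau=0.07):
    """CLIP contrastive loss plus a Sinkhorn penalty pulling each
    demographic group's similarity distribution toward the batch's."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau
    labels = torch.arange(logits.size(0), device=logits.device)
    clip_loss = 0.5 * (F.cross_entropy(logits, labels) +
                       F.cross_entropy(logits.t(), labels))
    sims = (img_emb * txt_emb).sum(dim=-1)         # matched image-text similarities
    penalty = sims.new_zeros(())
    for g in torch.unique(groups):                 # one term per demographic group
        penalty = penalty + sinkhorn_1d(sims, sims[groups == g])
    return clip_loss + lambda_fair * penalty

# Toy usage: 16 image/text pairs with a binary demographic attribute.
img = torch.randn(16, 512)
txt = torch.randn(16, 512)
attr = torch.randint(0, 2, (16,))
print(fairclip_loss(img, txt, attr))
```

The performance-fairness trade-off reported in the abstract is controlled by the weight on the optimal-transport term; in this sketch `lambda_fair` plays that role and would be tuned on a validation set.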
Related papers
- Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models [7.808926474503611]
We propose Fair-MoE, a model specifically designed to ensure both fairness and effectiveness.
Fair-MoE comprises two key components: the Fairness-Oriented Mixture of Experts (FO-MoE) and the Fairness-Oriented Loss (FOL).
arXiv Detail & Related papers (2025-02-10T01:45:26Z) - GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias [3.455189439319919]
We introduce Cross-Care, the first benchmark framework dedicated to assessing biases and real-world knowledge in large language models (LLMs).
We evaluate how demographic biases embedded in pre-training corpora such as The Pile influence the outputs of LLMs.
Our results highlight substantial misalignment between LLM representation of disease prevalence and real disease prevalence rates across demographic subgroups.
arXiv Detail & Related papers (2024-05-09T02:33:14Z) - What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases [87.65903426052155]
We perform a large-scale transfer learning experiment aimed at discovering latent vision-language skills from data.
We show that generation tasks suffer from a length bias, suggesting benchmarks should balance tasks with varying output lengths.
We present a new dataset, OLIVE, which simulates user instructions in the wild and presents challenges dissimilar to all datasets we tested.
arXiv Detail & Related papers (2024-04-03T02:40:35Z) - Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction [65.268245109828]
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the ICU length of stay.
The analysis reveals class imbalances in the dataset across demographic attributes and applies data preprocessing and feature extraction.
The paper concludes with recommendations for fairness-aware machine learning techniques for mitigating biases and the need for collaborative efforts among healthcare professionals and data scientists.
arXiv Detail & Related papers (2023-12-31T16:01:48Z) - FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling [14.483954095650887]
High-quality medical fairness datasets are needed to promote fairness learning research.
Existing medical fairness datasets are all for classification tasks, and no fairness datasets are available for medical segmentation.
We propose the first fairness dataset for medical segmentation, named Harvard-FairSeg, with 10,000 subject samples.
arXiv Detail & Related papers (2023-11-03T18:44:21Z) - Harvard Glaucoma Fairness: A Retinal Nerve Disease Dataset for Fairness Learning and Fair Identity Normalization [13.792327874980632]
We introduce Harvard Glaucoma Fairness (Harvard-GF), a dataset with both 2D and 3D imaging data and balanced racial groups for glaucoma detection.
Our FIN approach is compared with various state-of-the-art fairness learning methods, showing superior performance on the racial, gender, and ethnicity fairness tasks.
We propose an equity-scaled performance measure, which can be flexibly used to compare all kinds of performance metrics in the context of fairness (a minimal sketch of such a measure appears after this list).
arXiv Detail & Related papers (2023-06-15T16:39:05Z) - Auditing Algorithmic Fairness in Machine Learning for Health with Severity-Based LOGAN [70.76142503046782]
We propose supplementing bias audits of machine learning (ML) healthcare tools with SLOGAN, an automatic tool for capturing local biases in a clinical prediction task.
SLOGAN adapts an existing tool, LOcal Group biAs detectioN (LOGAN), by contextualizing group bias detection in patient illness severity and past medical history.
On average, SLOGAN identifies larger fairness disparities than LOGAN in over 75% of patient groups while maintaining clustering quality.
arXiv Detail & Related papers (2022-11-16T08:04:12Z) - Fair Machine Learning in Healthcare: A Review [90.22219142430146]
We analyze the intersection of fairness in machine learning and healthcare disparities.
We provide a critical review of the associated fairness metrics from a machine learning standpoint.
We propose several new research directions that hold promise for developing ethical and equitable ML applications in healthcare.
arXiv Detail & Related papers (2022-06-29T04:32:10Z) - Fairness-aware Model-agnostic Positive and Unlabeled Learning [38.50536380390474]
We propose a fairness-aware Positive and Unlabeled Learning (PUL) method named FairPUL.
For binary classification over individuals from two populations, we aim to achieve similar true positive rates and false positive rates.
Our framework is proven to be statistically consistent in terms of both the classification error and the fairness metric.
arXiv Detail & Related papers (2022-06-19T08:04:23Z) - Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale, publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z)
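For the equity-scaled performance measure mentioned in the Harvard Glaucoma Fairness entry above, a minimal sketch follows. It assumes the commonly used form ES-AUC = AUC / (1 + Σ_g |AUC − AUC_g|), where the sum runs over demographic groups; the function name `equity_scaled_auc` and the use of scikit-learn's `roc_auc_score` are illustrative choices, not the paper's reference implementation.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def equity_scaled_auc(y_true, y_score, groups):
    """Overall AUC discounted by how far each demographic group's AUC
    deviates from it: ES-AUC = AUC / (1 + sum_g |AUC - AUC_g|)."""
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))
    overall = roc_auc_score(y_true, y_score)
    deviation = 0.0
    for g in np.unique(groups):
        mask = groups == g
        if np.unique(y_true[mask]).size < 2:    # AUC undefined for one-class groups
            continue
        deviation += abs(overall - roc_auc_score(y_true[mask], y_score[mask]))
    return overall / (1.0 + deviation)

# Toy usage with a binary demographic attribute.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)                     # binary labels
s = y * 0.6 + rng.normal(0, 0.5, 200)           # informative but noisy scores
g = rng.integers(0, 2, 200)                     # demographic group per sample
print(equity_scaled_auc(y, s, g))
```

The measure equals the overall metric when all groups perform identically and shrinks as group-level performance diverges, which is why it can wrap any base metric (AUC, accuracy, Dice) for fairness-aware comparison.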