Datasheets for Healthcare AI: A Framework for Transparency and Bias Mitigation
- URL: http://arxiv.org/abs/2501.05617v1
- Date: Thu, 09 Jan 2025 23:36:34 GMT
- Title: Datasheets for Healthcare AI: A Framework for Transparency and Bias Mitigation
- Authors: Marjia Siddik, Harshvardhan J. Pandit,
- Abstract summary: Bias, data incompleteness, and inaccuracies in training datasets can lead to unfair outcomes and amplify existing disparities.
We propose a dataset documentation framework that promotes transparency and ensures alignment with regulatory requirements.
The findings emphasise the importance of dataset documentation in fostering responsible AI development.
- Score: 0.0
- License:
- Abstract: The use of AI in healthcare has the potential to improve patient care, optimize clinical workflows, and enhance decision-making. However, bias, data incompleteness, and inaccuracies in training datasets can lead to unfair outcomes and amplify existing disparities. This research investigates the current state of dataset documentation practices, focusing on their ability to address these challenges and support ethical AI development. We identify shortcomings in existing documentation methods, which limit the recognition and mitigation of bias, incompleteness, and other issues in datasets. We propose the 'Healthcare AI Datasheet' to address these gaps, a dataset documentation framework that promotes transparency and ensures alignment with regulatory requirements. Additionally, we demonstrate how it can be expressed in a machine-readable format, facilitating its integration with datasets and enabling automated risk assessments. The findings emphasise the importance of dataset documentation in fostering responsible AI development.
Related papers
- AI-Driven Healthcare: A Survey on Ensuring Fairness and Mitigating Bias [2.398440840890111]
AI applications have significantly improved diagnostic accuracy, treatment personalization, and patient outcome predictions.
These advancements also introduce substantial ethical and fairness challenges.
These biases can lead to disparities in healthcare delivery, affecting diagnostic accuracy and treatment outcomes across different demographic groups.
arXiv Detail & Related papers (2024-07-29T02:39:17Z) - TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AIready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z) - Challenges for Responsible AI Design and Workflow Integration in Healthcare: A Case Study of Automatic Feeding Tube Qualification in Radiology [35.284458448940796]
Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication.
Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from Chest X-ray images.
We present a human-centered approach to the problem and describe insights derived following contextual inquiry and in-depth interviews with 15 clinical stakeholders.
arXiv Detail & Related papers (2024-05-08T14:16:22Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline
Algorithm: Application to the ICU Length of Stay Prediction [65.268245109828]
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the ICU length of stay.
The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction.
The paper concludes with recommendations for fairness-aware machine learning techniques for mitigating biases and the need for collaborative efforts among healthcare professionals and data scientists.
arXiv Detail & Related papers (2023-12-31T16:01:48Z) - Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven *clinical decision support*
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
arXiv Detail & Related papers (2023-10-28T12:08:03Z) - On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z) - Collect, Measure, Repeat: Reliability Factors for Responsible AI Data
Collection [8.12993269922936]
We argue that data collection for AI should be performed in a responsible manner.
We propose a Responsible AI (RAI) methodology designed to guide the data collection with a set of metrics.
arXiv Detail & Related papers (2023-08-22T18:01:27Z) - Non-Imaging Medical Data Synthesis for Trustworthy AI: A Comprehensive
Survey [6.277848092408045]
Data quality is the key factor for the development of trustworthy AI in healthcare.
Access to good quality datasets is limited by the technical difficulty of data acquisition.
Large-scale sharing of healthcare data is hindered by strict ethical restrictions.
arXiv Detail & Related papers (2022-09-17T13:34:17Z) - Benchmark datasets driving artificial intelligence development fail to
capture the needs of medical professionals [4.799783526620609]
We released a catalogue of datasets and benchmarks pertaining to the broad domain of clinical and biomedical natural language processing (NLP)
A total of 450 NLP datasets were manually systematized and annotated with rich metadata.
Our analysis indicates that AI benchmarks of direct clinical relevance are scarce and fail to cover most work activities that clinicians want to see addressed.
arXiv Detail & Related papers (2022-01-18T15:05:28Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.