Related papers: Healthsheet: Development of a Transparency Artifact for Health Datasets

Healthsheet: Development of a Transparency Artifact for Health Datasets

URL: http://arxiv.org/abs/2202.13028v1
Date: Sat, 26 Feb 2022 01:05:55 GMT
Title: Healthsheet: Development of a Transparency Artifact for Health Datasets
Authors: Negar Rostamzadeh, Diana Mincu, Subhrajit Roy, Andrew Smart, Lauren Wilcox, Mahima Pushkarna, Jessica Schrouff, Razvan Amironesei, Nyalleng Moorosi, Katherine Heller
Abstract summary: We introduce Healthsheet, a contextualized adaptation of the original questionnaire citegebru 2018datasheets for health-specific applications. We work with three publicly-available healthcare datasets as our case studies.
Score: 13.57051456780329
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine learning (ML) approaches have demonstrated promising results in a wide range of healthcare applications. Data plays a crucial role in developing ML-based healthcare systems that directly affect people's lives. Many of the ethical issues surrounding the use of ML in healthcare stem from structural inequalities underlying the way we collect, use, and handle data. Developing guidelines to improve documentation practices regarding the creation, use, and maintenance of ML healthcare datasets is therefore of critical importance. In this work, we introduce Healthsheet, a contextualized adaptation of the original datasheet questionnaire ~\cite{gebru2018datasheets} for health-specific applications. Through a series of semi-structured interviews, we adapt the datasheets for healthcare data documentation. As part of the Healthsheet development process and to understand the obstacles researchers face in creating datasheets, we worked with three publicly-available healthcare datasets as our case studies, each with different types of structured data: Electronic health Records (EHR), clinical trial study data, and smartphone-based performance outcome measures. Our findings from the interviewee study and case studies show 1) that datasheets should be contextualized for healthcare, 2) that despite incentives to adopt accountability practices such as datasheets, there is a lack of consistency in the broader use of these practices 3) how the ML for health community views datasheets and particularly \textit{Healthsheets} as diagnostic tool to surface the limitations and strength of datasets and 4) the relative importance of different fields in the datasheet to healthcare concerns.

Related papers

Datasheets for Healthcare AI: A Framework for Transparency and Bias Mitigation [0.0]
Bias, data incompleteness, and inaccuracies in training datasets can lead to unfair outcomes and amplify existing disparities. We propose a dataset documentation framework that promotes transparency and ensures alignment with regulatory requirements. The findings emphasise the importance of dataset documentation in fostering responsible AI development.
arXiv Detail & Related papers (2025-01-09T23:36:34Z)
Self-Supervised Learning for Graph-Structured Data in Healthcare Applications: A Comprehensive Review [1.5778541636044452]
This review aims to be a valuable resource for both researchers and practitioners looking to utilize self-supervised learning (SSL) for graph-structured data in healthcare. To the best of our knowledge, this work represents the first comprehensive review of the literature on SSL applied to graph data in healthcare.
arXiv Detail & Related papers (2024-11-28T10:51:12Z)
A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied. We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z)
Large Language Model Benchmarks in Medical Tasks [11.196196955468992]
This paper presents a survey of various benchmark datasets employed in medical large language models (LLMs) tasks. The survey categorizes the datasets by modality, discussing their significance, data structure, and impact on the development of LLMs. The paper emphasizes the need for datasets with a greater degree of language diversity, structured omics data, and innovative approaches to synthesis.
arXiv Detail & Related papers (2024-10-28T11:07:33Z)
FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection [83.54960238236548]
FEDMEKI not only preserves data privacy but also enhances the capability of medical foundation models. FEDMEKI allows medical foundation models to learn from a broader spectrum of medical knowledge without direct data exposure.
arXiv Detail & Related papers (2024-08-17T15:18:56Z)
When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications? [8.89829757177796]
We examine the effectiveness of vector representations from last hidden states of Large Language Models for medical diagnostics and prognostics. We focus on instruction-tuned LLMs in a zero-shot setting to represent abnormal physiological data and evaluate their utilities as feature extractors. Although findings suggest the raw data features still prevails in medical ML tasks, zero-shot LLM embeddings demonstrate competitive results.
arXiv Detail & Related papers (2024-08-15T03:56:40Z)
A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry [2.1717945745027425]
Large Language Models (LLMs) have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare. Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness.
arXiv Detail & Related papers (2024-04-24T09:55:24Z)
The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review [0.0]
Development of trustworthy AI is especially important in medicine. We focus on the importance of data quality (training/test) in deep learning (DL) We propose the METRIC-framework, a specialised data quality framework for medical training data.
arXiv Detail & Related papers (2024-02-21T09:15:46Z)
Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven *clinical decision support* Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit. Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
arXiv Detail & Related papers (2023-10-28T12:08:03Z)
Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports. We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM. We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain. We annotated a corpus of clinical documents according to 12 types of identifying entities. We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
How to Leverage Multimodal EHR Data for Better Medical Predictions? [13.401754962583771]
The complexity of electronic health records ( EHR) data is a challenge for the application of deep learning. In this paper, we first extract the accompanying clinical notes from EHR and propose a method to integrate these data. The results on two medical prediction tasks show that our fused model with different data outperforms the state-of-the-art method.
arXiv Detail & Related papers (2021-10-29T13:26:05Z)
VBridge: Connecting the Dots Between Features, Explanations, and Data for Healthcare Models [85.4333256782337]
VBridge is a visual analytics tool that seamlessly incorporates machine learning explanations into clinicians' decision-making workflow. We identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence. We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians.
arXiv Detail & Related papers (2021-08-04T17:34:13Z)
MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.