Healthsheet: Development of a Transparency Artifact for Health Datasets
- URL: http://arxiv.org/abs/2202.13028v1
- Date: Sat, 26 Feb 2022 01:05:55 GMT
- Title: Healthsheet: Development of a Transparency Artifact for Health Datasets
- Authors: Negar Rostamzadeh, Diana Mincu, Subhrajit Roy, Andrew Smart, Lauren
Wilcox, Mahima Pushkarna, Jessica Schrouff, Razvan Amironesei, Nyalleng
Moorosi, Katherine Heller
- Abstract summary: We introduce Healthsheet, a contextualized adaptation of the original datasheet questionnaire (Gebru et al., 2018) for health-specific applications.
We work with three publicly-available healthcare datasets as our case studies.
- Score: 13.57051456780329
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) approaches have demonstrated promising results in a
wide range of healthcare applications. Data plays a crucial role in developing
ML-based healthcare systems that directly affect people's lives. Many of the
ethical issues surrounding the use of ML in healthcare stem from structural
inequalities underlying the way we collect, use, and handle data. Developing
guidelines to improve documentation practices regarding the creation, use, and
maintenance of ML healthcare datasets is therefore of critical importance. In
this work, we introduce Healthsheet, a contextualized adaptation of the
original datasheet questionnaire (Gebru et al., 2018) for
health-specific applications. Through a series of semi-structured interviews,
we adapt the datasheets for healthcare data documentation. As part of the
Healthsheet development process and to understand the obstacles researchers
face in creating datasheets, we worked with three publicly-available healthcare
datasets as our case studies, each with different types of structured data:
Electronic Health Records (EHR), clinical trial study data, and
smartphone-based performance outcome measures. Our findings from the
interviewee study and case studies show 1) that datasheets should be
contextualized for healthcare, 2) that despite incentives to adopt
accountability practices such as datasheets, there is a lack of consistency in
the broader use of these practices, 3) how the ML for health community views
datasheets, and particularly Healthsheets, as a diagnostic tool to surface
the limitations and strengths of datasets, and 4) the relative importance of
different fields in the datasheet to healthcare concerns.
Related papers
- A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry [2.1717945745027425]
Large Language Models (LLMs) have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation.
This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare.
Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness.
(2024-04-24T09:55:24Z)
- The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review [0.0]
Development of trustworthy AI is especially important in medicine.
We focus on the importance of data quality (training/test) in deep learning (DL).
We propose the METRIC-framework, a specialised data quality framework for medical training data.
(2024-02-21T09:15:46Z)
- Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven clinical decision support.
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
(2023-10-28T12:08:03Z)
- An Analysis on Large Language Models in Healthcare: A Case Study of BioBERT [0.0]
This paper conducts a comprehensive investigation into applying large language models, particularly BioBERT, in healthcare.
The analysis outlines a systematic methodology for fine-tuning BioBERT to meet the unique needs of the healthcare domain.
The paper thoroughly examines ethical considerations, particularly patient privacy and data security.
(2023-10-11T08:16:35Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
(2023-06-08T09:12:28Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
(2023-03-23T17:17:46Z)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re$^3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
(2022-10-23T16:34:39Z)
- Machine Learning for Multimodal Electronic Health Records-based Research: Challenges and Perspectives [22.230972071321357]
Electronic Health Records contain rich information of patients' health history.
Relying on structured data only might be insufficient in reflecting patients' comprehensive information.
An increasing number of studies seek to obtain more accurate results by incorporating unstructured free-text data as well.
(2021-11-09T01:19:11Z)
- How to Leverage Multimodal EHR Data for Better Medical Predictions? [13.401754962583771]
The complexity of electronic health records (EHR) data is a challenge for the application of deep learning.
In this paper, we first extract the accompanying clinical notes from EHR and propose a method to integrate these data.
The results on two medical prediction tasks show that our fused model with different data outperforms the state-of-the-art method.
(2021-10-29T13:26:05Z)
- VBridge: Connecting the Dots Between Features, Explanations, and Data for Healthcare Models [85.4333256782337]
VBridge is a visual analytics tool that seamlessly incorporates machine learning explanations into clinicians' decision-making workflow.
We identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence.
We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians.
(2021-08-04T17:34:13Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
(2021-07-20T07:04:52Z)