EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models
- URL: http://arxiv.org/abs/2307.02028v3
- Date: Mon, 11 Dec 2023 18:36:13 GMT
- Title: EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models
- Authors: Michael Wornow, Rahul Thapa, Ethan Steinberg, Jason A. Fries, Nigam H. Shah
- Abstract summary: We publish a new dataset, EHRSHOT, which contains deidentified structured data from the electronic health records (EHRs) of 6,739 patients from Stanford Medicine.
Second, we publish the weights of CLMBR-T-base, a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients.
Third, we define 15 few-shot clinical prediction tasks, enabling evaluation of foundation models on benefits such as sample efficiency and task adaptation.
- Score: 6.506937003687058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the general machine learning (ML) community has benefited from public
datasets, tasks, and models, the progress of ML in healthcare has been hampered
by a lack of such shared assets. The success of foundation models creates new
challenges for healthcare ML by requiring access to shared pretrained models to
validate performance benefits. We help address these challenges through three
contributions. First, we publish a new dataset, EHRSHOT, which contains
deidentified structured data from the electronic health records (EHRs) of 6,739
patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR
datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients.
Second, we publish the weights of CLMBR-T-base, a 141M parameter clinical
foundation model pretrained on the structured EHR data of 2.57M patients. We
are one of the first to fully release such a model for coded EHR data; in
contrast, most prior models released for clinical data (e.g. GatorTron,
ClinicalBERT) only work with unstructured text and cannot process the rich,
structured data within an EHR. We provide an end-to-end pipeline for the
community to validate and build upon its performance. Third, we define 15
few-shot clinical prediction tasks, enabling evaluation of foundation models on
benefits such as sample efficiency and task adaptation. Our model and dataset
are available via a research data use agreement from our website:
https://ehrshot.stanford.edu. Code to reproduce our results is available at
our Github repo: https://github.com/som-shahlab/ehrshot-benchmark
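As a concrete illustration of the few-shot protocol: freeze the foundation model, sample k labeled patients per class, fit a lightweight head on the frozen representations, and track performance as k grows. The sketch below uses hypothetical precomputed patient representations in place of real CLMBR-T-base features; it is a minimal illustration, not the benchmark's actual evaluation harness.

```python
# Minimal sketch of a few-shot evaluation loop over frozen foundation-model
# representations. `patient_reprs` and `labels` are hypothetical stand-ins;
# the real benchmark uses CLMBR-T-base features and 15 clinical tasks.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
patient_reprs = rng.normal(size=(6739, 768))   # frozen patient embeddings
labels = rng.integers(0, 2, size=6739)         # one binary prediction task

# Hold out a fixed test set; sample k training examples per class per shot count.
test_idx = rng.choice(6739, size=1000, replace=False)
train_pool = np.setdiff1d(np.arange(6739), test_idx)

for k in (1, 4, 16, 64, 128):
    pos = train_pool[labels[train_pool] == 1]
    neg = train_pool[labels[train_pool] == 0]
    idx = np.concatenate([rng.choice(pos, k, replace=False),
                          rng.choice(neg, k, replace=False)])
    head = LogisticRegression(max_iter=1000).fit(patient_reprs[idx], labels[idx])
    auc = roc_auc_score(labels[test_idx],
                        head.predict_proba(patient_reprs[test_idx])[:, 1])
    print(f"k={k:4d}  AUROC={auc:.3f}")
```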
Related papers
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
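One first-order reading of In-Run Data Shapley is to credit each training example with the dot product between its SGD gradient and the gradient of a validation objective, accumulated during a single ordinary training run. The sketch below implements that reading for a toy linear model; it is an interpretation of the gradient-dot-product idea, not the paper's exact estimator.

```python
# Sketch of first-order gradient-dot-product attribution in the spirit of
# In-Run Data Shapley: score each training example by how much its SGD update
# is estimated to reduce the validation loss, over one ordinary training run.
import numpy as np

rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
Xv, yv = rng.normal(size=(50, 5)), rng.normal(size=50)

w = np.zeros(5)
lr, scores = 0.05, np.zeros(len(X))

for epoch in range(3):
    for i in rng.permutation(len(X)):
        g_i = (X[i] @ w - y[i]) * X[i]          # per-example gradient
        g_val = Xv.T @ (Xv @ w - yv) / len(Xv)  # validation-loss gradient
        scores[i] += lr * g_i @ g_val           # estimated val-loss reduction
        w -= lr * g_i                           # ordinary SGD step

# Examples with the largest accumulated scores are the most helpful.
print("most helpful examples:", np.argsort(scores)[-5:])
```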
arXiv Detail & Related papers (2024-06-16T17:09:24Z)
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
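The ReST$^{EM}$ recipe reduces to a generate-filter-finetune loop under scalar feedback. In the sketch below, `generate`, `reward`, and `finetune` are hypothetical stubs standing in for a real language model, verifier, and trainer.

```python
# Schematic ReST^EM-style loop: the E-step samples candidate solutions and
# filters them with scalar feedback; the M-step fine-tunes on the survivors.
# `generate`, `reward`, and `finetune` are hypothetical stubs.
from typing import Callable, List, Tuple

def rest_em(model, problems: List[str],
            generate: Callable[[object, str, int], List[str]],
            reward: Callable[[str, str], float],
            finetune: Callable[[object, List[Tuple[str, str]]], object],
            iterations: int = 3, samples_per_problem: int = 8):
    for _ in range(iterations):
        # E-step: sample candidates, keep only positively rewarded ones.
        dataset = [(p, c)
                   for p in problems
                   for c in generate(model, p, samples_per_problem)
                   if reward(p, c) > 0]
        # M-step: fine-tune the model on the filtered dataset.
        model = finetune(model, dataset)
    return model
```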
arXiv Detail & Related papers (2023-12-11T18:17:43Z)
- A Multi-Center Study on the Adaptability of a Shared Foundation Model for Electronic Health Records [2.83988353856908]
Foundation models hold promise for transforming AI in healthcare by providing modular components that are easily adaptable to downstream healthcare tasks.
This study examined the adaptability of a recently released structured EHR foundation model ($FM_{SM}$) trained on longitudinal medical record data from 2.57M Stanford Medicine patients.
Our findings show that adapting shared EHR foundation models across hospitals provides improved prediction performance at less cost.
arXiv Detail & Related papers (2023-11-20T01:58:27Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
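Compute-equivalent comparison amounts to converting an accelerator-hour budget and each model's measured throughput into a per-model token budget. The helper below is a hypothetical illustration of that conversion, not code from the Languini codebase.

```python
# Convert an accelerator-hour budget into a training-token budget using each
# model's measured throughput, so models are compared at equivalent compute.
def token_budget(accelerator_hours: float, tokens_per_second: float) -> int:
    """Tokens a model can consume within the given accelerator-hour budget."""
    return int(accelerator_hours * 3600 * tokens_per_second)

# Example: at 6h of accelerator time, a higher-throughput model (such as the
# ten-fold-throughput LSTM) simply sees proportionally more tokens.
for name, tps in [("gpt-style", 25_000.0), ("lstm-10x", 250_000.0)]:
    print(name, token_budget(6.0, tps), "tokens")
```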
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Federated Learning of Medical Concepts Embedding using BEHRT [0.0]
We propose a federated learning approach for learning medical concepts embedding.
Our approach is based on an embedding model such as BEHRT, a deep neural sequence model for EHR data.
We compare the performance of a model trained with FL against a model trained on centralized data.
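The standard aggregation step for such a setup is federated averaging: each site trains locally and the server averages parameters weighted by local dataset size. The sketch below is generic FedAvg, not the paper's exact training recipe.

```python
# Generic FedAvg aggregation: average client parameter dictionaries weighted
# by each client's number of training examples. Illustrative only.
import numpy as np

def fed_avg(client_weights: list[dict[str, np.ndarray]],
            client_sizes: list[int]) -> dict[str, np.ndarray]:
    total = sum(client_sizes)
    return {name: sum(w[name] * (n / total)
                      for w, n in zip(client_weights, client_sizes))
            for name in client_weights[0]}

# Two toy "hospitals" sharing a single embedding matrix.
a = {"embed": np.ones((3, 2))}
b = {"embed": np.zeros((3, 2))}
print(fed_avg([a, b], client_sizes=[300, 100])["embed"])  # 0.75 everywhere
```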
arXiv Detail & Related papers (2023-05-22T14:05:39Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
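Reading "CLIP distillation" as using CLIP's zero-shot image-text similarities as a soft teacher, the loss becomes a KL term between student predictions and CLIP's class probabilities. The tensors below are random placeholders for real CLIP embeddings; this is one interpretation of the idea, not the paper's implementation.

```python
# Sketch of distilling target knowledge from CLIP: treat CLIP's zero-shot
# class probabilities (image-text similarity) as soft teacher labels and
# minimize KL to the student's predictions. Tensors are random placeholders.
import torch
import torch.nn.functional as F

img_feats = F.normalize(torch.randn(32, 512), dim=-1)  # CLIP image embeddings
txt_feats = F.normalize(torch.randn(10, 512), dim=-1)  # class-prompt embeddings
student_logits = torch.randn(32, 10, requires_grad=True)

teacher_logits = 100.0 * img_feats @ txt_feats.T       # CLIP-style scaled cosine
teacher_probs = teacher_logits.softmax(dim=-1)

loss = F.kl_div(student_logits.log_softmax(dim=-1), teacher_probs,
                reduction="batchmean")
loss.backward()
print(float(loss))
```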
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Adapting Pretrained Language Models for Solving Tabular Prediction Problems in the Electronic Health Record [0.0]
We pretrain a small DeBERTa model on a dataset consisting of MIMIC-III discharge summaries, clinical notes, radiology reports, and PubMed abstracts.
We compare this model's performance with that of a DeBERTa model pre-trained on clinical texts from our institutional EHR and with an XGBoost model.
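For a text-pretrained model to handle tabular EHR prediction, each row is typically serialized into a sentence the model can read. The template below is a hypothetical illustration of such serialization, not the paper's preprocessing.

```python
# Hypothetical serialization of one tabular EHR row into text so that a
# text-pretrained model (e.g. DeBERTa) can consume it.
def serialize_row(row: dict) -> str:
    return " ".join(f"{k.replace('_', ' ')}: {v}." for k, v in row.items())

row = {"age": 67, "sex": "female", "heart_rate": 104,
       "creatinine": 1.8, "admission_type": "emergency"}
print(serialize_row(row))
# -> "age: 67. sex: female. heart rate: 104. creatinine: 1.8. admission type: emergency."
```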
arXiv Detail & Related papers (2023-03-27T05:34:19Z)
- Unsupervised Pre-Training on Patient Population Graphs for Patient-Level Predictions [48.02011627390706]
Pre-training has shown success in different areas of machine learning, such as Computer Vision (CV), Natural Language Processing (NLP) and medical imaging.
In this paper, we apply unsupervised pre-training to heterogeneous, multi-modal EHR data for patient outcome prediction.
We find that our proposed graph-based pre-training method helps in modeling the data at a population level.
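A minimal version of population-graph pre-training: mask some patient features, propagate information over the graph, and train the network to reconstruct what was hidden. The toy graph and one-layer propagation below are placeholders for the paper's more elaborate architecture.

```python
# Minimal masked-feature reconstruction on a patient population graph:
# hide some node features, propagate over a normalized adjacency, and
# reconstruct the hidden values. Toy graph and one-layer model only.
import torch

n, d = 100, 16
feats = torch.randn(n, d)                         # patient feature vectors
adj = (torch.rand(n, n) < 0.05).float()
adj = ((adj + adj.T) > 0).float() + torch.eye(n)  # symmetric + self-loops
prop = adj.sum(1).pow(-1).diag() @ adj            # row-normalized propagation

mask = torch.rand(n, d) < 0.15                    # features to hide
inp = feats.masked_fill(mask, 0.0)

lin = torch.nn.Linear(d, d)
opt = torch.optim.Adam(lin.parameters(), lr=1e-2)
for step in range(200):
    recon = lin(prop @ inp)                       # neighbors help fill gaps
    loss = ((recon - feats)[mask] ** 2).mean()    # loss only on masked slots
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```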
arXiv Detail & Related papers (2022-03-23T17:59:45Z)
- Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? [70.3631443249802]
We design a battery of approaches intended to recover Personal Health Information from a trained BERT.
Specifically, we attempt to recover patient names and conditions with which they are associated.
We find that simple probing methods are not able to meaningfully extract sensitive information from a BERT model trained over the MIMIC-III corpus of EHRs.
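The probing idea can be illustrated with a standard fill-mask query: if a clinical BERT memorized patient identities, masked-name prompts would surface them. The snippet uses the Hugging Face fill-mask pipeline with a public checkpoint as a stand-in; the paper's battery of attacks is far more extensive.

```python
# Illustrative fill-mask probe for memorized name-condition associations.
# Uses a public checkpoint as a stand-in for a MIMIC-III-trained BERT.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Mr. [MASK] was diagnosed with pneumonia."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```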
arXiv Detail & Related papers (2021-04-15T20:40:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.