EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models
- URL: http://arxiv.org/abs/2307.02028v3
- Date: Mon, 11 Dec 2023 18:36:13 GMT
- Title: EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models
- Authors: Michael Wornow, Rahul Thapa, Ethan Steinberg, Jason A. Fries, Nigam H. Shah
- Abstract summary: We publish a new dataset, EHRSHOT, which contains deidentified structured data from the electronic health records (EHRs) of 6,739 patients from Stanford Medicine.
Second, we publish the weights of CLMBR-T-base, a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients.
Third, we define 15 few-shot clinical prediction tasks, enabling evaluation of foundation models on benefits such as sample efficiency and task adaptation.
- Score: 6.506937003687058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the general machine learning (ML) community has benefited from public
datasets, tasks, and models, the progress of ML in healthcare has been hampered
by a lack of such shared assets. The success of foundation models creates new
challenges for healthcare ML by requiring access to shared pretrained models to
validate performance benefits. We help address these challenges through three
contributions. First, we publish a new dataset, EHRSHOT, which contains
deidentified structured data from the electronic health records (EHRs) of 6,739
patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR
datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients.
Second, we publish the weights of CLMBR-T-base, a 141M parameter clinical
foundation model pretrained on the structured EHR data of 2.57M patients. We
are one of the first to fully release such a model for coded EHR data; in
contrast, most prior models released for clinical data (e.g. GatorTron,
ClinicalBERT) only work with unstructured text and cannot process the rich,
structured data within an EHR. We provide an end-to-end pipeline for the
community to validate and build upon its performance. Third, we define 15
few-shot clinical prediction tasks, enabling evaluation of foundation models on
benefits such as sample efficiency and task adaptation. Our model and dataset
are available via a research data use agreement from our website:
https://ehrshot.stanford.edu. Code to reproduce our results is available at
our Github repo: https://github.com/som-shahlab/ehrshot-benchmark
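As a concrete illustration of the few-shot protocol: freeze the foundation model, sample k labeled patients per class, fit a lightweight head on the frozen representations, and track performance as k grows. The sketch below uses hypothetical precomputed patient representations in place of real CLMBR-T-base features; it is a minimal illustration, not the benchmark's actual evaluation harness.

```python
# Minimal sketch of a few-shot evaluation loop over frozen foundation-model
# representations. `patient_reprs` and `labels` are hypothetical stand-ins;
# the real benchmark uses CLMBR-T-base features and 15 clinical tasks.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
patient_reprs = rng.normal(size=(6739, 768))   # frozen patient embeddings
labels = rng.integers(0, 2, size=6739)         # one binary prediction task

# Hold out a fixed test set; sample k training examples per class per shot count.
test_idx = rng.choice(6739, size=1000, replace=False)
train_pool = np.setdiff1d(np.arange(6739), test_idx)

for k in (1, 4, 16, 64, 128):
    pos = train_pool[labels[train_pool] == 1]
    neg = train_pool[labels[train_pool] == 0]
    idx = np.concatenate([rng.choice(pos, k, replace=False),
                          rng.choice(neg, k, replace=False)])
    head = LogisticRegression(max_iter=1000).fit(patient_reprs[idx], labels[idx])
    auc = roc_auc_score(labels[test_idx],
                        head.predict_proba(patient_reprs[test_idx])[:, 1])
    print(f"k={k:4d}  AUROC={auc:.3f}")
```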
Related papers
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
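One first-order reading of In-Run Data Shapley is to credit each training example with the dot product between its SGD gradient and the gradient of a validation objective, accumulated during a single ordinary training run. The sketch below implements that reading for a toy linear model; it is an interpretation of the gradient-dot-product idea, not the paper's exact estimator.

```python
# Sketch of first-order gradient-dot-product attribution in the spirit of
# In-Run Data Shapley: score each training example by how much its SGD update
# is estimated to reduce the validation loss, over one ordinary training run.
import numpy as np

rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
Xv, yv = rng.normal(size=(50, 5)), rng.normal(size=50)

w = np.zeros(5)
lr, scores = 0.05, np.zeros(len(X))

for epoch in range(3):
    for i in rng.permutation(len(X)):
        g_i = (X[i] @ w - y[i]) * X[i]          # per-example gradient
        g_val = Xv.T @ (Xv @ w - yv) / len(Xv)  # validation-loss gradient
        scores[i] += lr * g_i @ g_val           # estimated val-loss reduction
        w -= lr * g_i                           # ordinary SGD step

# Examples with the largest accumulated scores are the most helpful.
print("most helpful examples:", np.argsort(scores)[-5:])
```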
arXiv Detail & Related papers (2024-06-16T17:09:24Z)
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
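The ReST$^{EM}$ recipe reduces to a generate-filter-finetune loop under scalar feedback. In the sketch below, `generate`, `reward`, and `finetune` are hypothetical stubs standing in for a real language model, verifier, and trainer.

```python
# Schematic ReST^EM-style loop: the E-step samples candidate solutions and
# filters them with scalar feedback; the M-step fine-tunes on the survivors.
# `generate`, `reward`, and `finetune` are hypothetical stubs.
from typing import Callable, List, Tuple

def rest_em(model, problems: List[str],
            generate: Callable[[object, str, int], List[str]],
            reward: Callable[[str, str], float],
            finetune: Callable[[object, List[Tuple[str, str]]], object],
            iterations: int = 3, samples_per_problem: int = 8):
    for _ in range(iterations):
        # E-step: sample candidates, keep only positively rewarded ones.
        dataset = [(p, c)
                   for p in problems
                   for c in generate(model, p, samples_per_problem)
                   if reward(p, c) > 0]
        # M-step: fine-tune the model on the filtered dataset.
        model = finetune(model, dataset)
    return model
```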
arXiv Detail & Related papers (2023-12-11T18:17:43Z)
- A Multi-Center Study on the Adaptability of a Shared Foundation Model for Electronic Health Records [2.83988353856908]
Foundation models hold promise for transforming AI in healthcare by providing modular components that are easily adaptable to downstream healthcare tasks.
This study examined the adaptability of a recently released structured EHR foundation model ($FM_{SM}$) trained on longitudinal medical record data from 2.57M Stanford Medicine patients.
Our findings show that adapting shared EHR foundation models across hospitals provides improved prediction performance at less cost.
arXiv Detail & Related papers (2023-11-20T01:58:27Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
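Compute-equivalent comparison amounts to converting an accelerator-hour budget and each model's measured throughput into a per-model token budget. The helper below is a hypothetical illustration of that conversion, not code from the Languini codebase.

```python
# Convert an accelerator-hour budget into a training-token budget using each
# model's measured throughput, so models are compared at equivalent compute.
def token_budget(accelerator_hours: float, tokens_per_second: float) -> int:
    """Tokens a model can consume within the given accelerator-hour budget."""
    return int(accelerator_hours * 3600 * tokens_per_second)

# Example: at 6h of accelerator time, a higher-throughput model (such as the
# ten-fold-throughput LSTM) simply sees proportionally more tokens.
for name, tps in [("gpt-style", 25_000.0), ("lstm-10x", 250_000.0)]:
    print(name, token_budget(6.0, tps), "tokens")
```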
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Federated Learning of Medical Concepts Embedding using BEHRT [0.0]
We propose a federated learning approach for learning medical concepts embedding.
Our approach is based on an embedding model such as BEHRT, a deep neural sequence model for EHR data.
We compare the performance of a model trained with FL against a model trained on centralized data.
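The standard aggregation step for such a setup is federated averaging: each site trains locally and the server averages parameters weighted by local dataset size. The sketch below is generic FedAvg, not the paper's exact training recipe.

```python
# Generic FedAvg aggregation: average client parameter dictionaries weighted
# by each client's number of training examples. Illustrative only.
import numpy as np

def fed_avg(client_weights: list[dict[str, np.ndarray]],
            client_sizes: list[int]) -> dict[str, np.ndarray]:
    total = sum(client_sizes)
    return {name: sum(w[name] * (n / total)
                      for w, n in zip(client_weights, client_sizes))
            for name in client_weights[0]}

# Two toy "hospitals" sharing a single embedding matrix.
a = {"embed": np.ones((3, 2))}
b = {"embed": np.zeros((3, 2))}
print(fed_avg([a, b], client_sizes=[300, 100])["embed"])  # 0.75 everywhere
```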
arXiv Detail & Related papers (2023-05-22T14:05:39Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
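Reading "CLIP distillation" as using CLIP's zero-shot image-text similarities as a soft teacher, the loss becomes a KL term between student predictions and CLIP's class probabilities. The tensors below are random placeholders for real CLIP embeddings; this is one interpretation of the idea, not the paper's implementation.

```python
# Sketch of distilling target knowledge from CLIP: treat CLIP's zero-shot
# class probabilities (image-text similarity) as soft teacher labels and
# minimize KL to the student's predictions. Tensors are random placeholders.
import torch
import torch.nn.functional as F

img_feats = F.normalize(torch.randn(32, 512), dim=-1)  # CLIP image embeddings
txt_feats = F.normalize(torch.randn(10, 512), dim=-1)  # class-prompt embeddings
student_logits = torch.randn(32, 10, requires_grad=True)

teacher_logits = 100.0 * img_feats @ txt_feats.T       # CLIP-style scaled cosine
teacher_probs = teacher_logits.softmax(dim=-1)

loss = F.kl_div(student_logits.log_softmax(dim=-1), teacher_probs,
                reduction="batchmean")
loss.backward()
print(float(loss))
```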
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Adapting Pretrained Language Models for Solving Tabular Prediction Problems in the Electronic Health Record [0.0]
We pretrain a small DeBERTa model on a dataset consisting of MIMIC-III discharge summaries, clinical notes, radiology reports, and PubMed abstracts.
We compare this model's performance with that of a DeBERTa model pre-trained on clinical texts from our institutional EHR and with an XGBoost model.
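For a text-pretrained model to handle tabular EHR prediction, each row is typically serialized into a sentence the model can read. The template below is a hypothetical illustration of such serialization, not the paper's preprocessing.

```python
# Hypothetical serialization of one tabular EHR row into text so that a
# text-pretrained model (e.g. DeBERTa) can consume it.
def serialize_row(row: dict) -> str:
    return " ".join(f"{k.replace('_', ' ')}: {v}." for k, v in row.items())

row = {"age": 67, "sex": "female", "heart_rate": 104,
       "creatinine": 1.8, "admission_type": "emergency"}
print(serialize_row(row))
# -> "age: 67. sex: female. heart rate: 104. creatinine: 1.8. admission type: emergency."
```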
arXiv Detail & Related papers (2023-03-27T05:34:19Z)
- Unsupervised Pre-Training on Patient Population Graphs for Patient-Level Predictions [48.02011627390706]
Pre-training has shown success in different areas of machine learning, such as Computer Vision (CV), Natural Language Processing (NLP) and medical imaging.
In this paper, we apply unsupervised pre-training to heterogeneous, multi-modal EHR data for patient outcome prediction.
We find that our proposed graph-based pre-training method helps in modeling the data at a population level.
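A minimal version of population-graph pre-training: mask some patient features, propagate information over the graph, and train the network to reconstruct what was hidden. The toy graph and one-layer propagation below are placeholders for the paper's more elaborate architecture.

```python
# Minimal masked-feature reconstruction on a patient population graph:
# hide some node features, propagate over a normalized adjacency, and
# reconstruct the hidden values. Toy graph and one-layer model only.
import torch

n, d = 100, 16
feats = torch.randn(n, d)                         # patient feature vectors
adj = (torch.rand(n, n) < 0.05).float()
adj = ((adj + adj.T) > 0).float() + torch.eye(n)  # symmetric + self-loops
prop = adj.sum(1).pow(-1).diag() @ adj            # row-normalized propagation

mask = torch.rand(n, d) < 0.15                    # features to hide
inp = feats.masked_fill(mask, 0.0)

lin = torch.nn.Linear(d, d)
opt = torch.optim.Adam(lin.parameters(), lr=1e-2)
for step in range(200):
    recon = lin(prop @ inp)                       # neighbors help fill gaps
    loss = ((recon - feats)[mask] ** 2).mean()    # loss only on masked slots
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```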
arXiv Detail & Related papers (2022-03-23T17:59:45Z)
- Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? [70.3631443249802]
We design a battery of approaches intended to recover Personal Health Information from a trained BERT.
Specifically, we attempt to recover patient names and conditions with which they are associated.
We find that simple probing methods are not able to meaningfully extract sensitive information from a BERT model trained over the MIMIC-III corpus of EHRs.
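The probing idea can be illustrated with a standard fill-mask query: if a clinical BERT memorized patient identities, masked-name prompts would surface them. The snippet uses the Hugging Face fill-mask pipeline with a public checkpoint as a stand-in; the paper's battery of attacks is far more extensive.

```python
# Illustrative fill-mask probe for memorized name-condition associations.
# Uses a public checkpoint as a stand-in for a MIMIC-III-trained BERT.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Mr. [MASK] was diagnosed with pneumonia."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```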
arXiv Detail & Related papers (2021-04-15T20:40:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.