Evaluating Model Performance in Medical Datasets Over Time
- URL: http://arxiv.org/abs/2305.13426v2
- Date: Sun, 16 Jul 2023 18:29:36 GMT
- Title: Evaluating Model Performance in Medical Datasets Over Time
- Authors: Helen Zhou, Yuwen Chen, Zachary C. Lipton
- Abstract summary: This work proposes the Evaluation on Medical datasets Over Time (EMDOT) framework.
Inspired by the concept of backtesting, EMDOT simulates possible training procedures that practitioners might have been able to execute at each point in time.
We show that, depending on the dataset, using all historical data may be ideal in many cases, whereas using a window of only the most recent data can be advantageous in others.
- Score: 26.471486383140526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) models deployed in healthcare systems must face data
drawn from continually evolving environments. However, researchers proposing
such models typically evaluate them in a time-agnostic manner, splitting
datasets according to patients sampled randomly throughout the entire study
time period. This work proposes the Evaluation on Medical Datasets Over Time
(EMDOT) framework, which evaluates the performance of a model class across
time. Inspired by the concept of backtesting, EMDOT simulates possible training
procedures that practitioners might have been able to execute at each point in
time and evaluates the resulting models on all future time points. Evaluating
both linear and more complex models on six distinct medical data sources
(tabular and imaging), we show that, depending on the dataset, using all
historical data may be ideal in many cases, whereas using a window of only the
most recent data can be advantageous in others. In datasets where models suffer
from sudden degradations in performance, we investigate plausible explanations
for these shocks. We release the EMDOT package to facilitate further work on
deployment-oriented evaluation over time.
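To make the protocol concrete, here is a minimal sketch of the backtesting loop described above. It is illustrative only: the helper names (`train_model`, `evaluate`) and the per-period `(X, y)` layout are assumptions, not the EMDOT package's actual API.

```python
# Illustrative backtesting loop in the spirit of EMDOT. The helpers
# train_model() and evaluate() and the per-period (X, y) layout are
# assumptions, not the EMDOT package's actual API.
from typing import Callable, Dict, List, Optional, Tuple

def backtest(
    periods: List[Tuple[list, list]],   # one (X, y) pair per time period
    train_model: Callable,              # fits and returns a model given (X, y)
    evaluate: Callable,                 # returns a score for (model, X, y)
    window: Optional[int] = None,       # None = train on all historical data
) -> Dict[Tuple[int, int], float]:
    """Train at each time point; evaluate each model on all future periods."""
    scores: Dict[Tuple[int, int], float] = {}
    for t in range(1, len(periods)):
        # Training data: the full history up to t, or only the `window`
        # most recent periods.
        start = 0 if window is None else max(0, t - window)
        X_tr = [x for X, _ in periods[start:t] for x in X]
        y_tr = [y for _, Y in periods[start:t] for y in Y]
        model = train_model(X_tr, y_tr)
        # Evaluate the frozen model on every future period.
        for u in range(t, len(periods)):
            X_te, y_te = periods[u]
            scores[(t, u)] = evaluate(model, X_te, y_te)
    return scores
```

Comparing the score surfaces produced with `window=None` against a finite `window` reproduces the all-history versus recent-window trade-off discussed above.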
Related papers
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z)
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
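As a loose illustration of this kind of self-training with scalar feedback, an expectation-maximization-style loop might look like the sketch below; `sample`, `reward`, and `finetune` are hypothetical stand-ins, not the authors' implementation.

```python
# Loose sketch of an EM-style self-training loop of the kind ReST$^{EM}$
# describes. sample(), reward(), and finetune() are hypothetical stand-ins,
# not the authors' implementation.
def rest_em_loop(base_model, prompts, sample, reward, finetune,
                 iterations=3, k=8):
    model = base_model
    for _ in range(iterations):
        # E-step: draw k candidate solutions per prompt from the current model.
        candidates = [(p, s) for p in prompts for s in sample(model, p, k)]
        # Keep only candidates that scalar feedback marks as correct.
        kept = [(p, s) for p, s in candidates if reward(p, s) > 0]
        # M-step: fine-tune the *base* model on the filtered samples.
        model = finetune(base_model, kept)
    return model
```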
arXiv Detail & Related papers (2023-12-11T18:17:43Z)
- MADS: Modulated Auto-Decoding SIREN for time series imputation [9.673093148930874]
We propose MADS, a novel auto-decoding framework for time series imputation, built upon implicit neural representations.
We evaluate our model on two real-world datasets, and show that it outperforms state-of-the-art methods for time series imputation.
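For context, implicit neural representations of the SIREN flavor are small coordinate networks with sinusoidal activations; a bare-bones PyTorch sketch (omitting SIREN's specialized weight initialization, and not the MADS architecture itself) might look like:

```python
# Bare-bones coordinate network with sinusoidal activations (SIREN-style),
# the kind of implicit neural representation MADS builds on. This omits
# SIREN's specialized weight initialization and is not the MADS model itself.
import torch
import torch.nn as nn

class Siren(nn.Module):
    def __init__(self, in_dim=1, hidden=64, out_dim=1, w0=30.0):
        super().__init__()
        self.w0 = w0
        self.l1 = nn.Linear(in_dim, hidden)
        self.l2 = nn.Linear(hidden, hidden)
        self.l3 = nn.Linear(hidden, out_dim)

    def forward(self, t):
        # Sinusoidal activations let the network fit high-frequency signals.
        h = torch.sin(self.w0 * self.l1(t))
        h = torch.sin(self.w0 * self.l2(h))
        return self.l3(h)

# Imputation use: fit on observed (timestamp, value) pairs, then query the
# network at the missing timestamps.
```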
arXiv Detail & Related papers (2023-07-03T09:08:47Z)
- Federated Learning of Medical Concepts Embedding using BEHRT [0.0]
We propose a federated learning approach for learning medical concepts embedding.
Our approach is based on an embedding model like BEHRT, a deep neural sequence model for EHR data.
We compare the performance of a model trained with FL against a model trained on centralized data.
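As a generic illustration of the federated setup (a FedAvg-style round, assumed here rather than taken from the paper), each site trains locally and only model weights are aggregated:

```python
# FedAvg-style aggregation round, as a generic illustration of the federated
# setup (assumed here, not taken from the paper). EHR data never leaves a
# site; only model weights are shared.
def fedavg_round(global_weights, client_datasets, local_train):
    client_weights, client_sizes = [], []
    for data in client_datasets:
        # Each site fine-tunes the current global model on its local records.
        client_weights.append(local_train(global_weights, data))
        client_sizes.append(len(data))
    total = sum(client_sizes)
    # New global model: average of client models, weighted by dataset size.
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(global_weights))
    ]
```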
arXiv Detail & Related papers (2023-05-22T14:05:39Z)
- Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results on existing abstractive summarization datasets.
Existing summarization benchmarks overlap in time with the standard pre-training corpora and fine-tuning datasets.
We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z)
- Model Evaluation in Medical Datasets Over Time [26.471486383140526]
We introduce the Evaluation on Medical datasets Over Time (EMDOT) framework and Python package, which evaluates the performance of a model class over time.
We compare two training strategies: (1) using all historical data, and (2) using a window of the most recent data.
We note changes in model performance over time, and identify possible explanations for sudden shocks in performance.
arXiv Detail & Related papers (2022-11-14T07:53:36Z)
- Quantifying Quality of Class-Conditional Generative Models in Time-Series Domain [4.219228636765818]
We introduce the InceptionTime Score (ITS) and the Fréchet InceptionTime Distance (FITD) to gauge the qualitative performance of class-conditional generative models in the time-series domain.
We conduct extensive experiments on 80 different datasets to study the discriminative capabilities of proposed metrics.
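FITD follows the general recipe of Fréchet-style metrics such as FID; a sketch of that computation is below, assuming the caller supplies embeddings from a trained InceptionTime network.

```python
# Sketch of the Fréchet distance between two embedding sets, the general
# recipe behind FID-like metrics such as FITD. The embeddings are assumed
# to come from a trained InceptionTime network supplied by the caller.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """Both inputs: (n_samples, feat_dim) arrays of embeddings."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real   # discard numerical-noise imaginary parts
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(cov_r + cov_g - 2.0 * covmean))
```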
arXiv Detail & Related papers (2022-10-14T08:13:20Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, you are given access to a set of expert models and their predictions, alongside some limited information about the datasets used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Learning to be a Statistician: Learned Estimator for Number of Distinct Values [54.629042119819744]
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems.
In this work, we focus on how to derive accurate NDV estimates from random (online/offline) samples.
We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator.
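One way to cast this as supervised learning is sketched below, with an assumed frequency-profile featurization and an off-the-shelf regressor (not necessarily the paper's exact design):

```python
# Illustrative cast of NDV estimation as supervised regression: featurize a
# random sample by its frequency profile and regress the (log) true distinct
# count. Features and model are assumptions, not the paper's exact design.
from collections import Counter
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def profile_features(sample, max_freq=10):
    # f_i = number of values that appear exactly i times in the sample.
    freq_of_freq = Counter(Counter(sample).values())
    n = max(len(sample), 1)
    return np.array([freq_of_freq.get(i, 0) / n for i in range(1, max_freq + 1)])

def fit_ndv_estimator(samples, true_ndvs):
    # Training pairs: (sample drawn from a column, true NDV of that column).
    X = np.stack([profile_features(s) for s in samples])
    y = np.log1p(np.asarray(true_ndvs))
    return GradientBoostingRegressor().fit(X, y)

# Query time: np.expm1(model.predict([profile_features(sample)]))
```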
arXiv Detail & Related papers (2022-02-06T15:42:04Z)
- A Real Use Case of Semi-Supervised Learning for Mammogram Classification in a Local Clinic of Costa Rica [0.5541644538483946]
Training a deep learning model requires a considerable amount of labeled images.
A number of publicly available datasets have been built with data from different hospitals and clinics.
The use of MixMatch, a semi-supervised deep learning approach that leverages unlabeled data, is proposed and evaluated.
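For reference, the label-guessing step at the heart of MixMatch averages predictions over several augmentations and sharpens the result; a small sketch follows, with `augment` and a probability-returning `model.predict` assumed.

```python
# Sketch of MixMatch's label-guessing step: average class probabilities over
# K augmentations of an unlabeled batch, then sharpen with temperature T.
# augment() and a probability-returning model.predict() are assumed; the
# full method additionally mixes labeled and unlabeled batches with MixUp.
import numpy as np

def guess_labels(model, x_unlabeled, augment, K=2, T=0.5):
    probs = np.mean(
        [model.predict(augment(x_unlabeled)) for _ in range(K)], axis=0
    )
    sharpened = probs ** (1.0 / T)
    return sharpened / sharpened.sum(axis=1, keepdims=True)
```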
arXiv Detail & Related papers (2021-07-24T22:26:50Z)