Benchmarking AutoML Frameworks for Disease Prediction Using Medical
Claims
- URL: http://arxiv.org/abs/2107.10495v1
- Date: Thu, 22 Jul 2021 07:34:48 GMT
- Title: Benchmarking AutoML Frameworks for Disease Prediction Using Medical
Claims
- Authors: Roland Albert A. Romero, Mariefel Nicole Y. Deypalan, Suchit Mehrotra,
John Titus Jungao, Natalie E. Sheils, Elisabetta Manduchi and Jason H. Moore
- Abstract summary: We generated a large dataset using historical administrative claims including demographic information and flags for disease codes.
We trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performance on several metrics.
All models recorded a low area under the precision-recall curve and could not identify true positives without sacrificing the true negative rate.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We ascertain and compare the performances of AutoML tools on large, highly
imbalanced healthcare datasets.
We generated a large dataset using historical administrative claims including
demographic information and flags for disease codes in four different time
windows prior to 2019. We then trained three AutoML tools on this dataset to
predict six different disease outcomes in 2019 and evaluated model performance
on several metrics.
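As a concrete illustration of the dataset construction described above, the sketch below builds member-level binary flags for disease codes in discrete time windows using pandas. The member IDs, diagnosis codes, window labels, and column layout are hypothetical; the paper does not specify this exact representation.

```python
# Hypothetical sketch of the flag-style features described above: one
# binary indicator per (disease code, time window) pair for each member.
# Codes, IDs, and window labels are invented for illustration.
import pandas as pd

claims = pd.DataFrame({
    "member_id": [101, 101, 102],
    "dx_code":   ["E11", "I10", "E11"],          # diagnosis code on the claim
    "window":    ["2015-2016", "2017-2018", "2017-2018"],
})

# Pivot to one row per member with 0/1 flags such as E11_2015-2016.
flags = (
    claims.assign(flag=1)
          .pivot_table(index="member_id",
                       columns=["dx_code", "window"],
                       values="flag",
                       aggfunc="max",            # any claim in the window sets the flag
                       fill_value=0)
)
flags.columns = [f"{code}_{window}" for code, window in flags.columns]
```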
The AutoML tools improved on the baseline random forest model but did not
differ significantly from each other. All models recorded a low area under the
precision-recall curve and could not identify true positives without
sacrificing the true negative rate. Model performance was not directly related
to disease prevalence. We provide a specific use case to illustrate how to
select a threshold that gives the best balance between true and false positive
rates, as this is an important consideration in medical applications.
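A minimal sketch of the two evaluation steps mentioned above: computing the area under the precision-recall curve (AUPRC) and choosing a decision threshold that balances true and false positive rates. It uses scikit-learn and Youden's J statistic (TPR minus FPR) as the balance criterion; the paper's exact selection procedure may differ, and y_true/y_score are placeholder names.

```python
# Sketch of AUPRC evaluation and balanced threshold selection; a plausible
# reading of the abstract, not the authors' exact procedure.
import numpy as np
from sklearn.metrics import average_precision_score, roc_curve

def auprc_and_balanced_threshold(y_true, y_score):
    """Return (AUPRC, threshold maximizing TPR - FPR)."""
    auprc = average_precision_score(y_true, y_score)
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    j = tpr - fpr                     # Youden's J at each candidate threshold
    return auprc, thresholds[np.argmax(j)]

# Usage with a fitted probabilistic classifier (placeholder names):
# auprc, t = auprc_and_balanced_threshold(y_test, clf.predict_proba(X_test)[:, 1])
# y_pred = (clf.predict_proba(X_test)[:, 1] >= t).astype(int)
```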
Healthcare datasets present several challenges for AutoML tools, including
large sample size, high class imbalance, and limitations in the available
feature types. Improvements in scalability, combinations of imbalance-learning
resampling and ensemble approaches, and curated feature selection are possible
next steps toward better performance.
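As one illustration of the "imbalance-learning resampling plus ensemble" direction suggested above, the sketch below pairs random undersampling with a random forest via the imbalanced-learn Pipeline, which resamples only during fitting. This is a generic example of the idea, not a pipeline from the study.

```python
# One plausible combination of imbalance-learning resampling and an
# ensemble, as suggested above; not the study's actual pipeline.
from imblearn.pipeline import Pipeline           # imbalanced-learn package
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier

model = Pipeline([
    # Resampling happens only when fitting; test data pass through untouched.
    ("undersample", RandomUnderSampler(random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=500, random_state=0)),
])
# model.fit(X_train, y_train)
# scores = model.predict_proba(X_test)[:, 1]
```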
Among the three tools explored, no AutoML framework consistently outperformed
the rest in terms of predictive performance. The performance of the models in
this study suggests that there may be room for improvement in handling medical
claims data. Finally, selection of the optimal prediction threshold should be
guided by the specific practical application.
Related papers
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease
detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review
and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z) - A prediction and behavioural analysis of machine learning methods for
modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z) - Robust self-healing prediction model for high dimensional data [0.685316573653194]
This work proposes a robust self-healing (RSH) hybrid prediction model.
It uses the data in its entirety, removing errors and inconsistencies rather than discarding any data.
The proposed method is compared with some of the existing high performing models and the results are analyzed.
arXiv Detail & Related papers (2022-10-04T17:55:50Z) - Automatic Pharma News Categorization [0.0]
We use a text dataset consisting of 23 news categories relevant to pharma information science.
We compare the fine-tuning performance of multiple transformer models in a classification task.
We propose an ensemble model consisting of the top performing individual predictors.
arXiv Detail & Related papers (2021-12-28T08:42:16Z) - Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z) - Curse of Small Sample Size in Forecasting of the Active Cases in
COVID-19 Outbreak [0.0]
During the COVID-19 pandemic, many attempts have been made to predict the number of cases and other future trends of the pandemic.
However, these models fail to reliably predict the medium- and long-term evolution of fundamental features of the COVID-19 outbreak with acceptable accuracy.
This paper gives an explanation for the failure of machine learning models in this particular forecasting problem.
arXiv Detail & Related papers (2020-11-06T23:13:34Z) - Self-Training with Improved Regularization for Sample-Efficient Chest
X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that, using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)