Using massive health insurance claims data to predict very high-cost claimants: a machine learning approach
- URL: http://arxiv.org/abs/1912.13032v1
- Date: Mon, 30 Dec 2019 18:01:30 GMT
- Title: Using massive health insurance claims data to predict very high-cost claimants: a machine learning approach
- Authors: José M. Maisog and Wenhong Li and Yanchun Xu and Brian Hurley and Hetal Shah and Ryan Lemberg and Tina Borden and Stephen Bandeian and Melissa Schline and Roxanna Cross and Alan Spiro and Russ Michael and Alexander Gutfraind
- Abstract summary: High-cost claimants (HiCCs) represent just 0.16% of the insured population but account for 9% of all healthcare costs.
We applied machine learning to train binary classification models to calculate the personal risk of HiCC.
Our results demonstrate that high-performing predictive models can be constructed using claims data and publicly available data alone.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Due to escalating healthcare costs, accurately predicting which patients will
incur high costs is an important task for payers and providers of healthcare.
High-cost claimants (HiCCs) are patients who have annual costs above
$250,000 and who represent just 0.16% of the insured population but
currently account for 9% of all healthcare costs. In this study, we aimed to
develop a high-performance algorithm to predict HiCCs to inform a novel care
management system. Using health insurance claims from 48 million people and
augmented with census data, we applied machine learning to train binary
classification models to calculate the personal risk of HiCC. To train the
models, we developed a platform starting with 6,006 variables across all
clinical and demographic dimensions and constructed over one hundred candidate
models. The best model achieved an area under the receiver operating
characteristic curve of 91.2%. The model exceeds the highest published
performance (84%) and remains high for patients with no prior history of
high-cost status (89%), who have less than a full year of enrollment (87%), or
lack pharmacy claims data (88%). It attains an area under the precision-recall
curve of 23.1%, and precision of 74% at a threshold of 0.99. A care management
program enrolling 500 people with the highest HiCC risk is expected to treat
199 true HiCCs and generate a net savings of $7.3 million per year. Our
results demonstrate that high-performing predictive models can be constructed
using claims data and publicly available data alone, even for rare high-cost
claimants exceeding $250,000. Our model demonstrates the transformational
power of machine learning and artificial intelligence in care management, which
would allow healthcare payers and providers to introduce the next generation of
care management programs.
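With positives this rare, the reported metrics are easier to interpret side by side. For a random ranking, the expected area under the precision-recall curve roughly equals the prevalence (0.16%, or 0.0016), so 23.1% is about 144 times that baseline, while 199 true HiCCs among 500 enrollees is a precision at k = 500 of 199/500 = 39.8%. The sketch below, using only the Python standard library, computes AUROC via the Mann-Whitney rank-sum statistic, average precision (one common non-interpolated estimate of the precision-recall area; the paper's exact estimator is not stated here), precision at a probability threshold, and precision among the k highest-risk members. The function names and toy data are hypothetical illustrations, not the authors' code or results.

```python
from typing import List, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney rank-sum statistic; tied scores share an average rank."""
    n = len(scores)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    # Subtract the minimum possible rank sum, then normalize by the number of pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def _descending(scores: Sequence[float]) -> List[int]:
    # Indices sorted by score, highest first; ties keep their original order.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision at each rank where a positive appears (ties not pooled)."""
    hits, total = 0, 0.0
    for rank, idx in enumerate(_descending(scores), start=1):
        if y_true[idx] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def precision_at_threshold(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Share of true positives among members whose score is >= threshold."""
    flagged = [y for y, s in zip(y_true, scores) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else float("nan")


def precision_at_k(y_true: Sequence[int], scores: Sequence[float], k: int) -> float:
    """Share of true positives among the k highest-scoring members."""
    top = _descending(scores)[:k]
    return sum(y_true[i] for i in top) / k


# Hypothetical toy cohort: 2 positives among 10 members.
y = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
s = [0.10, 0.20, 0.95, 0.30, 0.05, 0.40, 0.60, 0.70, 0.15, 0.25]
print(auroc(y, s))  # 15 of 16 positive-negative pairs ordered correctly -> 0.9375
print(average_precision(y, s))
print(precision_at_threshold(y, s, 0.5))
print(precision_at_k(y, s, 3))
```

Precision at a threshold and precision at k answer the same operational question from two directions: the first fixes a risk cut-off and lets the number flagged vary, while the second fixes a program capacity (500 enrollees in the abstract) and reports how many of those slots go to true HiCCs.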
Related papers
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
Date: 2023-10-04T01:36:30Z
- Building predictive models of healthcare costs with open healthcare data
We present an approach to developing a predictive model using machine-learning techniques.
We analyzed de-identified patient data from New York State, consisting of 2.3 million records in 2016.
We built models to predict costs from patient diagnoses and demographics.
Date: 2023-04-05T02:12:58Z
- Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging
We evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training.
Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
Date: 2023-02-03T09:49:13Z
- Predicting Visit Cost of Obstructive Sleep Apnea using Electronic Healthcare Records with Transformer
Obstructive sleep apnea (OSA) is becoming increasingly prevalent in many countries as obesity rises.
For treatment purposes, predicting OSA patients' visit expenses for the coming year is crucial.
Only a third of the available OSA patient data can be used to train analytics models.
Date: 2023-01-28T20:08:00Z
- SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models
This work aims to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities.
We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
Date: 2021-08-31T08:23:56Z
- Ensemble model for pre-discharge icd10 coding prediction
We propose an ensemble model incorporating multiple clinical data sources for accurate code predictions.
For multi-label classification, we obtain average precision of 0.73 and 0.58 and F1-scores of 0.56 and 0.35, and for predicting the principal diagnosis an accuracy of 0.71 and 0.4, on the inpatient and outpatient datasets, respectively.
Date: 2020-12-16T07:02:56Z
- High-Throughput Approach to Modeling Healthcare Costs Using Electronic Healthcare Records
This study presents the results of a generalizable machine learning approach to predicting medical events built from 40 years of data from >860,000 patients pertaining to >6,700 prescription medications.
It was found that models built using this approach performed well when compared to similar studies predicting physician prescriptions of individual medications.
Date: 2020-11-18T19:06:18Z
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation by leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves an F1 score of up to 0.841 for AD detection and a PR-AUC of up to 0.609 for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
Date: 2020-10-22T02:28:11Z
- Accurate and Interpretable Machine Learning for Transparent Pricing of Health Insurance Plans
Health insurance companies cover half of the United States population and pay 1.2 trillion US dollars every year to cover medical expenses for their members.
The actuary and underwriter roles at a health insurance company serve to assess which risks to take on and how to price those risks to ensure profitability of the organization.
We developed a sequence of two models, an individual patient-level and an employer-group-level model, to predict the annual per member per month allowed amount for employer groups.
Our models performed 20% better than the insurance carrier's existing pricing model and identified 84% of the concession opportunities.
Date: 2020-09-23T08:07:33Z
- A unified machine learning approach to time series forecasting applied to demand at emergency departments
There were 25.6 million attendances at Emergency Departments (EDs) in England in 2019, an increase of 12 million attendances over the past ten years.
We develop a novel ensemble methodology that combines the outcomes of the best performing time series and machine learning approaches.
Our approach predicts attendances one day in advance with mean absolute errors of 14 and 10 patients, corresponding to mean absolute percentage errors of 6.8% and 8.6%, respectively.
Date: 2020-07-13T07:59:24Z
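The emergency-department entry above reports accuracy as both mean absolute error (in patients) and mean absolute percentage error, and describes combining time series and machine learning forecasts. A minimal sketch of both metrics follows, together with an equal-weight average of several models' daily forecasts as one simple combination rule. That rule, the function names, and all numbers are hypothetical and are not taken from the paper, whose ensemble method may differ.

```python
from typing import List, Sequence


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error, in the data's own units (here, patients per day)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def equal_weight_ensemble(forecasts: Sequence[Sequence[float]]) -> List[float]:
    """Average the models' forecasts day by day (a hypothetical combination rule)."""
    return [sum(day) / len(day) for day in zip(*forecasts)]


# Hypothetical daily attendances and one-day-ahead forecasts from two models.
actual = [200, 220, 210, 190]
model_a = [190, 230, 200, 200]  # stand-in for a time series model
model_b = [210, 210, 215, 185]  # stand-in for a machine learning model
combined = equal_weight_ensemble([model_a, model_b])

print(mae(actual, model_a), mae(actual, model_b), mae(actual, combined))  # 10.0 7.5 1.25
print(round(mape(actual, combined), 2))
```

Averaging helps in this toy case because the two hypothetical models err in opposite directions on every day; when their errors are correlated, the gain from an equal-weight ensemble shrinks.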