Related papers: Using massive health insurance claims data to predict very high-cost claimants: a machine learning approach

URL: http://arxiv.org/abs/1912.13032v1
Date: Mon, 30 Dec 2019 18:01:30 GMT
Title: Using massive health insurance claims data to predict very high-cost claimants: a machine learning approach
Authors: José M. Maisog and Wenhong Li and Yanchun Xu and Brian Hurley and Hetal Shah and Ryan Lemberg and Tina Borden and Stephen Bandeian and Melissa Schline and Roxanna Cross and Alan Spiro and Russ Michael and Alexander Gutfraind
Abstract summary: High-cost claimants (HiCCs) represent just 0.16% of the insured population but account for 9% of all healthcare costs. We applied machine learning to train binary classification models to calculate the personal risk of HiCC. Our results demonstrate that high-performing predictive models can be constructed using claims data and publicly available data alone.
Score: 43.861384583351835
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Due to escalating healthcare costs, accurately predicting which patients will incur high costs is an important task for payers and providers of healthcare. High-cost claimants (HiCCs) are patients who have annual costs above $250,000 and who represent just 0.16% of the insured population but currently account for 9% of all healthcare costs. In this study, we aimed to develop a high-performance algorithm to predict HiCCs to inform a novel care management system. Using health insurance claims from 48 million people and augmented with census data, we applied machine learning to train binary classification models to calculate the personal risk of HiCC. To train the models, we developed a platform starting with 6,006 variables across all clinical and demographic dimensions and constructed over one hundred candidate models. The best model achieved an area under the receiver operating characteristic curve of 91.2%. The model exceeds the highest published performance (84%) and remains high for patients with no prior history of high-cost status (89%), who have less than a full year of enrollment (87%), or lack pharmacy claims data (88%). It attains an area under the precision-recall curve of 23.1%, and precision of 74% at a threshold of 0.99. A care management program enrolling 500 people with the highest HiCC risk is expected to treat 199 true HiCCs and generate a net savings of $7.3 million per year. Our results demonstrate that high-performing predictive models can be constructed using claims data and publicly available data alone, even for rare high-cost claimants exceeding $250,000. Our model demonstrates the transformational power of machine learning and artificial intelligence in care management, which would allow healthcare payers and providers to introduce the next generation of care management programs.
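The abstract's headline figures can be put in context with a little arithmetic. The sketch below is not from the paper; it only recombines numbers quoted in the abstract. It assumes that the evaluated population has the same 0.16% HiCC prevalence as the insured population, and it uses the standard rule of thumb that a random classifier's area under the precision-recall curve is roughly equal to the prevalence. The function and variable names are illustrative.

```python
# Figures quoted in the abstract.
prevalence = 0.0016   # HiCCs are 0.16% of the insured population
cost_share = 0.09     # HiCCs account for 9% of all healthcare costs
enrolled = 500        # size of the care management program
true_hiccs = 199      # expected true HiCCs among those enrolled
auprc = 0.231         # reported area under the precision-recall curve


def precision_at_k(true_positives: int, k: int) -> float:
    """Fraction of the k highest-risk people who are true positives."""
    return true_positives / k


def lift(rate: float, base_rate: float) -> float:
    """How many times larger a rate is than a baseline rate."""
    return rate / base_rate


# Precision among the 500 highest-risk people.
precision_top_500 = precision_at_k(true_hiccs, enrolled)

# Assumption: selecting people at random would find HiCCs at the
# population prevalence, so lift compares against 0.16%.
lift_top_500 = lift(precision_top_500, prevalence)

# Assumption: a random classifier's AUPRC is roughly the prevalence.
auprc_lift = lift(auprc, prevalence)

# Share of costs relative to share of people.
cost_concentration = lift(cost_share, prevalence)

print(f"Precision in top {enrolled}: {precision_top_500:.1%}")
print(f"Lift over random selection: {lift_top_500:.0f}x")
print(f"AUPRC relative to random baseline: {auprc_lift:.0f}x")
print(f"Cost share relative to population share: {cost_concentration:.1f}x")
```

Under these assumptions, about 40% of the 500 enrolled people would be true HiCCs, roughly 250 times the rate expected from random selection. This is why an area under the precision-recall curve of 23.1% is strong for such a rare outcome, even though it looks low next to the 91.2% area under the receiver operating characteristic curve.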

Related papers

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning [58.78045864541539]
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM). DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1,013 diseases.
arXiv Detail & Related papers (2025-06-25T13:42:26Z)
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
Building predictive models of healthcare costs with open healthcare data [0.0]
We present an approach to developing a predictive model using machine-learning techniques. We analyzed de-identified patient data from New York State, consisting of 2.3 million records in 2016. We built models to predict costs from patient diagnoses and demographics.
arXiv Detail & Related papers (2023-04-05T02:12:58Z)
Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging [47.99192239793597]
We evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training. Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
arXiv Detail & Related papers (2023-02-03T09:49:13Z)
Predicting Visit Cost of Obstructive Sleep Apnea using Electronic Healthcare Records with Transformer [0.0]
Obstructive sleep apnea (OSA) is growing increasingly prevalent in many countries as obesity rises. For treatment purposes, predicting OSA patients' visit expenses for the coming year is crucial. Just a third of those data from OSA patients can be used to train analytics models.
arXiv Detail & Related papers (2023-01-28T20:08:00Z)
SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities. We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data. Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv Detail & Related papers (2021-08-31T08:23:56Z)
Ensemble model for pre-discharge icd10 coding prediction [45.82374977939355]
We propose an ensemble model incorporating multiple clinical data sources for accurate code predictions. We obtain multi-label classification accuracies of 0.73 and 0.58 for average precision, 0.56 and 0.35 for F1-scores and 0.71 and 0.4 accuracy in predicting principal diagnosis for inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:02:56Z)
High-Throughput Approach to Modeling Healthcare Costs Using Electronic Healthcare Records [5.354801701968199]
This study presents the results of a generalizable machine learning approach to predicting medical events built from 40 years of data from >860,000 patients pertaining to >6,700 prescription medications. It was found that models built using this approach performed well when compared to similar studies predicting physician prescriptions of individual medications.
arXiv Detail & Related papers (2020-11-18T19:06:18Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
Accurate and Interpretable Machine Learning for Transparent Pricing of Health Insurance Plans [3.772148470078554]
Health insurance companies cover half of the United States population and pay 1.2 trillion US dollars every year to cover medical expenses for their members. The actuary and underwriter roles at a health insurance company serve to assess which risks to take on and how to price those risks to ensure profitability of the organization. We developed a sequence of two models, an individual patient-level and an employer-group-level model, to predict the annual per member per month allowed amount for employer groups. Our models performed 20% better than the insurance carrier's existing pricing model, and identified 84% of the concession opportunities.
arXiv Detail & Related papers (2020-09-23T08:07:33Z)
CPAS: the UK's National Machine Learning-based Hospital Capacity Planning System for COVID-19 [111.69190108272133]
The coronavirus disease 2019 poses the threat of overwhelming healthcare systems with unprecedented demands for intensive care resources. We developed the COVID-19 Capacity Planning and Analysis System (CPAS) - a machine learning-based system for hospital resource planning. CPAS is one of the first machine learning-based systems to be deployed in hospitals on a national scale to address the COVID-19 pandemic.
arXiv Detail & Related papers (2020-07-27T19:39:13Z)
A unified machine learning approach to time series forecasting applied to demand at emergency departments [1.7119367122421556]
There were 25.6 million attendances at Emergency Departments (EDs) in England in 2019 corresponding to an increase of 12 million attendances over the past ten years. We develop a novel ensemble methodology that combines the outcomes of the best performing time series and machine learning approaches. Our approach is able to predict attendances one day in advance up to a mean absolute error of +/- 14 and +/- 10 patients corresponding to a mean absolute percentage error of 6.8% and 8.6% respectively.
arXiv Detail & Related papers (2020-07-13T07:59:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.