Supervised multi-specialist topic model with applications on large-scale
electronic health record data
- URL: http://arxiv.org/abs/2105.01238v1
- Date: Tue, 4 May 2021 01:27:11 GMT
- Title: Supervised multi-specialist topic model with applications on large-scale
electronic health record data
- Authors: Ziyang Song, Xavier Sumba Toral, Yixin Xu, Aihua Liu, Liming Guo,
Guido Powell, Aman Verma, David Buckeridge, Ariane Marelli, Yue Li
- Abstract summary: We present MixEHR-S to jointly infer specialist-disease topics from the EHR data.
For efficient inference, we developed a closed-form collapsed variational inference algorithm.
In three applications, MixEHR-S identified clinically meaningful topics among its most predictive latent topics.
- Score: 3.322262654060203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivation: Electronic health record (EHR) data provides a new avenue to
elucidate disease comorbidities and latent phenotypes for precision medicine.
To fully exploit its potential, we need to model a realistic generative process
for EHR data. We present MixEHR-S to jointly infer
specialist-disease topics from the EHR data. As the key contribution, we model
the specialist assignments and ICD-coded diagnoses as the latent topics based
on each patient's underlying disease topic mixture in a novel unified supervised
hierarchical Bayesian topic model. For efficient inference, we developed a
closed-form collapsed variational inference algorithm to learn the model
distributions of MixEHR-S. We applied MixEHR-S to two independent large-scale
EHR databases in Quebec with three targeted applications: (1) Congenital Heart
Disease (CHD) diagnostic prediction among 154,775 patients; (2) Chronic
obstructive pulmonary disease (COPD) diagnostic prediction among 73,791
patients; (3) future insulin treatment prediction among 78,712 patients
diagnosed with diabetes as a means to assess disease exacerbation. In all
three applications, MixEHR-S identified clinically meaningful topics among its
most predictive latent topics and achieved higher target prediction accuracy
than existing methods, providing opportunities for
prioritizing high-risk patients for healthcare services. MixEHR-S source code
and scripts of the experiments are freely available at
https://github.com/li-lab-mcgill/mixehrS
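The "closed-form collapsed variational inference" mentioned in the abstract can be illustrated with CVB0 on a plain latent Dirichlet allocation (LDA) model. This is a minimal sketch under simplifying assumptions, not the MixEHR-S algorithm: it treats each patient's ICD codes as one bag of tokens and omits the specialist assignments and the supervised target that MixEHR-S models jointly. The function name `cvb0_lda`, the hyperparameters `alpha` and `beta`, and the integer code ids are hypothetical.

```python
import numpy as np


def cvb0_lda(docs, vocab_size, n_topics, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Hypothetical illustration: CVB0 for plain LDA (NOT the MixEHR-S model).

    docs: list of lists of integer token ids (e.g. hypothetical ICD-code ids).
    Returns (theta, phi): per-document topic mixtures (D x K) and
    per-topic token distributions (K x V).
    """
    rng = np.random.default_rng(seed)
    # gamma[d][i, k]: variational probability that token i of document d is in topic k
    gamma = [rng.dirichlet(np.ones(n_topics), size=len(doc)) for doc in docs]

    # Expected counts implied by the current responsibilities
    n_dk = np.array([g.sum(axis=0) for g in gamma])  # document-topic, (D, K)
    n_wk = np.zeros((vocab_size, n_topics))          # token-topic, (V, K)
    for doc, g in zip(docs, gamma):
        np.add.at(n_wk, np.asarray(doc, dtype=np.intp), g)
    n_k = n_wk.sum(axis=0)                           # topic totals, (K,)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                g_old = gamma[d][i].copy()
                # Exclude this token's own contribution (the "-di" counts)
                n_dk[d] -= g_old
                n_wk[w] -= g_old
                n_k -= g_old
                # Closed-form CVB0 update, then normalize over topics
                g_new = (n_dk[d] + alpha) * (n_wk[w] + beta) / (n_k + vocab_size * beta)
                g_new /= g_new.sum()
                gamma[d][i] = g_new
                # Add the updated contribution back
                n_dk[d] += g_new
                n_wk[w] += g_new
                n_k += g_new

    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    phi = (n_wk + beta) / (n_wk + beta).sum(axis=0, keepdims=True)
    return theta, phi.T
```

For example, `cvb0_lda([[0, 1, 2, 0], [3, 4, 5, 3]], vocab_size=6, n_topics=2)` returns a 2-by-2 topic-mixture matrix and a 2-by-6 topic-code matrix. A two-stage pipeline could pass the rows of `theta` to a separate classifier, whereas the abstract describes MixEHR-S as a single unified supervised model.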
Related papers
- Recent Advances in Predictive Modeling with Electronic Health Records [71.19967863320647]
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
arXiv (2024-02-02T00:31:01Z)
- MixEHR-SurG: a joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records [18.87817671852005]
We present a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard.
This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality.
arXiv (2023-12-20T22:13:45Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv (2023-10-04T01:36:30Z)
- Predicting multiple sclerosis disease severity with multimodal deep neural networks [10.599189568556508]
We describe a pilot effort to leverage structured EHR data, neuroimaging data and clinical notes to build a multimodal deep learning framework to predict patients' MS disease severity.
The proposed pipeline shows up to a 25% increase in the Area Under the Receiver Operating Characteristic curve (AUROC) compared to models using single-modal data.
arXiv (2023-04-08T16:23:18Z)
- Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv (2023-03-24T03:14:00Z)
- Integrated Convolutional and Recurrent Neural Networks for Health Risk Prediction using Patient Journey Data with Many Missing Values [9.418011774179794]
This paper proposes a novel end-to-end approach to modeling EHR patient journey data with Integrated Convolutional and Recurrent Neural Networks.
Our model can capture both long- and short-term temporal patterns within each patient journey and effectively handle the high degree of missingness in EHR data without generating imputed data.
arXiv (2022-11-11T07:36:18Z)
- Multimodal spatiotemporal graph neural networks for improved prediction of 30-day all-cause hospital readmission [4.609543591101764]
We propose a multimodal, modality-agnostic spatiotemporal graph neural network (MM-STGNN) for prediction of 30-day all-cause hospital readmission.
MM-STGNN achieves an AUROC of 0.79 on both primary and external datasets.
For subset populations of patients with heart and vascular disease, our model also outperforms baselines on predicting 30-day readmission.
arXiv (2022-04-14T05:50:07Z)
- SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services by predicting the number of patient visits to healthcare facilities.
We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv (2021-08-31T08:23:56Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves an F1 score of up to 0.841 for AD detection and a PR-AUC of up to 0.609 for NASH detection, and outperforms the best of several state-of-the-art baselines by up to 19%.
arXiv (2020-10-22T02:28:11Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv (2020-07-15T09:22:55Z)