Unsupervised Probabilistic Models for Sequential Electronic Health Records
- URL: http://arxiv.org/abs/2204.07292v1
- Date: Fri, 15 Apr 2022 02:11:44 GMT
- Authors: Alan D. Kaplan, John D. Greene, Vincent X. Liu, Priyadip Ray
- Abstract summary: The model consists of a layered set of latent variables that encode underlying structure in the data.
We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system.
The resulting properties of the trained model generate novel insight from these complex and multifaceted data.
- Score: 3.8015092217142223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop an unsupervised probabilistic model for heterogeneous Electronic
Health Record (EHR) data. Utilizing a mixture model formulation, our approach
directly models sequences of arbitrary length, such as medications and
laboratory results. This allows for subgrouping and incorporation of the
dynamics underlying heterogeneous data types. The model consists of a layered
set of latent variables that encode underlying structure in the data. These
variables represent subject subgroups at the top layer, and unobserved states
for sequences in the second layer. We train this model on episodic data from
subjects receiving medical care in the Kaiser Permanente Northern California
integrated healthcare delivery system. The resulting properties of the trained
model generate novel insight from these complex and multifaceted data. In
addition, we show how the model can be used to analyze sequences that
contribute to assessment of mortality likelihood.
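
The layered structure described in the abstract (subject subgroups at the top layer, unobserved per-sequence states in the second layer) can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that each data type is a discrete-symbol sequence modeled by a hidden Markov model whose parameters depend on the subgroup, and that sequences are conditionally independent given the subgroup. The names `make_hmm`, `hmm_loglik`, `subgroup_posterior`, and the `"meds"` data type are hypothetical.

```python
import numpy as np

def make_hmm(pi, A, B):
    """Pack HMM parameters (initial, transition, emission) as log-probabilities."""
    return tuple(np.log(np.asarray(m, dtype=float)) for m in (pi, A, B))

def hmm_loglik(seq, log_pi, log_A, log_B):
    """Log-likelihood of a discrete sequence of any length (forward algorithm)."""
    if len(seq) == 0:
        return 0.0
    # alpha[j] = log p(x_1..x_t, z_t = j)
    alpha = log_pi + log_B[:, seq[0]]
    for x in seq[1:]:
        # Sum over the previous hidden state i, then emit symbol x from state j.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, x]
    return float(np.logaddexp.reduce(alpha))

def subgroup_posterior(sequences, log_w, params):
    """Posterior over top-layer subgroups for one subject.

    sequences: dict mapping data-type name -> list of integer symbols
    log_w:     log mixture weights, one per subgroup
    params:    params[k][name] = (log_pi, log_A, log_B) for subgroup k
    Assumes sequences are conditionally independent given the subgroup.
    """
    log_joint = np.array([
        log_w[k] + sum(hmm_loglik(s, *params[k][name])
                       for name, s in sequences.items())
        for k in range(len(log_w))
    ])
    log_evidence = np.logaddexp.reduce(log_joint)
    return np.exp(log_joint - log_evidence), float(log_evidence)

# Toy example (made-up numbers): two subgroups, one data type, two symbols.
params = [
    {"meds": make_hmm([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]])},
    {"meds": make_hmm([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]])},
]
log_w = np.log([0.5, 0.5])
post, log_ev = subgroup_posterior({"meds": [0, 0, 0, 0]}, log_w, params)
print(post)  # most of the mass falls on the first subgroup
```

Under these assumptions, the posterior over subgroups is what supports subgrouping, and the per-sequence terms in `log_joint` show how much each data type contributes to that assignment.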
Related papers
- Sequential Inference of Hospitalization Electronic Health Records Using Probabilistic Models [3.2988476179015005]
In this work we design a probabilistic unsupervised model for multiple arbitrary-length sequences contained in hospitalization Electronic Health Record (EHR) data.
The model uses a latent variable structure and captures complex relationships between medications, diagnoses, laboratory tests, and neurological assessments.
Inference algorithms are derived that use partial data to infer properties of the complete sequences, including their length and presence of specific values.
arXiv Detail & Related papers (2024-03-27T21:06:26Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data [81.43750358586072]
We propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes.
We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets.
arXiv Detail & Related papers (2022-10-24T08:57:55Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- Unifying Epidemic Models with Mixtures [28.771032745045428]
The COVID-19 pandemic has emphasized the need for a robust understanding of epidemic models.
Here, we introduce a simple mixture-based model which bridges the mechanistic and non-mechanistic approaches.
Although the model is non-mechanistic, we show that it arises as the natural outcome of a process based on a networked SIR framework.
arXiv Detail & Related papers (2022-01-07T19:42:05Z)
- Discrepancies in Epidemiological Modeling of Aggregated Heterogeneous Data [1.433758865948252]
We show that state-of-the-art models for estimating epidemiological parameters, e.g., transmission rates, can be inappropriate when faced with complex systems.
We generate three complex outbreak scenarios by combining incidence curves from multiple epidemics.
We evaluate two data-generating models within this Bayesian inference framework.
arXiv Detail & Related papers (2021-06-20T03:41:19Z)
- Harmonization with Flow-based Causal Inference [12.739380441313022]
This paper presents a normalizing-flow-based method to perform counterfactual inference upon a structural causal model (SCM) to harmonize medical data.
We evaluate on multiple, large, real-world medical datasets to observe that this method leads to better cross-domain generalization compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2021-06-12T19:57:35Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)