Knowledge Graph Embedding with Electronic Health Records Data via Latent
Graphical Block Model
- URL: http://arxiv.org/abs/2305.19997v1
- Date: Wed, 31 May 2023 16:18:46 GMT
- Title: Knowledge Graph Embedding with Electronic Health Records Data via Latent
Graphical Block Model
- Authors: Junwei Lu, Jin Yin, Tianxi Cai
- Abstract summary: We propose to infer the conditional dependency structure among EHR features via a latent graphical block model (LGBM).
We establish the statistical rates of the proposed estimators and show the perfect recovery of the block structure.
- Score: 13.398292423857756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the increasing adoption of electronic health records (EHR),
large-scale EHRs have become another rich data source for translational clinical
research. Despite its potential, deriving generalizable knowledge from EHR data
remains challenging. First, EHR data are generated as part of clinical care
with data elements too detailed and fragmented for research. Despite recent
progress in mapping EHR data to common ontology with hierarchical structures,
much development is still needed to enable automatic grouping of local EHR
codes to meaningful clinical concepts at a large scale. Second, the total
number of unique EHR features is large, posing methodological challenges for
deriving a reproducible knowledge graph, especially when interest lies in the
conditional dependency structure. Third, the detailed EHR data on a very large
patient cohort pose an additional computational challenge for deriving a
knowledge network. To overcome these challenges, we propose to infer the
conditional dependency structure among EHR features via a latent graphical
block model (LGBM). The LGBM has a two-layer structure, with the first layer providing
a semantic embedding vector (SEV) representation for the EHR features and the
second overlaying a graphical block model on the latent SEVs. The block
structures in the graphical model also allow us to cluster synonymous features
in EHR. We propose to learn the LGBM efficiently, in both the statistical and
computational senses, based on the empirical pointwise mutual information matrix. We
establish the statistical rates of the proposed estimators and show the perfect
recovery of the block structure. Numerical results from simulation studies and
real EHR data analyses suggest that the proposed LGBM estimator performs well
in finite samples.
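The abstract describes the pipeline only at a high level: an empirical pointwise mutual information (PMI) matrix, low-dimensional embedding vectors for the features, and a block structure that groups synonymous features. The sketch below illustrates that general idea on a toy co-occurrence matrix. It is not the authors' LGBM estimator: the function names (`empirical_pmi`, `low_rank_embeddings`, `group_by_cosine`), the eigen-decomposition step, the cosine-similarity threshold, and the toy counts are all assumptions made for illustration, and the simple similarity grouping stands in for the paper's block recovery procedure.

```python
import numpy as np


def empirical_pmi(cooc):
    """PMI_ij = log(p_ij / (p_i * p_j)) from a symmetric co-occurrence count matrix."""
    cooc = np.asarray(cooc, dtype=float)
    p_joint = cooc / cooc.sum()
    p_marg = p_joint.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint / np.outer(p_marg, p_marg))
    # Pairs that never co-occur give -inf or nan; set them to 0 (an assumption).
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi


def low_rank_embeddings(pmi, rank):
    """Embed each feature using the top-`rank` eigenpairs of the symmetric PMI matrix."""
    vals, vecs = np.linalg.eigh(pmi)
    top = np.argsort(vals)[::-1][:rank]
    # Negative eigenvalues are clipped to 0 before taking the square root.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def group_by_cosine(emb, threshold=0.9):
    """Group features whose embeddings are linked by cosine similarity >= threshold."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    labels = -np.ones(len(emb), dtype=int)
    next_label = 0
    for i in range(len(emb)):
        if labels[i] >= 0:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:  # flood fill over the thresholded similarity graph
            u = stack.pop()
            for v in np.flatnonzero((sim[u] >= threshold) & (labels < 0)):
                labels[v] = next_label
                stack.append(int(v))
        next_label += 1
    return labels


# Toy counts (hypothetical): features 0/1 and features 2/3 behave like synonyms.
cooc = np.array([
    [10, 10, 1, 1],
    [10, 10, 1, 1],
    [1, 1, 10, 10],
    [1, 1, 10, 10],
])
pmi = empirical_pmi(cooc)
emb = low_rank_embeddings(pmi, rank=1)
labels = group_by_cosine(emb)
print(labels.tolist())  # [0, 0, 1, 1]
```

On this toy input the two synonym pairs end up in separate groups. Real EHR co-occurrence counts are far larger and noisier, so the rank and threshold would need to be chosen with care.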
Related papers
- FedCVD: The First Real-World Federated Learning Benchmark on Cardiovascular Disease Data [52.55123685248105]
Cardiovascular diseases (CVDs) are currently the leading cause of death worldwide, highlighting the critical need for early diagnosis and treatment.
Machine learning (ML) methods can help diagnose CVDs early, but their performance relies on access to substantial data with high quality.
This paper presents the first real-world FL benchmark for cardiovascular disease detection, named FedCVD.
arXiv Detail & Related papers (2024-10-28T02:24:01Z)
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- Recent Advances in Predictive Modeling with Electronic Health Records [71.19967863320647]
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
arXiv Detail & Related papers (2024-02-02T00:31:01Z)
- Inference of Dependency Knowledge Graph for Electronic Health Records [13.35941801610195]
We propose a framework for deriving a sparse knowledge graph based on the dynamic log-linear topic model.
Within this model, the KG embeddings are estimated by performing singular value decomposition on the empirical pointwise mutual information matrix.
We then establish entrywise normality for the KG low-rank estimator, enabling the recovery of sparse graph edges with controlled type I error.
arXiv Detail & Related papers (2023-12-25T04:45:36Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Toward Cohort Intelligence: A Universal Cohort Representation Learning Framework for Electronic Health Record Analysis [15.137213823470544]
We propose a universal COhort Representation lEarning (CORE) framework to augment EHR utilization by leveraging the fine-grained cohort information among patients.
CORE is readily applicable to diverse backbone models, serving as a universal plug-in framework to infuse cohort information into healthcare methods for boosted performance.
arXiv Detail & Related papers (2023-04-10T09:12:37Z)
- Modeling electronic health record data using a knowledge-graph-embedded topic model [6.170782354287972]
We present KG-ETM, an end-to-end knowledge graph-based multimodal embedded topic model.
KG-ETM distills latent disease topics from EHR data by learning the embedding from the medical knowledge graphs.
Our model is also able to discover interpretable and accurate patient representations for patient stratification and drug recommendations.
arXiv Detail & Related papers (2022-06-03T07:58:17Z)
- Generating Synthetic Mixed-type Longitudinal Electronic Health Records for Artificial Intelligent Applications [9.374416143268892]
A generative adversarial network (GAN) entitled EHR-M-GAN synthesizes mixed-type time-series EHR data.
We have validated EHR-M-GAN on three publicly-available intensive care unit databases with records from a total of 141,488 unique patients.
arXiv Detail & Related papers (2021-12-22T17:17:34Z)
- Self-Supervised Graph Learning with Hyperbolic Embedding for Temporal Health Event Prediction [13.24834156675212]
We propose a hyperbolic embedding method with information flow to pre-train medical code representations in a hierarchical structure.
We incorporate these pre-trained representations into a graph neural network to detect disease complications.
We present a new hierarchy-enhanced historical prediction proxy task in our self-supervised learning framework to fully utilize EHR data.
arXiv Detail & Related papers (2021-06-09T00:42:44Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Uncovering the structure of clinical EEG signals with self-supervised learning [64.4754948595556]
Supervised learning paradigms are often limited by the amount of labeled data that is available.
This phenomenon is particularly problematic in clinically relevant data, such as electroencephalography (EEG).
By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
arXiv Detail & Related papers (2020-07-31T14:34:47Z)