Inference of Dependency Knowledge Graph for Electronic Health Records
- URL: http://arxiv.org/abs/2312.15611v1
- Date: Mon, 25 Dec 2023 04:45:36 GMT
- Title: Inference of Dependency Knowledge Graph for Electronic Health Records
- Authors: Zhiwei Xu, Ziming Gan, Doudou Zhou, Shuting Shen, Junwei Lu, Tianxi Cai
- Abstract summary: We propose a framework for deriving a sparse knowledge graph based on the dynamic log-linear topic model.
Within this model, the KG embeddings are estimated by performing singular value decomposition on the empirical pointwise mutual information matrix.
We then establish entrywise normality for the KG low-rank estimator, enabling the recovery of sparse graph edges with controlled type I error.
- Score: 13.35941801610195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The effective analysis of high-dimensional Electronic Health Record (EHR)
data, with substantial potential for healthcare research, presents notable
methodological challenges. Employing predictive modeling guided by a knowledge
graph (KG), which enables efficient feature selection, can enhance both
statistical efficiency and interpretability. While various methods have emerged
for constructing KGs, existing techniques often lack statistical certainty
concerning the presence of links between entities, especially in scenarios
where the utilization of patient-level EHR data is limited due to privacy
concerns. In this paper, we propose the first inferential framework for
deriving a sparse KG with statistical guarantee based on the dynamic log-linear
topic model proposed by Arora et al. (2016). Within this model, the KG
embeddings are estimated by performing singular value decomposition on the
empirical pointwise mutual information matrix, offering a scalable solution. We
then establish entrywise asymptotic normality for the KG low-rank estimator,
enabling the recovery of sparse graph edges with controlled type I error. Our
work uniquely addresses the under-explored domain of statistical inference
about non-linear statistics under the low-rank temporal dependent models, a
critical gap in existing research. We validate our approach through extensive
simulation studies and then apply the method to real-world EHR data in
constructing clinical KGs and generating clinical feature embeddings.
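The estimation step described in the abstract (singular value decomposition of the empirical pointwise mutual information matrix) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pmi_matrix` and `svd_embeddings`, the synthetic co-occurrence counts, and the square-root scaling of the singular values are assumptions made for illustration, and the inferential step (entrywise asymptotic normality and edge testing) is not reproduced here.

```python
import numpy as np


def pmi_matrix(cooc):
    """Empirical PMI from a co-occurrence count matrix.

    Assumes every count is strictly positive (hypothetical simplification);
    zero counts would need smoothing or truncation before taking logs.
    """
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)  # marginal counts per row code
    col = cooc.sum(axis=0, keepdims=True)  # marginal counts per column code
    # PMI(i, j) = log( C_ij * C_total / (C_i. * C_.j) )
    return np.log(cooc * total / (row * col))


def svd_embeddings(pmi, rank):
    """Rank-r embeddings and low-rank reconstruction from a truncated SVD."""
    u, s, vt = np.linalg.svd(pmi, full_matrices=False)
    # Scaling by sqrt of singular values is one common convention (assumption).
    emb = u[:, :rank] * np.sqrt(s[:rank])
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return emb, low_rank


# Synthetic symmetric co-occurrence counts for 6 hypothetical EHR codes.
rng = np.random.default_rng(0)
counts = rng.integers(1, 50, size=(6, 6))
counts = counts + counts.T

pmi = pmi_matrix(counts)
emb, low_rank = svd_embeddings(pmi, rank=2)
print(emb.shape)  # (6, 2)
```

Because only summary-level co-occurrence counts enter the computation, a sketch of this kind does not need patient-level records, which matches the privacy motivation stated in the abstract.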
Related papers
- Addressing Data Heterogeneity in Federated Learning of Cox Proportional Hazards Models [8.798959872821962]
This paper outlines an approach in the domain of federated survival analysis, specifically the Cox Proportional Hazards (CoxPH) model.
We present an FL approach that employs feature-based clustering to enhance model accuracy across synthetic datasets and real-world applications.
arXiv Detail & Related papers (2024-07-20T18:34:20Z)
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data [42.96821770394798]
TACCO is a novel framework that jointly discovers clusters of clinical concepts and patient visits based on a hypergraph modeling of EHR data.
We conduct experiments on the public MIMIC-III dataset and Emory internal CRADLE dataset over the downstream clinical tasks of phenotype classification and cardiovascular risk prediction.
In-depth model analysis, clustering results analysis, and clinical case studies further validate the improved utilities and insightful interpretations delivered by TACCO.
arXiv Detail & Related papers (2024-06-14T14:18:38Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Knowledge Graph Embedding with Electronic Health Records Data via Latent Graphical Block Model [13.398292423857756]
We propose to infer the conditional dependency structure among EHR features via a latent graphical block model (LGBM).
We establish the statistical rates of the proposed estimators and show the perfect recovery of the block structure.
arXiv Detail & Related papers (2023-05-31T16:18:46Z)
- Causal Inference via Nonlinear Variable Decorrelation for Healthcare Applications [60.26261850082012]
We introduce a novel method with a variable decorrelation regularizer to handle both linear and nonlinear confounding.
We employ association rules as new representations using association rule mining based on the original features to increase model interpretability.
arXiv Detail & Related papers (2022-09-29T17:44:14Z)
- Tree-Guided Rare Feature Selection and Logic Aggregation with Electronic Health Records Data [7.422597776308963]
We propose a tree-guided feature selection and logic aggregation approach for large-scale regression with rare binary features.
In a suicide risk study with EHR data, our approach is able to select and aggregate prior mental health diagnoses.
arXiv Detail & Related papers (2022-06-18T03:52:43Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task Lasso [102.84915019938413]
Non-invasive techniques like magnetoencephalography (MEG) or electroencephalography (EEG) hold promise for imaging brain activity.
The problem of source localization, or source imaging, however, poses a high-dimensional statistical inference challenge.
We propose an ensemble of desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z)