Related papers: Inference of Dependency Knowledge Graph for Electronic Health Records


URL: http://arxiv.org/abs/2312.15611v1
Date: Mon, 25 Dec 2023 04:45:36 GMT
Title: Inference of Dependency Knowledge Graph for Electronic Health Records
Authors: Zhiwei Xu, Ziming Gan, Doudou Zhou, Shuting Shen, Junwei Lu, Tianxi Cai
Abstract summary: We propose a framework for deriving a sparse knowledge graph based on the dynamic log-linear topic model. Within this model, the KG embeddings are estimated by performing singular value decomposition on the empirical pointwise mutual information matrix. We then establish entrywise asymptotic normality for the KG low-rank estimator, enabling the recovery of sparse graph edges with controlled type I error.
Score: 13.35941801610195
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The effective analysis of high-dimensional Electronic Health Record (EHR) data, with substantial potential for healthcare research, presents notable methodological challenges. Employing predictive modeling guided by a knowledge graph (KG), which enables efficient feature selection, can enhance both statistical efficiency and interpretability. While various methods have emerged for constructing KGs, existing techniques often lack statistical certainty concerning the presence of links between entities, especially in scenarios where the use of patient-level EHR data is limited due to privacy concerns. In this paper, we propose the first inferential framework for deriving a sparse KG with statistical guarantees, based on the dynamic log-linear topic model proposed by Arora et al. (2016). Within this model, the KG embeddings are estimated by performing singular value decomposition on the empirical pointwise mutual information matrix, offering a scalable solution. We then establish entrywise asymptotic normality for the KG low-rank estimator, enabling the recovery of sparse graph edges with controlled type I error. Our work uniquely addresses the under-explored domain of statistical inference about non-linear statistics under low-rank temporally dependent models, a critical gap in existing research. We validate our approach through extensive simulation studies and then apply the method to real-world EHR data in constructing clinical KGs and generating clinical feature embeddings.
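The estimation pipeline summarized in the abstract (an empirical PMI matrix, its truncated SVD, then entrywise testing of the low-rank estimate) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the convention of setting PMI to zero for pairs that never co-occur, and the `std_err` input are assumptions for illustration. The paper derives the asymptotic variance of each entry, but here `std_err` is simply passed in as a hypothetical input.

```python
import numpy as np
from statistics import NormalDist


def empirical_pmi(cooc: np.ndarray) -> np.ndarray:
    """PMI[i, j] = log(p(i, j) / (p(i) p(j))) from a symmetric count matrix."""
    cooc = np.asarray(cooc, dtype=float)
    p_ij = cooc / cooc.sum()        # joint co-occurrence probabilities
    p_i = p_ij.sum(axis=1)          # marginal probability of each concept
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij) - np.log(np.outer(p_i, p_i))
    # Assumption: pairs that never co-occur are treated as independent (PMI = 0).
    pmi[cooc == 0] = 0.0
    return pmi


def svd_embeddings(pmi: np.ndarray, rank: int):
    """Rank-`rank` embeddings and low-rank reconstruction from the SVD of PMI."""
    u, s, vt = np.linalg.svd(pmi)
    emb = u[:, :rank] * np.sqrt(s[:rank])         # one row per concept
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return emb, low_rank


def screen_edges(estimate: np.ndarray, std_err: np.ndarray, alpha: float = 0.05):
    """Flag entries whose two-sided z-test rejects 'no edge' at level alpha.

    Hypothetical: `std_err` stands in for the paper's asymptotic standard errors.
    No multiplicity correction is applied; each entry is tested at level alpha.
    """
    z = estimate / std_err
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    edges = np.abs(z) > crit
    np.fill_diagonal(edges, False)                # no self-loops
    return edges


# Usage with synthetic counts (not EHR data).
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(6, 6))
counts = counts + counts.T                        # symmetric co-occurrence counts
pmi = empirical_pmi(counts)
emb, low_rank = svd_embeddings(pmi, rank=2)
edges = screen_edges(low_rank, std_err=np.full_like(low_rank, 0.1))
```

Under the log-linear model of Arora et al. (2016), PMI between two items is approximately an inner product of their latent vectors, which is why a truncated SVD of the PMI matrix yields embeddings. The rank, the zero-count convention, and the per-entry test level are all choices a real analysis would need to justify.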

Related papers

A Foundational Brain Dynamics Model via Stochastic Optimal Control [15.8358479596609]
We introduce a foundational model for brain dynamics that uses stochastic optimal control (SOC) and amortized inference. Our method features a continuous-discrete state space model (SSM) that can robustly handle the intricate and noisy nature of fMRI signals. Our model attains state-of-the-art results across a variety of downstream tasks, including demographic prediction, trait analysis, disease diagnosis, and prognosis.
arXiv Detail & Related papers (2025-02-07T12:57:26Z)
On the use of Statistical Learning Theory for model selection in Structural Health Monitoring [0.0]
This paper focusses on how well models can generalise in Structural Health Monitoring (SHM). It is demonstrated that incorporating domain knowledge into the regression problem yields a lower guaranteed risk, thereby enhancing generalisation.
arXiv Detail & Related papers (2025-01-14T11:56:05Z)
Addressing Data Heterogeneity in Federated Learning of Cox Proportional Hazards Models [8.798959872821962]
This paper outlines an approach in the domain of federated survival analysis, specifically for the Cox Proportional Hazards (CoxPH) model. We present a federated learning (FL) approach that employs feature-based clustering to enhance model accuracy across synthetic datasets and real-world applications.
arXiv Detail & Related papers (2024-07-20T18:34:20Z)
Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD. It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation. We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data [42.96821770394798]
TACCO is a novel framework that jointly discovers clusters of clinical concepts and patient visits based on a hypergraph modeling of EHR data. We conduct experiments on the public MIMIC-III dataset and Emory internal CRADLE dataset over the downstream clinical tasks of phenotype classification and cardiovascular risk prediction. In-depth model analysis, clustering results analysis, and clinical case studies further validate the improved utilities and insightful interpretations delivered by TACCO.
arXiv Detail & Related papers (2024-06-14T14:18:38Z)
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
Knowledge Graph Embedding with Electronic Health Records Data via Latent Graphical Block Model [13.398292423857756]
We propose to infer the conditional dependency structure among EHR features via a latent graphical block model (LGBM). We establish the statistical rates of the proposed estimators and show the perfect recovery of the block structure.
arXiv Detail & Related papers (2023-05-31T16:18:46Z)
Causal Inference via Nonlinear Variable Decorrelation for Healthcare Applications [60.26261850082012]
We introduce a novel method with a variable decorrelation regularizer to handle both linear and nonlinear confounding. We employ association rules as new representations using association rule mining based on the original features to increase model interpretability.
arXiv Detail & Related papers (2022-09-29T17:44:14Z)
Tree-Guided Rare Feature Selection and Logic Aggregation with Electronic Health Records Data [7.422597776308963]
We propose a tree-guided feature selection and logic aggregation approach for large-scale regression with rare binary features. In a suicide risk study with EHR data, our approach is able to select and aggregate prior mental health diagnoses.
arXiv Detail & Related papers (2022-06-18T03:52:43Z)
Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data. We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem. We then derive evidence lower bounds (ELBOs) for ME-NODE and develop efficient training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task Lasso [102.84915019938413]
Magnetoencephalography (MEG) and electroencephalography (EEG) are promising non-invasive techniques for measuring brain activity. However, source localization, or source imaging, poses a high-dimensional statistical inference challenge. We propose an ensemble of clustered desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.