Unsupervised EHR-based Phenotyping via Matrix and Tensor Decompositions
- URL: http://arxiv.org/abs/2209.00322v1
- Date: Thu, 1 Sep 2022 09:47:27 GMT
- Authors: Florian Becker, Age K. Smilde, Evrim Acar
- Abstract summary: We provide a comprehensive review of low-rank approximation-based approaches for computational phenotyping.
Recent developments have adapted low-rank data approximation methods by incorporating different constraints and regularizations that facilitate interpretability further.
- Score: 0.6875312133832078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computational phenotyping allows for unsupervised discovery of subgroups of
patients as well as corresponding co-occurring medical conditions from
electronic health records (EHR). Typically, EHR data contains demographic
information, diagnoses and laboratory results. Discovering (novel) phenotypes
has the potential to be of prognostic and therapeutic value. Providing medical
practitioners with transparent and interpretable results is an important
requirement and an essential part for advancing precision medicine. Low-rank
data approximation methods such as matrix (e.g., non-negative matrix
factorization) and tensor decompositions (e.g., CANDECOMP/PARAFAC) have
demonstrated that they can provide such transparent and interpretable insights.
Recent developments have adapted low-rank data approximation methods by
incorporating different constraints and regularizations that facilitate
interpretability further. In addition, they offer solutions for common
challenges within EHR data such as high dimensionality, data sparsity and
incompleteness. Especially extracting temporal phenotypes from longitudinal EHR
has received much attention in recent years. In this paper, we provide a
comprehensive review of low-rank approximation-based approaches for
computational phenotyping. The existing literature is categorized into temporal
vs. static phenotyping approaches based on matrix vs. tensor decompositions.
Furthermore, we outline different approaches for the validation of phenotypes,
i.e., the assessment of clinical significance.
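As a rough illustration of the matrix-decomposition idea named in the abstract, the sketch below factorizes a small patients-by-features matrix with non-negative matrix factorization. It is a minimal example under stated assumptions, not the method of any reviewed paper: the data are synthetic, the function name `nmf_phenotypes` and its parameters are hypothetical, and the update rule is the standard multiplicative update for the Frobenius loss.

```python
import numpy as np

def nmf_phenotypes(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (patients x features) as W @ H.

    Hypothetical helper: rows of H are candidate phenotypes (feature
    loadings), rows of W give each patient's membership in them.
    Uses multiplicative updates for the Frobenius reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n_patients, n_features = X.shape
    # Strictly positive random initialization.
    W = rng.random((n_patients, rank)) + eps
    H = rng.random((rank, n_features)) + eps
    for _ in range(n_iter):
        # Element-wise updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for EHR counts (assumption: 30 patients, 8 features,
# generated from 2 latent phenotypes). Real EHR data would replace this.
rng = np.random.default_rng(42)
true_W = rng.random((30, 2))
true_H = rng.random((2, 8))
X = true_W @ true_H

W, H = nmf_phenotypes(X, rank=2)

# Relative reconstruction error of the low-rank approximation.
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Indices of the three highest-loading features per candidate phenotype.
top_features = np.argsort(H, axis=1)[:, ::-1][:, :3]
```

In this sketch the non-negativity of `W` and `H` is what makes each factor readable as an additive group of co-occurring features. A tensor decomposition such as CANDECOMP/PARAFAC would extend the same idea with an additional mode, for example time.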
Related papers
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
Date: 2024-06-20T02:20:23Z
- Clustering of Disease Trajectories with Explainable Machine Learning: A Case Study on Postoperative Delirium Phenotypes [13.135589459700865]
We propose an approach that combines supervised machine learning for personalized POD risk prediction with unsupervised clustering techniques to uncover potential POD phenotypes.
We show that clustering patients in the SHAP feature importance space successfully recovers the true underlying phenotypes, outperforming clustering in the raw feature space.
Date: 2024-05-06T10:05:46Z
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
Date: 2024-03-02T00:56:05Z
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
Date: 2023-10-04T01:36:30Z
- PheME: A deep ensemble framework for improving phenotype prediction from multi-modal data [42.56953523499849]
We present PheME, an Ensemble framework using Multi-modality data of structured EHRs and unstructured clinical notes for accurate Phenotype prediction.
We leverage ensemble learning to combine outputs from single-modal models and multi-modal models to improve phenotype predictions.
Date: 2023-03-19T23:41:04Z
- Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data [65.28160163774274]
We apply a Bayesian framework to capture the relationships between depression, depression symptoms, and features derived from speech, facial expression and cognitive game data collected at thymia.
Date: 2022-11-09T14:48:13Z
- A cost-based multi-layer network approach for the discovery of patient phenotypes [2.816539638885011]
We propose a cost-based layer selector model for detecting phenotypes using a community detection approach.
Our goal is to minimize the number of features used to build these phenotypes while preserving their quality.
For some post-treatment variables, predictors using phenotypes from COBALT as features outperformed those using phenotypes detected by traditional clustering methods.
Date: 2022-09-19T14:07:10Z
- Enabling scalable clinical interpretation of ML-based phenotypes using real world data [0.0]
This study investigates approaches to perform patient stratification analysis at scale using large EHR datasets.
We have developed several tools to facilitate the clinical evaluation and interpretation of unsupervised patient stratification results.
Date: 2022-08-02T17:31:03Z
- Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction [102.23901690661916]
Low-dimensional embedding spaces can be derived from the UK Biobank population dataset to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics.
Performance gains from semi-supervised approaches will probably become an important ingredient for various medical data science applications.
Date: 2021-10-12T16:25:50Z
- Learning Inter-Modal Correspondence and Phenotypes from Multi-Modal Electronic Health Records [15.658012300789148]
We propose cHITF to infer the correspondence between multiple modalities jointly with the phenotype discovery.
Experiments conducted on the real-world MIMIC-III dataset demonstrate that cHITF effectively infers clinically meaningful inter-modal correspondence.
Date: 2020-11-12T10:30:29Z
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
Date: 2020-07-07T21:04:55Z