Tree-Guided Rare Feature Selection and Logic Aggregation with Electronic Health Records Data
- URL: http://arxiv.org/abs/2206.09107v2
- Date: Mon, 26 Feb 2024 20:29:50 GMT
- Title: Tree-Guided Rare Feature Selection and Logic Aggregation with Electronic Health Records Data
- Authors: Jianmin Chen, Robert H. Aseltine, Fei Wang, Kun Chen
- Abstract summary: We propose a tree-guided feature selection and logic aggregation approach for large-scale regression with rare binary features.
In a suicide risk study with EHR data, our approach is able to select and aggregate prior mental health diagnoses.
- Score: 7.422597776308963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Statistical learning with a large number of rare binary features is commonly
encountered in analyzing electronic health records (EHR) data, especially in
the modeling of disease onset with prior medical diagnoses and procedures.
Dealing with the resulting highly sparse and large-scale binary feature matrix
is notoriously challenging as conventional methods may suffer from a lack of
power in testing and inconsistency in model fitting while machine learning
methods may be unable to produce interpretable results or
clinically-meaningful risk factors. To improve EHR-based modeling and utilize
the natural hierarchical structure of disease classification, we propose a
tree-guided feature selection and logic aggregation approach for large-scale
regression with rare binary features, in which dimension reduction is achieved
through not only a sparsity pursuit but also an aggregation promoter with the
logic operator of "or". We convert the combinatorial problem into a convex
linearly-constrained regularized estimation, which enables scalable computation
with theoretical guarantees. In a suicide risk study with EHR data, our
approach is able to select and aggregate prior mental health diagnoses as
guided by the diagnosis hierarchy of the International Classification of
Diseases. By balancing the rarity and specificity of the EHR diagnosis records,
our strategy improves both prediction and model interpretation. We identify
important higher-level categories and subcategories of mental health conditions
and simultaneously determine the level of specificity needed for each of them
in predicting suicide risk.
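The "or" aggregation mentioned in the abstract can be illustrated on its own: when child diagnosis codes are merged into their parent category, the merged binary feature equals 1 whenever any child code is present. The sketch below is a minimal illustration under stated assumptions, not the authors' estimator (which decides the aggregation level through a convex, linearly-constrained regularized fit). The function `or_aggregate`, the child-to-parent map `parent_of`, and the ICD-10-style codes are hypothetical toy inputs.

```python
import numpy as np

def or_aggregate(X_leaf, leaf_names, parent_of):
    """Return (node_names, X_node): one 0/1 column per tree node, where a
    node's column is the logical OR of the leaf columns in its subtree
    (a leaf counts as part of its own subtree). `parent_of` maps a code to
    its parent code; root codes are simply absent from the map."""
    X_leaf = np.asarray(X_leaf, dtype=bool)
    node_leaves = {}  # node name -> indices of leaf columns beneath it
    for j, leaf in enumerate(leaf_names):
        node = leaf
        while node is not None:  # walk from the leaf up to the root
            node_leaves.setdefault(node, []).append(j)
            node = parent_of.get(node)
    node_names = sorted(node_leaves)
    # OR across the leaf columns of each node's subtree.
    X_node = np.column_stack(
        [X_leaf[:, node_leaves[n]].any(axis=1) for n in node_names]
    ).astype(int)
    return node_names, X_node

# Hypothetical toy hierarchy; rows are patients, columns are leaf codes.
leaf_names = ["F32.0", "F32.1", "F33.0"]
parent_of = {"F32.0": "F32", "F32.1": "F32", "F33.0": "F33",
             "F32": "F30-F39", "F33": "F30-F39"}
X_leaf = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1],
          [0, 0, 0]]

node_names, X_node = or_aggregate(X_leaf, leaf_names, parent_of)
for name, col in zip(node_names, X_node.T):
    print(f"{name:8s} prevalence={col.mean():.2f}")
```

In this toy data, moving up the tree raises a feature's prevalence (0.25 for each leaf code, 0.50 for `F32`, 0.75 for `F30-F39`) at the cost of specificity. This is the rarity/specificity trade-off the abstract describes. The sketch does not attempt the paper's central step, which is choosing, for each branch, the level at which to aggregate.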
Related papers
- Inference of Dependency Knowledge Graph for Electronic Health Records [13.35941801610195]
We propose a framework for deriving a sparse knowledge graph based on the dynamic log-linear topic model.
Within this model, the KG embeddings are estimated by performing singular value decomposition on the empirical pointwise mutual information matrix.
We then establish entrywise normality for the KG low-rank estimator, enabling the recovery of sparse graph edges with controlled type I error.
(2023-12-25T04:45:36Z)
- An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets [32.25265709333831]
We propose a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity).
We then apply a systematic analysis of AEq values across subpopulations to identify manifestations of racial bias in two known cases in healthcare.
AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
(2023-11-06T17:08:41Z)
- AI Framework for Early Diagnosis of Coronary Artery Disease: An Integration of Borderline SMOTE, Autoencoders and Convolutional Neural Networks Approach [0.44998333629984877]
We develop a methodology for balancing and augmenting data for more accurate prediction when the data is imbalanced and the sample size is small.
The experimental results revealed that the average accuracy of our proposed method for CAD prediction was 95.36%, higher than that of random forest (RF), decision tree (DT), support vector machine (SVM), logistic regression (LR), and artificial neural network (ANN) models.
(2023-08-29T14:33:38Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
(2023-07-19T12:35:09Z)
- Deep Stable Representation Learning on Electronic Health Records [8.256340233221112]
Causal Healthcare Embedding (CHE) aims at eliminating the spurious statistical relationship by removing the dependencies between diagnoses and procedures.
Our proposed CHE method can be used as a flexible plug-and-play module that can enhance existing deep learning models on EHR.
(2022-09-03T04:10:45Z)
- Analysis of lifelog data using optimal feature selection based unsupervised logistic regression (OFS-ULR) for chronic disease classification [2.3909933791900326]
Chronic disease classification models are now harnessing the potential of lifelog data to explore better healthcare practices.
This paper constructs an optimal feature selection-based unsupervised logistic regression model (OFS-ULR) to classify chronic diseases.
(2022-04-04T07:11:26Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the Chicago Classification (CC) diagnosis of a high-resolution manometry (HRM) study from raw multi-swallow data.
(2021-06-25T20:09:23Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
(2021-04-07T06:02:04Z)
- Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
(2021-01-27T13:16:51Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves an F1 score of up to 0.841 for AD detection and a PR-AUC of up to 0.609 for NASH detection, and outperforms various state-of-the-art baselines, improving on the best baseline by up to 19%.
(2020-10-22T02:28:11Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of breakdowns in essential services and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
(2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.