Improving Covariance-Regularized Discriminant Analysis for EHR-based
Predictive Analytics of Diseases
- URL: http://arxiv.org/abs/1610.05446v4
- Date: Wed, 8 Mar 2023 08:52:31 GMT
- Title: Improving Covariance-Regularized Discriminant Analysis for EHR-based
Predictive Analytics of Diseases
- Authors: Sijia Yang, Haoyi Xiong, Kaibo Xu, Licheng Wang, Jiang Bian, Zeyi Sun
- Abstract summary: We study an analytical model that characterizes the accuracy of LDA for classifying data with arbitrary distribution.
We also propose a novel LDA classifier De-Sparse that outperforms state-of-the-art LDA approaches developed for HDLSS data.
- Score: 20.697847129363463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear Discriminant Analysis (LDA) is a well-known technique for feature
extraction and dimension reduction. The performance of classical LDA, however,
degrades significantly on High Dimension Low Sample Size (HDLSS) data because
of the ill-posed inverse problem. Existing approaches to HDLSS data
classification typically assume that the data follow a Gaussian
distribution and handle the HDLSS classification problem with regularization.
However, these assumptions are too strict to hold in many emerging real-life
applications, such as personalized predictive analysis using
Electronic Health Records (EHR) data collected from an extremely limited
number of patients who have been diagnosed with or without the target
disease. In this paper, we revisit the problem of predictive analysis of
disease using personal EHR data and an LDA classifier. To fill the gap,
we first study an analytical model that characterizes the accuracy of
LDA for classifying data with arbitrary distribution. The model gives a
theoretical upper bound on the LDA error rate that is controlled by two factors:
(1) the statistical convergence rate of (inverse) covariance matrix estimators
and (2) the divergence of the training/testing datasets from the fitted
distributions. The error rate can therefore be lowered by balancing the two
factors. We further propose a
novel LDA classifier, De-Sparse, that leverages the De-sparsified Graphical Lasso to
improve the estimation of LDA and outperforms state-of-the-art LDA
approaches developed for HDLSS data. These advances are
demonstrated by both theoretical analysis and extensive experiments on
EHR datasets.
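The De-Sparse idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the commonly used de-sparsified graphical lasso form T = 2Θ - ΘΣΘ (with Θ the graphical lasso precision estimate and Σ the sample covariance), it relies on scikit-learn's `GraphicalLasso` for the initial sparse estimate, and the function names `fit_desparsified_lda` and `predict` as well as the value of `alpha` are hypothetical choices made for this example.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def fit_desparsified_lda(X, y, alpha=0.3):
    """Fit a two-class LDA rule with a de-sparsified precision estimate.

    X: (n_samples, n_features) array; y: array of 0/1 labels.
    alpha is the graphical lasso penalty (an assumed, untuned value).
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pool the class-centered samples; the resulting sample covariance
    # is singular whenever n_features exceeds n_samples.
    centered = np.vstack([X0 - mu0, X1 - mu1])
    sigma_hat = centered.T @ centered / len(centered)

    # Initial sparse precision estimate from the graphical lasso.
    glasso = GraphicalLasso(alpha=alpha, assume_centered=True).fit(centered)
    theta = glasso.precision_

    # De-sparsification step (assumed form): T = 2*Theta - Theta @ Sigma @ Theta.
    theta_d = 2.0 * theta - theta @ sigma_hat @ theta
    theta_d = (theta_d + theta_d.T) / 2.0  # remove numerical asymmetry

    log_prior_ratio = np.log(len(X1) / len(X0))
    return mu0, mu1, theta_d, log_prior_ratio


def predict(X, mu0, mu1, theta_d, log_prior_ratio):
    """Apply the linear discriminant rule; returns an array of 0/1 labels."""
    w = theta_d @ (mu1 - mu0)
    b = -0.5 * (mu1 + mu0) @ w + log_prior_ratio
    return (X @ w + b > 0).astype(int)
```

A call such as `predict(X_test, *fit_desparsified_lda(X_train, y_train))` would label new samples. In practice `alpha` would need to be tuned, for example by cross-validation, since it governs the trade-off between estimator convergence and fit that the abstract describes.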
Related papers
- Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose the stability-based generalization analytical framework for Distributed-SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z)
- Directly Handling Missing Data in Linear Discriminant Analysis for Enhancing Classification Accuracy and Interpretability [1.4840867281815378]
We introduce a novel and robust classification method, termed weighted missing Linear Discriminant Analysis (WLDA).
WLDA extends Linear Discriminant Analysis (LDA) to handle datasets with missing values without the need for imputation.
We conduct an in-depth theoretical analysis to establish the properties of WLDA and thoroughly evaluate its explainability.
arXiv Detail & Related papers (2024-06-30T14:21:32Z)
- Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image Segmentation [10.65123164779962]
Deep learning-based medical image segmentation models often face performance degradation when deployed across various medical centers.
We propose a novel Human-in-the-loop TTA framework that capitalizes on the largely overlooked potential of clinician-corrected predictions.
Our framework conceives a divergence loss, designed specifically to diminish the prediction divergence instigated by domain disparities.
arXiv Detail & Related papers (2024-05-14T02:02:15Z)
- An ADRC-Incorporated Stochastic Gradient Descent Algorithm for Latent
Factor Analysis [6.843073158719234]
A stochastic gradient descent (SGD)-based latent factor analysis (LFA) model is remarkably effective in extracting valuable information from an HDI matrix.
A standard SGD algorithm only considers the current learning error to compute the gradient without considering the historical and future state of the learning error.
This paper proposes an ADRC-incorporated SGD (ADS) algorithm that refines the instance learning error by considering its historical and future states.
arXiv Detail & Related papers (2024-01-13T08:38:54Z)
- Minimally Informed Linear Discriminant Analysis: training an LDA model
with unlabelled data [51.673443581397954]
We show that it is possible to compute the exact projection vector from LDA models based on unlabelled data.
We show that the MILDA projection vector can be computed in a closed form with a computational cost comparable to LDA.
arXiv Detail & Related papers (2023-10-17T09:50:31Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic
Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z)
- BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG).
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
arXiv Detail & Related papers (2021-12-06T03:35:21Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Efficient Estimation and Evaluation of Prediction Rules in
Semi-Supervised Settings under Stratified Sampling [6.930951733450623]
We propose a two-step semi-supervised learning (SSL) procedure for evaluating a prediction rule derived from a working binary regression model.
In step I, we impute the missing labels via weighted regression with nonlinear basis functions to account for nonrandom sampling.
In step II, we augment the initial imputations to ensure the consistency of the resulting estimators.
arXiv Detail & Related papers (2020-10-19T12:54:45Z)
- A Doubly Regularized Linear Discriminant Analysis Classifier with
Automatic Parameter Selection [24.027886914804775]
Linear discriminant analysis (LDA) based classifiers tend to falter in many practical settings where the training data size is smaller than, or comparable to, the number of features.
We propose a doubly regularized LDA classifier that we denote as R2LDA.
Results obtained from both synthetic and real data demonstrate the consistency and effectiveness of the proposed R2LDA approach.
arXiv Detail & Related papers (2020-04-28T07:09:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.