Surrogate Assisted Semi-supervised Inference for High Dimensional Risk
Prediction
- URL: http://arxiv.org/abs/2105.01264v1
- Date: Tue, 4 May 2021 03:08:51 GMT
- Title: Surrogate Assisted Semi-supervised Inference for High Dimensional Risk
Prediction
- Authors: Jue Hou, Zijian Guo and Tianxi Cai
- Abstract summary: We develop a surrogate assisted semi-supervised-learning (SAS) approach to risk modeling with high dimensional predictors.
We demonstrate that the SAS procedure provides valid inference for the predicted risk derived from a high dimensional working model.
- Score: 3.10560974227074
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Risk modeling with EHR data is challenging due to a lack of direct
observations on the disease outcome, and the high dimensionality of the
candidate predictors. In this paper, we develop a surrogate assisted
semi-supervised-learning (SAS) approach to risk modeling with high dimensional
predictors, leveraging a large unlabeled dataset on candidate predictors and
surrogates of the outcome, as well as a small labeled dataset with annotated outcomes.
The SAS procedure borrows information from surrogates along with candidate
predictors to impute the unobserved outcomes via a sparse working imputation
model with moment conditions to achieve robustness against mis-specification in
the imputation model and a one-step bias correction to enable interval
estimation for the predicted risk. We demonstrate that the SAS procedure
provides valid inference for the predicted risk derived from a high dimensional
working model, even when the underlying risk prediction model is dense and the
risk model is mis-specified. We present an extensive simulation study to
demonstrate the superiority of our SSL approach compared to existing supervised
methods. We apply the method to derive genetic risk prediction of type-2
diabetes mellitus using an EHR biobank cohort.
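
The two-stage idea in the abstract (impute unobserved outcomes from predictors plus surrogates, then fit the risk model on the imputed outcomes) can be illustrated with a minimal sketch. This is not the authors' implementation: the simulated data, the helper `l1_logistic`, the penalty values, and all variable names are assumptions made for illustration, and the one-step bias correction used for interval estimation is omitted entirely.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l1_logistic(X, y, lam, n_iter=500):
    """Lasso-penalized logistic regression fitted by proximal gradient descent.

    y may be binary or fractional (imputed probabilities).
    Returns [intercept, coefficients]; the intercept is not penalized.
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    # Step size 1/L, where L = ||Z||_2^2 / (4n) bounds the curvature of the loss.
    step = 4.0 * n / (np.linalg.norm(Z, 2) ** 2)
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        grad = Z.T @ (sigmoid(Z @ beta) - y) / n
        beta = beta - step * grad
        # Soft-threshold every coefficient except the intercept.
        beta[1:] = np.sign(beta[1:]) * np.maximum(np.abs(beta[1:]) - step * lam, 0.0)
    return beta

# Hypothetical simulated data: a small labeled set and a large unlabeled set.
rng = np.random.default_rng(0)
n_lab, n_unlab, p = 200, 5000, 50
N = n_lab + n_unlab
X = rng.normal(size=(N, p))                  # candidate predictors
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.5]
Y = rng.binomial(1, sigmoid(X @ beta_true))  # outcome, treated as observed only when labeled
S = Y + rng.normal(scale=0.5, size=N)        # noisy surrogate of the outcome
labeled = np.arange(N) < n_lab

# Stage 1: sparse working imputation model Y ~ (X, S), fitted on labeled data only.
W = np.column_stack([X, S])
gamma = l1_logistic(W[labeled], Y[labeled], lam=0.02)
Y_imp = sigmoid(gamma[0] + W @ gamma[1:])    # imputed outcomes for every subject

# Stage 2: sparse risk model of the imputed outcome on X, using all subjects.
beta_ssl = l1_logistic(X, Y_imp, lam=0.01)

# Supervised benchmark that uses only the labeled subjects.
beta_sup = l1_logistic(X[labeled], Y[labeled], lam=0.02)

print("semi-supervised, first 3 coefficients:", np.round(beta_ssl[1:4], 2))
print("supervised,      first 3 coefficients:", np.round(beta_sup[1:4], 2))
```

The sketch uses the surrogate only in stage 1, so the final risk model depends on the candidate predictors alone, matching the abstract's description of a risk model on the predictors. Valid confidence intervals would additionally require a debiasing step, which this example does not attempt.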
Related papers
- Data-driven decision-making under uncertainty with entropic risk measure [5.407319151576265]
The entropic risk measure is widely used in high-stakes decision making to account for tail risks associated with an uncertain loss.
To debias the empirical entropic risk estimator, we propose a strongly consistent bootstrapping procedure.
We show that cross validation methods can result in significantly higher out-of-sample risk for the insurer if the bias in validation performance is not corrected for.
arXiv Detail & Related papers (2024-09-30T04:02:52Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Boosting the interpretability of clinical risk scores with intervention predictions [59.22442473992704]
We propose a joint model of intervention policy and adverse event risk as a means to explicitly communicate the model's assumptions about future interventions.
We show how combining typical risk scores, such as the likelihood of mortality, with future intervention probability scores leads to more interpretable clinical predictions.
arXiv Detail & Related papers (2022-07-06T19:49:42Z)
- Mitigating multiple descents: A model-agnostic framework for risk monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z)
- SurvLatent ODE: A Neural ODE based time-to-event model with competing risks for longitudinal data improves cancer-associated Deep Vein Thrombosis (DVT) prediction [68.8204255655161]
We propose a generative time-to-event model, SurvLatent ODE, which parameterizes a latent representation under irregularly sampled data.
Our model then utilizes the latent representation to flexibly estimate survival times for multiple competing events without specifying shapes of event-specific hazard function.
SurvLatent ODE outperforms the current clinical standard Khorana Risk scores for stratifying DVT risk groups.
arXiv Detail & Related papers (2022-04-20T17:28:08Z)
- A New Approach for Interpretability and Reliability in Clinical Risk Prediction: Acute Coronary Syndrome Scenario [0.33927193323747895]
We intend to create a new risk assessment methodology that combines the best characteristics of both risk score and machine learning models.
The proposed approach achieved testing results identical to the standard LR, but offers superior interpretability and personalization.
The reliability estimates for individual predictions correlated strongly with the misclassification rate.
arXiv Detail & Related papers (2021-10-15T19:33:46Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines, by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Efficient Estimation and Evaluation of Prediction Rules in Semi-Supervised Settings under Stratified Sampling [6.930951733450623]
We propose a two-step semi-supervised learning (SSL) procedure for evaluating a prediction rule derived from a working binary regression model.
In step I, we impute the missing labels via weighted regression with nonlinear basis functions to account for nonrandom sampling.
In step II, we augment the initial imputations to ensure the consistency of the resulting estimators.
arXiv Detail & Related papers (2020-02-13T15:55:32Z)
- Learning to Predict Error for MRI Reconstruction [67.76632988696943]
We demonstrate that predictive uncertainty estimated by the current methods does not highly correlate with prediction error.
We propose a novel method that estimates the target labels and magnitude of the prediction error in two steps.
arXiv Detail & Related papers (2020-02-13T15:55:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.