Related papers: A multi-locus predictiveness curve and its summary assessment for genetic risk prediction

URL: http://arxiv.org/abs/2504.00024v1
Date: Fri, 28 Mar 2025 15:49:39 GMT
Title: A multi-locus predictiveness curve and its summary assessment for genetic risk prediction
Authors: Changshuai Wei, Ming Li, Yalu Wen, Chengyin Ye, Qing Lu,
Abstract summary: We propose a multi-marker predictiveness curve and a non-parametric method to construct the curve for case-control studies. We also demonstrate the connections of the predictiveness curve with the ROC curve and the Lorenz curve. We conducted a real data analysis, using the predictiveness curve and predictiveness U to evaluate a risk prediction model for Nicotine Dependence.
Score: 5.050463389414008
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the advance of high-throughput genotyping and sequencing technologies, it has become feasible to comprehensively evaluate the role of large numbers of genetic predictors in disease prediction. There is, therefore, a critical need for appropriate statistical measures to assess the combined effects of these genetic variants in disease prediction. The predictiveness curve is commonly used as a graphical tool to measure the predictive ability of a risk prediction model built on a single continuous biomarker. Yet, for most complex diseases, risk prediction models are formed from multiple genetic variants. We therefore propose a multi-marker predictiveness curve and provide a non-parametric method to construct the curve for case-control studies. We further introduce a global predictiveness U and a partial predictiveness U to summarize the predictiveness curve across the whole population and across a sub-population of clinical interest, respectively. We also demonstrate the connections of the predictiveness curve with the ROC curve and the Lorenz curve. Through simulation, we compared the performance of the predictiveness U with three other summary indices (R square, Total Gain, and Average Entropy) and showed that the predictiveness U outperformed the other three indices in terms of unbiasedness and robustness. Moreover, we simulated a series of rare-variant disease models and found that the partial predictiveness U performed better than the global predictiveness U. Finally, we conducted a real data analysis, using the predictiveness curve and predictiveness U to evaluate a risk prediction model for Nicotine Dependence.
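The abstract does not spell out how a predictiveness curve or the competing summary indices are computed. As a hedged illustration only, the sketch below uses the standard single-score definitions from the predictiveness-curve literature, assuming the predicted risks are already calibrated probabilities: the curve R(v) is the v-th quantile of risk plotted against v, the (unstandardized) Total Gain is the area between R(v) and the prevalence line, and the R-square-type index is Var(risk) / (p(1 - p)). It does not implement the authors' non-parametric case-control estimator or the predictiveness U, whose definitions are not given here; the function names and the simulated genotype model are hypothetical.

```python
import numpy as np


def predictiveness_curve(risk):
    """Return (v, R(v)): quantile levels and the sorted risks at those levels.

    Assumes `risk` holds calibrated predicted probabilities P(D=1 | markers),
    so the empirical predictiveness curve is simply the risk quantile function.
    """
    r = np.sort(np.asarray(risk, dtype=float))
    n = r.size
    v = (np.arange(1, n + 1) - 0.5) / n  # midpoint quantile levels in (0, 1)
    return v, r


def total_gain(risk, prevalence):
    """Unstandardized Total Gain: integral of |R(v) - prevalence| over v in (0, 1)."""
    _, r = predictiveness_curve(risk)
    # Each sorted risk covers an interval of width 1/n, so the mean approximates the integral.
    return float(np.mean(np.abs(r - prevalence)))


def explained_variation(risk, prevalence):
    """R-square-type index: Var(risk) / (prevalence * (1 - prevalence))."""
    r = np.asarray(risk, dtype=float)
    return float(np.var(r) / (prevalence * (1.0 - prevalence)))


if __name__ == "__main__":
    # Hypothetical simulation: 20 independent SNPs, additive coding, equal log-odds effects.
    rng = np.random.default_rng(0)
    n_people, n_snps, maf = 10_000, 20, 0.3
    genotypes = rng.binomial(2, maf, size=(n_people, n_snps))
    beta = np.full(n_snps, 0.2)
    eta = -3.0 + genotypes @ beta
    risk = 1.0 / (1.0 + np.exp(-eta))  # logistic risk model
    prevalence = float(risk.mean())  # for calibrated risks, E[risk] equals the prevalence
    print("prevalence:", round(prevalence, 4))
    print("Total Gain:", round(total_gain(risk, prevalence), 4))
    print("R^2 (explained variation):", round(explained_variation(risk, prevalence), 4))
```

Two boundary cases help read these indices: an uninformative model that assigns every person the prevalence gives a flat curve with Total Gain and R square both equal to 0, while a perfect classifier at prevalence 0.5 (risks of exactly 0 or 1) gives Total Gain 0.5 and R square 1.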

Related papers

Conformalized Regression for Continuous Bounded Outcomes [0.0]
Regression problems with bounded continuous outcomes frequently arise in real-world statistical and machine learning applications. Most of the existing statistical and machine learning literature has focused either on point prediction of bounded outcomes or on interval prediction based on approximations. We develop conformal prediction intervals for bounded outcomes based on transformation models and beta regression.
arXiv Detail & Related papers (2025-07-18T15:51:48Z)
Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation (GCV) estimator fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
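The ridge-regression entry refers to the generalized cross validation (GCV) estimator. For readers unfamiliar with it, below is a minimal sketch of the textbook GCV score for ridge regression, GCV(λ) = (1/n)·‖(I − S_λ)y‖² / (1 − tr(S_λ)/n)² with S_λ = X(XᵀX + λI)⁻¹Xᵀ. It assumes independent samples, which is exactly the assumption the listed paper relaxes, so it is not that paper's correlated-sample analysis; `ridge_gcv` and the simulated data are hypothetical.

```python
import numpy as np


def ridge_gcv(X, y, lam):
    """Textbook GCV score for ridge regression with penalty `lam` (assumes i.i.d. samples)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Smoother (hat) matrix S = X (X^T X + lam * I)^{-1} X^T, so that y_hat = S y.
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    residuals = y - S @ y
    effective_df = np.trace(S)  # effective degrees of freedom of the ridge fit
    # GCV(lam) = mean squared residual / (1 - df / n)^2
    return float(np.mean(residuals ** 2) / (1.0 - effective_df / n) ** 2)


if __name__ == "__main__":
    # Hypothetical data: choose the penalty that minimizes GCV over a log-spaced grid.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 50))
    y = 0.3 * (X @ rng.standard_normal(50)) + rng.standard_normal(200)
    grid = np.logspace(-2, 3, 20)
    scores = [ridge_gcv(X, y, lam) for lam in grid]
    print("lambda minimizing GCV:", grid[int(np.argmin(scores))])
```

The denominator inflates the training error according to how many effective degrees of freedom the fit uses; when samples are correlated, this correction is what can break down.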
Modeling Epidemic Spread: A Gaussian Process Regression Approach [0.7374726900469741]
We present a new data-driven method based on Gaussian process regression (GPR) to model epidemic spread. We present examples that use GPR to model and predict epidemic spread by using real-world infection data gathered in the UK during the COVID-19 epidemic.
arXiv Detail & Related papers (2023-12-14T22:45:01Z)
Quantifying predictive uncertainty of aphasia severity in stroke patients with sparse heteroscedastic Bayesian high-dimensional regression [47.1405366895538]
Sparse linear regression methods for high-dimensional data commonly assume that residuals have constant variance, which can be violated in practice. This paper proposes estimating high-dimensional heteroscedastic linear regression models using a heteroscedastic partitioned empirical Bayes Expectation Conditional Maximization algorithm.
arXiv Detail & Related papers (2023-09-15T22:06:29Z)
Stability of clinical prediction models developed using statistical or machine learning methods [0.5482532589225552]
Clinical prediction models estimate an individual's risk of a particular health outcome, conditional on their values of multiple predictors. Many models are developed using small datasets that lead to instability in the model and its predictions (estimated risks). We show that instability in a model's estimated risks is often considerable and manifests itself as miscalibration of predictions in new data.
arXiv Detail & Related papers (2022-11-02T11:55:28Z)
Predictive Multiplicity in Probabilistic Classification [25.111463701666864]
We present a framework for measuring predictive multiplicity in probabilistic classification. We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks. Our results emphasize the need to report predictive multiplicity more widely.
arXiv Detail & Related papers (2022-06-02T16:25:29Z)
Mitigating multiple descents: A model-agnostic framework for risk monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation. We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z)
A New Approach for Interpretability and Reliability in Clinical Risk Prediction: Acute Coronary Syndrome Scenario [0.33927193323747895]
We intend to create a new risk assessment methodology that combines the best characteristics of both risk score and machine learning models. The proposed approach achieved testing results identical to those of the standard LR, but offers superior interpretability and personalization. The reliability estimates of individual predictions correlated strongly with the misclassification rate.
arXiv Detail & Related papers (2021-10-15T19:33:46Z)
When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions. Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations. We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z)
Learning to Predict with Supporting Evidence: Applications to Clinical Risk Prediction [9.199022926064009]
The impact of machine learning models on healthcare will depend on the degree of trust that healthcare professionals place in the predictions made by these models. We present a method that gives people with clinical expertise domain-relevant evidence about why a prediction should be trusted.
arXiv Detail & Related papers (2021-03-04T00:26:32Z)
STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization [76.57716281104938]
We develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously. STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations. We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic.
arXiv Detail & Related papers (2020-12-08T21:21:47Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
Prediction in latent factor regression: Adaptive PCR and beyond [2.9439848714137447]
We prove a master theorem that establishes a risk bound for a large class of predictors. We use our main theorem to recover known risk bounds for the minimum-norm interpolating predictor. We conclude with a detailed simulation study to support and complement our theoretical results.
arXiv Detail & Related papers (2020-07-20T12:42:47Z)
Survival Cluster Analysis [93.50540270973927]
There is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles. An approach that addresses this need is likely to improve characterization of individual outcomes.
arXiv Detail & Related papers (2020-02-29T22:41:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.