Statistical quantification of confounding bias in predictive modelling
- URL: http://arxiv.org/abs/2111.00814v1
- Date: Mon, 1 Nov 2021 10:35:24 GMT
- Title: Statistical quantification of confounding bias in predictive modelling
- Authors: Tamas Spisak
- Abstract summary: I propose the partial and full confounder tests, which probe the null hypotheses of unconfounded and fully confounded models, respectively.
The tests provide a strict control for Type I errors and high statistical power, even for non-normally and non-linearly dependent predictions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The lack of non-parametric statistical tests for confounding bias
significantly hampers the development of robust, valid and generalizable
predictive models in many fields of research. Here I propose the partial and
full confounder tests, which, for a given confounder variable, probe the null
hypotheses of unconfounded and fully confounded models, respectively. The tests
provide a strict control for Type I errors and high statistical power, even for
non-normally and non-linearly dependent predictions, often seen in machine
learning. Applying the proposed tests on models trained on functional brain
connectivity data from the Human Connectome Project and the Autism Brain
Imaging Data Exchange dataset reveals confounders that were previously
unreported or found to be hard to correct for with state-of-the-art confound
mitigation approaches. The tests, implemented in the package mlconfound
(https://mlconfound.readthedocs.io), can aid the assessment and improvement of
the generalizability and neurobiological validity of predictive models and,
thereby, foster the development of clinically useful machine learning
biomarkers.
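A minimal usage sketch follows, based on the package's documented entry point partial_confound_test(y, yhat, c); the synthetic data, and any call details beyond the function name, are illustrative assumptions to be verified against the documentation linked above.

```python
# Sketch of the partial confounder test with the mlconfound package
# (pip install mlconfound). Synthetic data only; check the exact API at
# https://mlconfound.readthedocs.io before relying on this.
import numpy as np
from mlconfound.stats import partial_confound_test  # full_confound_test is analogous

rng = np.random.default_rng(42)
n = 500

c = rng.normal(size=n)                         # confounder (e.g., age or head motion)
y = 0.5 * c + rng.normal(size=n)               # target, partly driven by c
yhat = 0.6 * y + 0.2 * c + rng.normal(size=n)  # mildly confounded model predictions

# H0 of the partial test: the model is unconfounded, i.e. yhat is
# conditionally independent of c given y; a small permutation p-value
# therefore flags confounding by c.
result = partial_confound_test(y, yhat, c)
print(result)  # includes the permutation-based p-value, among other statistics
```

The full confounder test is invoked the same way but probes the opposite null: that the predictions carry no information about y beyond what the confounder explains.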
Related papers
- Deep Learning Framework with Uncertainty Quantification for Survey Data: Assessing and Predicting Diabetes Mellitus Risk in the American Population [2.3849116823891987]
This paper proposes a general predictive framework for regression and classification using neural network (NN) modeling.
We apply this framework to assess the risk of Diabetes Mellitus in the US population, utilizing data from the NHANES 2011-2014 cohort.
While focused on diabetes, this NN predictive framework is adaptable to create clinical models for a diverse range of diseases and medical cohorts.
arXiv Detail & Related papers (2024-03-28T18:06:11Z)
- Unified Uncertainty Estimation for Cognitive Diagnosis Models [70.46998436898205]
We propose a unified uncertainty estimation approach for a wide range of cognitive diagnosis models.
We decompose the uncertainty of diagnostic parameters into a data aspect and a model aspect.
Our method is effective and can provide useful insights into the uncertainty of cognitive diagnosis.
arXiv Detail & Related papers (2024-03-09T13:48:20Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Targeted-BEHRT: Deep learning for observational causal inference on longitudinal electronic health records [1.3192560874022086]
We investigate causal modelling of an RCT-established null causal association: the effect of antihypertensive use on incident cancer risk.
We develop a dataset for our observational study and a Transformer-based model, Targeted BEHRT, coupled with doubly robust estimation.
We find that our model provides more accurate risk ratio (RR) estimates than benchmark methods on high-dimensional EHR data.
arXiv Detail & Related papers (2022-02-07T20:05:05Z)
- Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls [1.3870303451896246]
We implement random forest and deep convolutional neural network models using several medical imaging datasets.
We show that violation of the independence assumption, for example when data from the same patient appear in both training and test sets, can substantially affect model generalizability; a grouped cross-validation sketch after this list illustrates the standard safeguard.
Inappropriate performance indicators can likewise lead to erroneous conclusions.
arXiv Detail & Related papers (2022-02-01T05:07:27Z)
- Deep Learning in current Neuroimaging: a multivariate approach with power and type I error control but arguable generalization ability [0.158310730488265]
A non-parametric framework is proposed that estimates the statistical significance of classifications using deep learning architectures.
A label permutation test is proposed in both studies using cross-validation (CV) and resubstitution with upper bound correction (RUB) as validation methods.
In the permutation test, we find that both CV and RUB yield a false-positive rate close to the nominal significance level and acceptable statistical power; a generic CV-based label permutation test is sketched after this list.
arXiv Detail & Related papers (2021-03-30T21:15:39Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Impact of Medical Data Imprecision on Learning Results [9.379890125442333]
We study the impact of imprecision on prediction results in a healthcare application.
A pre-trained model is used to predict the future state of hyperthyroidism for patients.
arXiv Detail & Related papers (2020-07-24T06:54:57Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-service breakdowns and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exams performed on symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity, and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
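Related to the independence-assumption pitfall flagged in the generalizability entry above, the following is a generic scikit-learn sketch, not code from any listed paper: it groups cross-validation folds by patient so that no subject contributes to both training and test data.

```python
# Generic safeguard for the independence assumption: group CV folds by
# patient so the same subject never appears in both training and test sets.
# Synthetic data; purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_patients, scans_per_patient = 50, 4
n = n_patients * scans_per_patient

X = rng.normal(size=(n, 20))                                  # e.g., imaging features
y = rng.integers(0, 2, size=n)                                # binary labels
groups = np.repeat(np.arange(n_patients), scans_per_patient)  # patient IDs

# GroupKFold keeps each patient's scans within a single fold, avoiding the
# leakage that inflates naive cross-validation scores.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=GroupKFold(n_splits=5), groups=groups)
print(scores.mean())
```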
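Similarly, the neuroimaging entry's label permutation test can be approximated with standard tooling; the sketch below uses scikit-learn's permutation_test_score and is a plain CV-based variant, not the paper's RUB procedure.

```python
# CV-based label permutation test: refit on shuffled labels many times and
# report the fraction of permuted scores reaching the observed CV score.
# Synthetic data; a plain variant of the test described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                        # synthetic features
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)  # weakly separable labels

score, perm_scores, pvalue = permutation_test_score(
    LogisticRegression(max_iter=1000), X, y,
    cv=StratifiedKFold(n_splits=5), n_permutations=500, random_state=0)
print(f"CV accuracy = {score:.3f}, permutation p-value = {pvalue:.4f}")
```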
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.