Domain constraints improve risk prediction when outcome data is missing
- URL: http://arxiv.org/abs/2312.03878v3
- Date: Fri, 19 Apr 2024 18:48:13 GMT
- Title: Domain constraints improve risk prediction when outcome data is missing
- Authors: Sidhika Balachandar, Nikhil Garg, Emma Pierson
- Abstract summary: We show that a machine learning model can accurately estimate risk for both tested and untested patients.
We apply our model to a case study of cancer risk prediction, showing that the model's inferred risk predicts cancer diagnoses.
- Score: 1.6840408099522377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models are often trained to predict the outcome resulting from a human decision. For example, if a doctor decides to test a patient for disease, will the patient test positive? A challenge is that historical decision-making determines whether the outcome is observed: we only observe test outcomes for patients doctors historically tested. Untested patients, for whom outcomes are unobserved, may differ from tested patients along observed and unobserved dimensions. We propose a Bayesian model class which captures this setting. The purpose of the model is to accurately estimate risk for both tested and untested patients. Estimating this model is challenging due to the wide range of possibilities for untested patients. To address this, we propose two domain constraints which are plausible in health settings: a prevalence constraint, where the overall disease prevalence is known, and an expertise constraint, where the human decision-maker deviates from purely risk-based decision-making only along a constrained feature set. We show theoretically and on synthetic data that domain constraints improve parameter inference. We apply our model to a case study of cancer risk prediction, showing that the model's inferred risk predicts cancer diagnoses, its inferred testing policy captures known public health policies, and it can identify suboptimalities in test allocation. Though our case study is in healthcare, our analysis reveals a general class of domain constraints which can improve model estimation in many settings.
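The following is a minimal simulation sketch of the selective-labels setting the abstract describes. It assumes a hypothetical logistic data-generating process with one observed feature `x`, one feature `u` that only the doctor sees, and purely risk-based testing (the case where the expertise constraint's deviation set is empty). It is not the authors' Bayesian model. It shows two things. First, when testing is risk-based, the positive rate among tested patients overstates overall risk. Second, a known prevalence pins down the average positive rate among untested patients through the identity P(Y=1) = P(T=1) P(Y=1|T=1) + P(T=0) P(Y=1|T=0). The names `simulate` and `rates_under_prevalence_constraint` are illustrative, as are all coefficients.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Patient:
    x: float            # observed feature (available to the model and the doctor)
    u: float            # unobserved feature (seen by the doctor only)
    tested: bool        # historical testing decision T
    y: Optional[int]    # test outcome Y; None when the patient was never tested


def simulate(n: int, seed: int = 0) -> tuple[list[Patient], list[int]]:
    """Hypothetical selective-labels process (an assumption for illustration).

    Returns the observable records plus every patient's true outcome. A real
    dataset would not contain the latter; it is kept here only to check the sketch.
    """
    rng = random.Random(seed)
    patients: list[Patient] = []
    all_y: list[int] = []
    for _ in range(n):
        x, u = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        risk_logit = -2.0 + x + u
        y = 1 if rng.random() < sigmoid(risk_logit) else 0
        # Purely risk-based testing: the test probability depends on the
        # patient only through the risk logit (no deviation features).
        tested = rng.random() < sigmoid(3.0 + 2.0 * risk_logit)
        patients.append(Patient(x, u, tested, y if tested else None))
        all_y.append(y)
    return patients, all_y


def rates_under_prevalence_constraint(
    patients: list[Patient], prevalence: float
) -> tuple[float, float]:
    """Solve prevalence = P(T=1) P(Y=1|T=1) + P(T=0) P(Y=1|T=0) for the
    untested positive rate P(Y=1|T=0), given a known overall prevalence."""
    tested = [p for p in patients if p.tested]
    if not tested or len(tested) == len(patients):
        raise ValueError("need both tested and untested patients")
    p_tested = len(tested) / len(patients)
    tested_rate = sum(p.y for p in tested) / len(tested)
    untested_rate = (prevalence - p_tested * tested_rate) / (1.0 - p_tested)
    return tested_rate, untested_rate


patients, all_y = simulate(20_000)
# Stand-in for an externally known prevalence (e.g. from a disease registry).
prevalence = sum(all_y) / len(all_y)
tested_rate, untested_rate = rates_under_prevalence_constraint(patients, prevalence)
print(f"prevalence={prevalence:.3f} tested={tested_rate:.3f} untested={untested_rate:.3f}")
```

This identity recovers only the aggregate positive rate among untested patients. According to the abstract, the paper instead imposes the prevalence constraint inside its Bayesian model, together with the expertise constraint, to infer risk for individual patients. This sketch does not attempt that step.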
Related papers
- Ethical considerations of use of hold-out sets in clinical prediction model management [0.4194295877935868]
We focus on the ethical principles of beneficence, non-maleficence, autonomy and justice.
We also discuss statistical issues arising from different hold-out set sampling methods.
Date: 2024-06-05T11:42:46Z
- Towards a Transportable Causal Network Model Based on Observational Healthcare Data [1.333879175460266]
We propose a novel approach that combines selection diagrams, missingness graphs, causal discovery and prior knowledge into a single graphical model.
We learn this model from data comprising two different cohorts of patients.
The resulting causal network model is validated by expert clinicians in terms of risk assessment, accuracy and explainability.
Date: 2023-11-13T13:23:31Z
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
Date: 2023-10-04T01:36:30Z
- Diagnosis Uncertain Models For Medical Risk Prediction [80.07192791931533]
We consider a patient risk model which has access to vital signs, lab values, and prior history but does not have access to a patient's diagnosis.
We show that such 'all-cause' risk models generalize well across diagnoses but have a predictable failure mode.
We propose a fix for this problem by explicitly modeling the uncertainty in risk prediction coming from uncertainty in patient diagnoses.
Date: 2023-06-29T23:36:04Z
- Counterfactual Prediction Under Outcome Measurement Error [29.071173441651734]
We study intersectional threats to model reliability introduced by outcome measurement error, treatment effects, and selection bias from historical decision-making policies.
We develop an unbiased risk minimization method which corrects for the combined effects of these challenges.
Date: 2023-02-22T03:34:19Z
- What Do You See in this Patient? Behavioral Testing of Clinical NLP Models [69.09570726777817]
We introduce an extendable testing framework that evaluates the behavior of clinical outcome models regarding changes of the input.
We show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns.
Date: 2021-11-30T15:52:04Z
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
Date: 2021-02-08T10:26:44Z
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on two real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves an F1 score of up to 0.841 for AD detection and a PR-AUC of up to 0.609 for NASH detection, and it outperforms various state-of-the-art baselines, improving on the best baseline by up to 19%.
Date: 2020-10-22T02:28:11Z
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of breakdowns in essential services and collapse of health care structures.
This work describes a machine learning model derived from hemogram exams performed on symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
Date: 2020-05-10T01:45:03Z
- Uncertainty estimation for classification and risk prediction on medical tabular data [0.0]
This work advances the understanding of uncertainty estimation for classification and risk prediction on medical data.
In a data-scarce field such as healthcare, the ability to measure the uncertainty of a model's prediction could potentially lead to improved effectiveness of decision support tools.
Date: 2020-04-13T08:46:41Z
This list is automatically generated from the titles and abstracts of the papers on this site.