Predicting blood pressure under circumstances of missing data: An
analysis of missing data patterns and imputation methods using NHANES
- URL: http://arxiv.org/abs/2305.01655v1
- Date: Mon, 1 May 2023 18:15:44 GMT
- Title: Predicting blood pressure under circumstances of missing data: An
analysis of missing data patterns and imputation methods using NHANES
- Authors: Harish Chauhan, Nikunj Gupta, Zoe Haskell-Craig
- Abstract summary: CVD is affected by raised blood pressure, raised blood glucose, raised blood lipids, and obesity.
Genetics and social/environmental factors such as poverty, stress, and racism also play an important role.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The World Health Organization defines cardio-vascular disease (CVD) as "a
group of disorders of the heart and blood vessels," including coronary heart
disease and stroke (WHO 21). CVD is affected by "intermediate risk factors"
such as raised blood pressure, raised blood glucose, raised blood lipids, and
obesity. These are predominantly influenced by lifestyle and behaviour,
including physical inactivity, unhealthy diets, high intake of salt, and
tobacco and alcohol use. However, genetics and social/environmental factors
such as poverty, stress, and racism also play an important role. Researchers
studying the behavioural and environmental factors associated with these
"intermediate risk factors" need access to high quality and detailed
information on diet and physical activity. However, missing data are a
pervasive problem in clinical and public health research, affecting both
randomized trials and observational studies. Reasons for missing data vary
substantially across studies and include loss to follow-up, missed study visits,
refusal to answer survey questions, or an unrecorded measurement during an
office visit. One method of handling missing values is to simply delete
observations that contain any missing value (called Complete Case Analysis).
This is rarely used because deleting every data point that contains missing
data (listwise deletion) results in a smaller number of samples and thus
affects accuracy. Additional methods of handling missing data exist, such as
summarizing each variable using its observed values (Available Case Analysis).
Motivated by the pervasiveness of missing data in the NHANES dataset, we will
conduct an analysis of imputation methods under different simulated patterns of
missing data. We will then apply these imputation methods to create a complete
dataset upon which we can use ordinary least squares to predict blood pressure
from diet and physical activity.
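The workflow described in the abstract (simulate a missingness pattern, handle the missing values, then fit ordinary least squares) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic rather than NHANES, the two predictors are hypothetical stand-ins for diet and physical activity, missingness is simulated completely at random, and mean imputation is used only as one simple example of an imputation method. All function names are illustrative.

```python
import numpy as np


def simulate_mcar(X, rate, rng):
    """Return a copy of X with entries set to NaN completely at random."""
    X_missing = X.copy()
    mask = rng.random(X.shape) < rate
    X_missing[mask] = np.nan
    return X_missing


def complete_case(X, y):
    """Listwise deletion: keep only rows with no missing predictor."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


def mean_impute(X):
    """Replace each missing entry with the observed mean of its column."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def fit_ols(X, y):
    """Ordinary least squares with an intercept: [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 500

# Synthetic predictors (hypothetical stand-ins for diet and activity scores)
# and a synthetic blood-pressure-like outcome with known coefficients.
X = rng.normal(size=(n, 2))
y = 120.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Assumed missingness pattern: 20% of predictor entries missing at random.
X_missing = simulate_mcar(X, rate=0.2, rng=rng)

# Strategy 1: complete case analysis (drops every row with a missing value).
X_cc, y_cc = complete_case(X_missing, y)
coef_cc = fit_ols(X_cc, y_cc)

# Strategy 2: mean imputation (keeps all rows, fills gaps with column means).
X_imp = mean_impute(X_missing)
coef_imp = fit_ols(X_imp, y)

print("rows kept by complete case analysis:", len(X_cc), "of", n)
print("complete-case coefficients:", coef_cc)
print("mean-imputed coefficients: ", coef_imp)
```

Comparing the two coefficient vectors against the known values used to generate `y` gives a rough sense of how each strategy trades sample size against distortion of the predictors; other missingness mechanisms or imputation methods could be swapped in by replacing `simulate_mcar` or `mean_impute`.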
Related papers
- Fuzzy Rule based Intelligent Cardiovascular Disease Prediction using Complex Event Processing [0.8668211481067458]
Cardiovascular diseases (CVDs) are a rapidly rising global concern due to unhealthy diets, lack of physical activity, and other factors.
Recent research has focused on accurate and timely disease prediction to reduce risk and fatalities.
We propose a fuzzy rule-based system for monitoring clinical data to provide real-time decision support.
arXiv Detail & Related papers (2024-09-19T16:36:24Z)
- Stressor Type Matters! -- Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection [5.304745246313982]
This study explores the generalizability of machine learning models trained on HRV features for binary stress detection.
Our findings reveal a crucial factor affecting model generalizability: stressor type.
We recommend matching the stressor type when deploying HRV-based stress models in new environments.
arXiv Detail & Related papers (2024-05-06T14:47:48Z)
- Interpretable Causal Inference for Analyzing Wearable, Sensor, and Distributional Data [62.56890808004615]
We develop an interpretable method for distributional data analysis that ensures trustworthy and robust decision-making.
We demonstrate ADD MALTS' utility by studying the effectiveness of continuous glucose monitors in mitigating diabetes risks.
arXiv Detail & Related papers (2023-12-17T00:42:42Z)
- Towards Assessing Data Bias in Clinical Trials [0.0]
Health care datasets can still be affected by data bias.
Data bias provides a distorted view of reality, leading to wrong analysis results and, consequently, decisions.
This paper proposes a method to address bias in datasets that: (i) defines the types of data bias that may be present in the dataset, (ii) characterizes and quantifies data bias with adequate metrics, and (iii) provides guidelines to identify, measure, and mitigate data bias for different data sources.
arXiv Detail & Related papers (2022-12-19T17:10:06Z)
- Comparison of Missing Data Imputation Methods using the Framingham Heart study dataset [0.0]
We test and modify state-of-the-art missing value imputation methods based on Generative Adversarial Networks (GANs) and Autoencoders.
The evaluation is accomplished for both the tasks of data imputation and post-imputation prediction.
arXiv Detail & Related papers (2022-10-06T18:35:08Z)
- Why Interpretable Causal Inference is Important for High-Stakes Decision Making for Critically Ill Patients and How To Do It [80.24494623756839]
We present a framework for interpretable estimation of causal effects for critically ill patients.
We apply this framework to the effect of seizures and other potentially harmful electrical events in the brain on outcomes.
arXiv Detail & Related papers (2022-03-09T18:03:35Z)
- To Impute or not to Impute? -- Missing Data in Treatment Effect Estimation [84.76186111434818]
We identify a new missingness mechanism, which we term mixed confounded missingness (MCM), where some missingness determines treatment selection and other missingness is determined by treatment selection.
We show that naively imputing all data leads to poorly performing treatment effect models, as the act of imputation effectively removes information necessary to provide unbiased estimates.
Our solution is selective imputation, where we use insights from MCM to inform precisely which variables should be imputed and which should not.
arXiv Detail & Related papers (2022-02-04T12:08:31Z)
- An introduction to causal reasoning in health analytics [2.199093822766999]
We will try to highlight some of the drawbacks that may arise in traditional machine learning and statistical approaches to analyze the observational data.
We will demonstrate the applications of causal inference in tackling some common machine learning issues.
arXiv Detail & Related papers (2021-05-10T20:25:56Z)
- Personalized pathology test for Cardio-vascular disease: Approximate Bayesian computation with discriminative summary statistics learning [48.7576911714538]
We propose a platelet deposition model and an inferential scheme to estimate the biologically meaningful parameters using approximate computation.
This work opens up an unprecedented opportunity for personalized pathology tests for CVD detection and medical treatment.
arXiv Detail & Related papers (2020-10-13T15:20:21Z)
- Robustness to Spurious Correlations via Human Annotations [100.63051542531171]
We present a framework for making models robust to spurious correlations by leveraging humans' common sense knowledge of causality.
Specifically, we use human annotation to augment each training example with a potential unmeasured variable.
We then introduce a new distributionally robust optimization objective over unmeasured variables (UV-DRO) to control the worst-case loss over possible test-time shifts.
arXiv Detail & Related papers (2020-07-13T20:05:19Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)