What is Hiding in Medicine's Dark Matter? Learning with Missing Data in
Medical Practices
- URL: http://arxiv.org/abs/2402.06563v1
- Date: Fri, 9 Feb 2024 17:27:35 GMT
- Title: What is Hiding in Medicine's Dark Matter? Learning with Missing Data in
Medical Practices
- Authors: Neslihan Suzen, Evgeny M. Mirkes, Damian Roland, Jeremy Levesley,
Alexander N. Gorban, Tim J. Coats
- Abstract summary: Missing data may be linked to health care professional practice patterns.
We have examined 79 TARN fields with missing values for 5,791 trauma cases.
We have concluded that the 1NN imputer is the best imputation method, which indicates a usual pattern of clinical decision making.
- Score: 38.64139739520114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Electronic patient records (EPRs) produce a wealth of data but contain
significant missing information. Understanding and handling this missing data
is an important part of clinical data analysis and if left unaddressed could
result in bias in analysis and distortion in critical conclusions. Missing data
may be linked to health care professional practice patterns and imputation of
missing data can increase the validity of clinical decisions. This study
focuses on statistical approaches for understanding and interpreting the
missing data and machine learning based clinical data imputation using a single
centre's paediatric emergency data and the data from UK's largest clinical
audit for traumatic injury database (TARN). In the study of 56,961 data points
related to initial vital signs and observations taken on children presenting to
an Emergency Department, we have shown that missing data are likely to be
non-random and how these are linked to health care professional practice
patterns. We have then examined 79 TARN fields with missing values for 5,791
trauma cases. Singular Value Decomposition (SVD) and k-Nearest Neighbour (kNN)
based missing data imputation methods are used, and the imputation results are
compared against the original dataset and statistically tested. We have concluded
that the 1NN imputer is the best imputation method, which indicates a usual pattern of
clinical decision making: find the most similar patients and take their
attributes as imputation.
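The 1NN idea in the abstract (find the most similar patient and copy their value) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `distance` and `one_nn_impute`, the toy vital-sign rows, and the choice of a root-mean-square distance over jointly observed fields are all assumptions made for the example. The fields are also left unscaled here, which a real analysis would probably not do.

```python
import math


def distance(a, b):
    """Root-mean-square difference over fields observed in both rows.

    Assumption for this sketch: rows that share no observed field are
    treated as infinitely far apart.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def one_nn_impute(rows):
    """Return a copy of rows with each None replaced by the value from the
    nearest row that has that field observed. A value stays None when no
    comparable donor row exists."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            best_value, best_dist = None, math.inf
            for k, other in enumerate(rows):
                # A donor must be a different row with field j observed.
                if k == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d < best_dist:
                    best_value, best_dist = other[j], d
            filled[i][j] = best_value
    return filled


# Hypothetical records: [heart rate, respiratory rate, temperature].
rows = [
    [120.0, 30.0, 37.5],
    [118.0, None, 37.4],
    [80.0, 18.0, 36.8],
    [82.0, 19.0, None],
]
filled = one_nn_impute(rows)
for record in filled:
    print(record)
```

In this toy data the second record borrows its respiratory rate from the first record, and the fourth borrows its temperature from the third, because those are the closest records on the fields they share. Distances are always computed on the original rows, so imputed values never influence later neighbour searches.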
Related papers
- SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing [67.8991481023825]
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with little missing information.
For the potential high-risk patients with low confidence due to limited observations, we propose a robust active sensing algorithm.
arXiv Detail & Related papers (2024-07-24T04:47:36Z)
- Causal thinking for decision making on Electronic Health Records: why
and how [0.0]
Causal thinking is needed for data-driven decisions.
We present a step-by-step framework to help build valid decision making from real-life patient records.
arXiv Detail & Related papers (2023-08-03T08:17:00Z)
- Temporal-spatial Correlation Attention Network for Clinical Data
Analysis in Intensive Care Unit [27.885961694582896]
We propose a temporal-spatial correlation attention network (TSCAN) to handle some clinical characteristic prediction problems.
Based on the design of the attention mechanism model, our approach can effectively remove irrelevant items in clinical data and irrelevant nodes in time.
Our method can also find key clinical indicators of important outcomes that can be used to improve treatment options.
arXiv Detail & Related papers (2023-06-03T00:38:40Z)
- Time-dependent Iterative Imputation for Multivariate Longitudinal
Clinical Data [0.0]
Time-Dependent Iterative imputation offers a practical solution for imputing time-series data.
When applied to a cohort consisting of more than 500,000 patient observations, our approach outperformed state-of-the-art imputation methods.
arXiv Detail & Related papers (2023-04-16T16:10:49Z)
- Towards Assessing Data Bias in Clinical Trials [0.0]
Health care datasets can still be affected by data bias.
Data bias provides a distorted view of reality, leading to wrong analysis results and, consequently, decisions.
This paper proposes a method to address bias in datasets that: (i) defines the types of data bias that may be present in the dataset, (ii) characterizes and quantifies data bias with adequate metrics, and (iii) provides guidelines to identify, measure, and mitigate data bias for different data sources.
arXiv Detail & Related papers (2022-12-19T17:10:06Z)
- Comparison of Missing Data Imputation Methods using the Framingham Heart
study dataset [0.0]
We test and modify state-of-the-art missing value imputation methods based on Generative Adversarial Networks (GANs) and Autoencoders.
The evaluation is accomplished for both the tasks of data imputation and post-imputation prediction.
arXiv Detail & Related papers (2022-10-06T18:35:08Z)
- The pitfalls of using open data to develop deep learning solutions for
COVID-19 detection in chest X-rays [64.02097860085202]
Deep learning models have been developed to identify COVID-19 from chest X-rays.
Results have been exceptional when training and testing on open-source data.
Data analysis and model evaluations show that the popular open-source dataset COVIDx is not representative of the real clinical problem.
arXiv Detail & Related papers (2021-09-14T10:59:11Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best of various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets:
applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential service breakdowns and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exam data collected from symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.