Enabling scalable clinical interpretation of ML-based phenotypes using
real world data
- URL: http://arxiv.org/abs/2208.01607v1
- Date: Tue, 2 Aug 2022 17:31:03 GMT
- Title: Enabling scalable clinical interpretation of ML-based phenotypes using
real world data
- Authors: Owen Parsons (1), Nathan E Barlow (1), Janie Baxter (1), Karen
Paraschin (2), Andrea Derix (2), Peter Hein (2), Robert Dürichen (1) ((1)
Sensyne Health, Oxford, UK, (2) Research and Development, Pharmaceuticals,
Bayer AG, Wuppertal, Germany)
- Abstract summary: This study investigates approaches to perform patient stratification analysis at scale using large EHR datasets.
We have developed several tools to facilitate the clinical evaluation and interpretation of unsupervised patient stratification results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The availability of large and deep electronic healthcare records (EHR)
datasets has the potential to enable a better understanding of real-world
patient journeys, and to identify novel subgroups of patients. ML-based
aggregation of EHR data is mostly tool-driven, i.e., building on available or
newly developed methods. However, these methods, their input requirements, and,
importantly, resulting output are frequently difficult to interpret, especially
without in-depth data science or statistical training. This endangers the final
step of analysis where an actionable and clinically meaningful interpretation
is needed. This study investigates approaches to perform patient stratification
analysis at scale using large EHR datasets and multiple clustering methods for
clinical research. We have developed several tools to facilitate the clinical
evaluation and interpretation of unsupervised patient stratification results,
namely pattern screening, meta clustering, surrogate modeling, and curation.
These tools can be used at different stages within the analysis. As compared to
a standard analysis approach, we demonstrate the ability to condense results
and optimize analysis time. In the case of meta clustering, we demonstrate that
the number of patient clusters can be reduced from 72 to 3 in one example. In
another stratification result, by using surrogate models, we could quickly
identify that heart failure patients were stratified if blood sodium
measurements were available. As this is a routine measurement performed for all
patients with heart failure, this indicated a data bias. By using further
cohort and feature curation, these patients and other irrelevant features could
be removed to increase the clinical meaningfulness. These examples show the
effectiveness of the proposed methods and we hope to encourage further research
in this field.
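The abstract describes meta clustering (collapsing many redundant clusters into a few) and surrogate modeling (explaining cluster assignments with an interpretable model) only at a high level. A minimal sketch of how such steps might look, assuming scikit-learn-style APIs and synthetic stand-in data (all variable names and parameter choices are illustrative, not taken from the paper):

```python
# Sketch only: this is NOT the paper's implementation, just an illustration
# of meta clustering and surrogate modeling on synthetic data.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))  # stand-in for an EHR feature matrix

# Step 1: run several clustering configurations (here KMeans with varying k),
# collecting every resulting cluster centroid and label assignment.
centroids, labels_per_run = [], []
for k in (4, 6, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    centroids.append(km.cluster_centers_)
    labels_per_run.append(km.labels_)
all_centroids = np.vstack(centroids)  # 4 + 6 + 8 = 18 clusters in total

# Step 2: meta clustering - cluster the centroids themselves, so similar
# clusters from different runs collapse into a few meta clusters.
meta = AgglomerativeClustering(n_clusters=3).fit(all_centroids)
print("clusters condensed:", all_centroids.shape[0], "->", len(set(meta.labels_)))

# Step 3: surrogate modeling - fit an interpretable classifier that predicts
# one run's cluster labels from the inputs; its most important features hint
# at what drives the stratification (e.g. availability of a lab measurement).
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels_per_run[0])
top = np.argsort(tree.feature_importances_)[::-1][:3]
print("most influential features:", top)
```

In this sketch the 18 raw clusters collapse to 3 meta clusters, mirroring the paper's reported 72-to-3 reduction in spirit; the surrogate tree plays the role the paper describes for spotting data biases such as the blood-sodium availability effect.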
Related papers
- AI Framework for Early Diagnosis of Coronary Artery Disease: An
Integration of Borderline SMOTE, Autoencoders and Convolutional Neural
Networks Approach [0.44998333629984877]
We develop a methodology for balancing and augmenting data for more accurate prediction when the data is imbalanced and the sample size is small.
The experimental results revealed that the average accuracy of our proposed method for CAD prediction was 95.36, higher than random forest (RF), decision tree (DT), support vector machine (SVM), logistic regression (LR), and artificial neural network (ANN) classifiers.
arXiv Detail & Related papers (2023-08-29T14:33:38Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic
Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed to speed up patient recruitment by automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
- Enhancing Causal Estimation through Unlabeled Offline Data [7.305019142196583]
We wish to assess relevant unmeasured physiological variables that have a strong effect on the patient's diagnosis and treatment.
Extensive offline information is available about previous patients, which may be only partially related to the present patient.
Our proposed approach consists of three stages: (i) use the abundant offline data to create both a non-causal and a causal estimator.
We demonstrate the effectiveness of this methodology on a (non-medical) real-world task, in situations where the offline data is only partially related to the new observations.
arXiv Detail & Related papers (2022-02-16T07:02:42Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on
Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and its generality across different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
- Mixture Model Framework for Traumatic Brain Injury Prognosis Using
Heterogeneous Clinical and Outcome Data [3.7363119896212478]
We develop a method for modeling large heterogeneous data types relevant to TBI.
The model is trained on a dataset encompassing a variety of data types, including demographics, blood-based biomarkers, and imaging findings.
It is used to stratify patients into distinct groups in an unsupervised learning setting.
arXiv Detail & Related papers (2020-12-22T19:31:03Z)
- Longitudinal modeling of MS patient trajectories improves predictions of
disability progression [2.117653457384462]
This work addresses the task of optimally extracting information from longitudinal patient data in the real-world setting.
We show that with machine learning methods suited to modeling patient trajectories, we can predict the disability progression of patients over a two-year horizon.
Compared to the models available in the literature, this work uses the most complete patient history for MS disease progression prediction.
arXiv Detail & Related papers (2020-11-09T20:48:00Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, which is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Segmentation analysis and the recovery of queuing parameters via the
Wasserstein distance: a study of administrative data for patients with
chronic obstructive pulmonary disease [0.0]
This work uses a data-driven approach to analyse how the resource requirements of patients with chronic obstructive pulmonary disease (COPD) may change.
It is composed of a novel combination of often distinct modes of analysis: segmentation, operational queuing theory, and the recovery of parameters from incomplete data.
arXiv Detail & Related papers (2020-08-10T17:47:34Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets:
applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-services breakdown and collapse of health care structures.
This work describes a machine learning model derived from hemogram exams performed on symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.