Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset
- URL: http://arxiv.org/abs/2411.17645v2
- Date: Mon, 13 Jan 2025 16:01:14 GMT
- Title: Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset
- Authors: Yujie Dai, Brian Sullivan, Axel Montout, Amy Dillon, Chris Waller, Peter Acs, Rachel Denholm, Philip Williams, Alastair D Hay, Raul Santos-Rodriguez, Andrew Dowsey,
- Abstract summary: We leverage a linked EHR dataset to characterize urinary tract infections (UTIs)
We introduce a UTI risk estimation framework informed by clinical expertise to estimate UTI risk across individual patient timelines.
Our findings reveal differences in clinical and demographic predictors across risk groups.
- Score: 0.47517735516852333
- License:
- Abstract: The use of machine learning and AI on electronic health records (EHRs) holds substantial potential for clinical insight. However, this approach faces challenges due to data heterogeneity, sparsity, temporal misalignment, and limited labeled outcomes. In this context, we leverage a linked EHR dataset of approximately one million de-identified individuals from Bristol, North Somerset, and South Gloucestershire, UK, to characterize urinary tract infections (UTIs). We implemented a data pre-processing and curation pipeline that transforms the raw EHR data into a structured format suitable for developing predictive models focused on data fairness, accountability and transparency. Given the limited availability and biases of ground truth UTI outcomes, we introduce a UTI risk estimation framework informed by clinical expertise to estimate UTI risk across individual patient timelines. Pairwise XGBoost models are trained using this framework to differentiate UTI risk categories with explainable AI techniques applied to identify key predictors and support interpretability. Our findings reveal differences in clinical and demographic predictors across risk groups. While this study highlights the potential of AI-driven insights to support UTI clinical decision-making, further investigation of patient sub-strata and extensive validation are needed to ensure robustness and applicability in clinical practice.
Related papers
- Foundation Model of Electronic Medical Records for Adaptive Risk Estimation [6.248030496243407]
Enhanced Transformer for Health Simulation (ETHOS) tokenizes patient timelines (PHTs) from EHRs.
The Risk Estimation System (ARES) employs ETHOS to compute dynamic and personalized risk probabilities for clinician-defined critical events.
ARES incorporates a personalized explainability module that identifies key clinical factors influencing risk estimates for individual patients.
arXiv Detail & Related papers (2025-02-10T03:22:39Z) - TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AIready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z) - Enabling Collaborative Clinical Diagnosis of Infectious Keratitis by
Integrating Expert Knowledge and Interpretable Data-driven Intelligence [28.144658552047975]
This study investigates the performance, interpretability, and clinical utility of knowledge-guided diagnosis model (KGDM) in the diagnosis of infectious keratitis (IK)
The diagnostic odds ratios (DOR) of the interpreted AI-based biomarkers are effective, ranging from 3.011 to 35.233.
The participants with collaboration achieved a performance exceeding that of both humans and AI.
arXiv Detail & Related papers (2024-01-14T02:10:54Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic
Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z) - Informing clinical assessment by contextualizing post-hoc explanations
of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patients clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z) - Integrated Convolutional and Recurrent Neural Networks for Health Risk
Prediction using Patient Journey Data with Many Missing Values [9.418011774179794]
This paper proposes a novel end-to-end approach to modeling EHR patient journey data with Integrated Convolutional and Recurrent Neural Networks.
Our model can capture both long- and short-term temporal patterns within each patient journey and effectively handle the high degree of missingness in EHR data without any imputation data generation.
arXiv Detail & Related papers (2022-11-11T07:36:18Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised
Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z) - Deep Representation for Connected Health: Semi-supervised Learning for
Analysing the Risk of Urinary Tract Infections in People with Dementia [2.66008303934728]
This work presents a semi-supervised model that can learn from routinely collected in-home observation and measurement data.
We show how our model can process highly imbalanced and dynamic data to make robust predictions in analysing the risk of Urinary Tract Infections (UTIs) in dementia.
arXiv Detail & Related papers (2020-11-27T18:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.