STREAMLINE: An Automated Machine Learning Pipeline for Biomedicine
Applied to Examine the Utility of Photography-Based Phenotypes for OSA
Prediction Across International Sleep Centers
- URL: http://arxiv.org/abs/2312.05461v1
- Date: Sat, 9 Dec 2023 04:12:38 GMT
- Authors: Ryan J. Urbanowicz, Harsh Bandhey, Brendan T. Keenan, Greg Maislin, Sy
Hwang, Danielle L. Mowery, Shannon M. Lynch, Diego R. Mazzotti, Fang Han,
Qing Yun Li, Thomas Penzel, Sergio Tufik, Lia Bittencourt, Thorarinn
Gislason, Philip de Chazal, Bhajan Singh, Nigel McArdle, Ning-Hung Chen,
Allan Pack, Richard J. Schwab, Peter A. Cistulli, Ulysses J. Magalang
- Abstract summary: We develop and validate a Simple, Transparent, End-to-end Automated Machine Learning Pipeline (STREAMLINE).
We apply STREAMLINE to investigate the added utility of photography-based phenotypes for predicting obstructive sleep apnea (OSA).
Benchmarking analyses validated the efficacy of STREAMLINE across data simulations with increasingly complex patterns of association.
- Score: 2.872498492478085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While machine learning (ML) includes a valuable array of tools for analyzing
biomedical data, significant time and expertise are required to assemble
effective, rigorous, and unbiased pipelines. Automated ML (AutoML) tools seek
to facilitate ML application by automating a subset of analysis pipeline
elements. In this study we develop and validate a Simple, Transparent,
End-to-end Automated Machine Learning Pipeline (STREAMLINE) and apply it to
investigate the added utility of photography-based phenotypes for predicting
obstructive sleep apnea (OSA), a common and underdiagnosed condition associated
with a variety of health, economic, and safety consequences. STREAMLINE is
designed to tackle biomedical binary classification tasks while adhering to
best practices and accommodating complexity, scalability, reproducibility,
customization, and model interpretation. Benchmarking analyses validated the
efficacy of STREAMLINE across data simulations with increasingly complex
patterns of association. Then we applied STREAMLINE to evaluate the utility of
demographics (DEM), self-reported comorbidities (DX), symptoms (SYM), and
photography-based craniofacial (CF) and intraoral (IO) anatomy measures in
predicting any OSA or moderate/severe OSA using 3,111 participants from the Sleep
Apnea Global Interdisciplinary Consortium (SAGIC). OSA analyses identified a
significant increase in ROC-AUC when adding CF to DEM+DX+SYM to predict
moderate/severe OSA. A consistent but non-significant increase in PRC-AUC was
observed with the addition of each subsequent feature set to predict any OSA,
with CF and IO yielding minimal improvements. Application of STREAMLINE to OSA
data suggests that CF features provide additional value in predicting
moderate/severe OSA, but neither CF nor IO features meaningfully improved the
prediction of any OSA beyond established demographic, comorbidity, and symptom
characteristics.
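
The abstract compares feature sets by ROC-AUC and PRC-AUC. As a rough illustration of what those two metrics measure, the sketch below computes both from labels and predicted scores in plain Python. It is not STREAMLINE's code: the function names, the toy labels, and the two score lists (standing in for a DEM+DX+SYM model and a DEM+DX+SYM+CF model) are invented for this example, and average precision is used here as one common estimate of PRC-AUC.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the chance a random positive outranks a random negative.

    Tied scores count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision at each true positive's rank.

    Simplification: assumes no tied scores among the predictions.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 1 = OSA, 0 = no OSA (not from SAGIC).
labels = [1, 0, 1, 1, 0, 0]
baseline_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]  # stand-in for DEM+DX+SYM
extended_scores = [0.9, 0.6, 0.8, 0.7, 0.3, 0.2]  # stand-in for adding CF

for name, scores in [("baseline", baseline_scores), ("extended", extended_scores)]:
    print(name, round(roc_auc(labels, scores), 3), round(average_precision(labels, scores), 3))
```

In a real analysis each metric would be computed on held-out data across cross-validation partitions, and the difference between feature sets would be tested for significance rather than read off a single pair of numbers.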
Related papers
- Machine Learning Applications in Medical Prognostics: A Comprehensive Review [0.0]
Machine learning (ML) has revolutionized medical prognostics by integrating advanced algorithms with clinical data.
RF models demonstrate robust performance in handling high-dimensional data.
CNNs have shown exceptional accuracy in cancer detection.
LSTM networks excel in analyzing temporal data, providing accurate predictions of clinical deterioration.
arXiv Detail & Related papers (2024-08-05T09:41:34Z)
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
- Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction [65.268245109828]
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the ICU length of stay.
The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction.
The paper concludes with recommendations for fairness-aware machine learning techniques for mitigating biases and the need for collaborative efforts among healthcare professionals and data scientists.
arXiv Detail & Related papers (2023-12-31T16:01:48Z)
- VAE-IF: Deep feature extraction with averaging for fully unsupervised artifact detection in routinely acquired ICU time-series [1.9665926763554147]
We propose a novel fully unsupervised approach to detect artifacts in minute-by-minute resolution ICU data without prior labeling or signal-specific knowledge.
Our approach combines a variational autoencoder (VAE) and an isolation forest (IF) into a hybrid model to learn features and identify anomalies.
We show that our unsupervised approach achieves comparable sensitivity to fully supervised methods and generalizes well to an external dataset.
arXiv Detail & Related papers (2023-12-10T18:03:40Z)
- Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven *clinical decision support*.
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
arXiv Detail & Related papers (2023-10-28T12:08:03Z)
- Physics Inspired Hybrid Attention for SAR Target Recognition [61.01086031364307]
We propose a physics inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol to address the issues.
PIHA leverages the high-level semantics of physical information to activate and guide the feature group aware of local semantics of target.
Our method outperforms other state-of-the-art approaches in 12 test scenarios with same ASC parameters.
arXiv Detail & Related papers (2023-09-27T14:39:41Z)
- QXAI: Explainable AI Framework for Quantitative Analysis in Patient Monitoring Systems [9.29069202652354]
An Explainable AI for Quantitative analysis (QXAI) framework is proposed with post-hoc model explainability and intrinsic explainability for regression and classification tasks.
We adopted the artificial neural networks (ANN) and attention-based Bidirectional LSTM (BiLSTM) models for the prediction of heart rate and classification of physical activities based on sensor data.
arXiv Detail & Related papers (2023-09-19T03:50:30Z)
- Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation [67.19443246236048]
Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases.
Some small-sized airway branches (e.g., bronchus and terminal bronchioles) significantly aggravate the difficulty of automatic segmentation.
This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function.
arXiv Detail & Related papers (2022-09-05T16:38:13Z)
- DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations [90.27736364704108]
We present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery.
DrugOOD comes with an open-source Python package that fully automates benchmarking processes.
We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction.
arXiv Detail & Related papers (2022-01-24T12:32:48Z)
- Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network, for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E).
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operator characteristic (ROC).
arXiv Detail & Related papers (2021-10-27T19:28:36Z)
- A Rigorous Machine Learning Analysis Pipeline for Biomedical Binary Classification: Application in Pancreatic Cancer Nested Case-control Studies with Implications for Bias Assessments [2.9726886415710276]
We have laid out and assembled a complete, rigorous ML analysis pipeline focused on binary classification.
This 'automated' but customizable pipeline includes a) exploratory analysis, b) data cleaning and transformation, c) feature selection, d) model training with 9 established ML algorithms.
We apply this pipeline to an epidemiological investigation of established and newly identified risk factors for cancer to evaluate how different sources of bias might be handled by ML algorithms.
arXiv Detail & Related papers (2020-08-28T19:58:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.