Enhancing Multi-Label Thoracic Disease Diagnosis with Deep Ensemble-Based Uncertainty Quantification
- URL: http://arxiv.org/abs/2511.18839v1
- Date: Mon, 24 Nov 2025 07:20:40 GMT
- Title: Enhancing Multi-Label Thoracic Disease Diagnosis with Deep Ensemble-Based Uncertainty Quantification
- Authors: Yasiru Laksara, Uthayasanker Thayasivam
- Abstract summary: This project integrates robust Uncertainty Quantification (UQ) into a high-performance diagnostic platform for 14 common thoracic diseases on the NIH ChestX-ray14 dataset. Initial architectural development failed to stabilize performance and calibration using Monte Carlo Dropout (MCD), yielding an unacceptable Expected Calibration Error (ECE) of 0.7588. The resulting Deep Ensemble (DE) successfully stabilized performance and delivered superior reliability, achieving a State-of-the-Art (SOTA) average Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.8559 and an average F1 Score of 0.3857.
- Score: 1.2461503242570642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The utility of deep learning models, such as CheXNet, in high-stakes clinical settings is fundamentally constrained by their purely deterministic nature, which fails to provide reliable measures of predictive confidence. This project addresses this critical gap by integrating robust Uncertainty Quantification (UQ) into a high-performance diagnostic platform for 14 common thoracic diseases on the NIH ChestX-ray14 dataset. Initial architectural development failed to stabilize performance and calibration using Monte Carlo Dropout (MCD), yielding an unacceptable Expected Calibration Error (ECE) of 0.7588. This technical failure necessitated a rigorous architectural pivot to a high-diversity, 9-member Deep Ensemble (DE). The resulting DE successfully stabilized performance and delivered superior reliability, achieving a State-of-the-Art (SOTA) average Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.8559 and an average F1 Score of 0.3857. Crucially, the DE demonstrated superior calibration (Mean ECE of 0.0728 and Negative Log-Likelihood (NLL) of 0.1916) and enabled the reliable decomposition of total uncertainty into its Aleatoric (irreducible data noise) and Epistemic (reducible model knowledge) components, with a mean Epistemic Uncertainty (EU) of 0.0240. These results establish the Deep Ensemble as a trustworthy and explainable platform, transforming the model from a probabilistic tool into a reliable clinical decision support system.
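The ensemble-based uncertainty decomposition and calibration metrics described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, array shapes, the entropy-based decomposition (epistemic uncertainty as mutual information between prediction and ensemble member), and the equal-width-binning ECE are all assumptions about one common way such quantities are computed for multi-label sigmoid outputs.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    """Entropy (nats) of independent Bernoulli probabilities, elementwise."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def decompose_uncertainty(member_probs):
    """Decompose total predictive uncertainty of a deep ensemble.

    member_probs: array of shape (M, N, L) -- per-label sigmoid outputs
    of M ensemble members for N samples and L labels.
    Returns (total, aleatoric, epistemic), each of shape (N, L).
    """
    mean_p = member_probs.mean(axis=0)                      # ensemble mean prediction
    total = binary_entropy(mean_p)                          # entropy of the mean
    aleatoric = binary_entropy(member_probs).mean(axis=0)   # mean of member entropies
    epistemic = total - aleatoric                           # >= 0 by concavity of entropy
    return total, aleatoric, epistemic

def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-binning ECE over flattened per-label binary predictions."""
    probs, labels = probs.ravel(), labels.ravel()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            conf = probs[mask].mean()     # average confidence in the bin
            acc = labels[mask].mean()     # empirical positive rate in the bin
            ece += mask.mean() * abs(conf - acc)
    return ece
```

Under this decomposition, the aleatoric term is what remains even if every ensemble member agrees (data noise), while the epistemic term grows with disagreement between members and shrinks as more training data reduces model uncertainty.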
Related papers
- Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering [94.37535002230504]
We develop a training-free, inference-time control framework termed Semantically Decoupled Latent Steering. Our approach constructs a semantic-free intervention vector via large language model (LLM)-driven semantic decomposition. We show that our approach significantly reduces the probability of historical hallucinations.
arXiv Detail & Related papers (2026-02-27T04:49:01Z)
- Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare [0.0]
We propose an evaluation framework that quantifies individual-level prediction instability using two complementary diagnostics. We apply these diagnostics to simulated data and the GUSTO-I clinical dataset.
arXiv Detail & Related papers (2026-02-27T03:42:28Z)
- QuanvNeXt: An end-to-end quanvolutional neural network for EEG-based detection of major depressive disorder [2.819628349300034]
QuanvNeXt is an end-to-end fully quanvolutional model for EEG-based depression diagnosis. It achieves an average accuracy of 93.1% and an average AUC-ROC of 97.2%, outperforming state-of-the-art baselines.
arXiv Detail & Related papers (2025-12-10T10:44:47Z)
- CONFIDE: Hallucination Assessment for Reliable Biomolecular Structure Prediction and Design [46.12506067241116]
We present CODE (Chain of Diffusion Embeddings), a self-evaluating metric to quantify topological frustration. We propose CONFIDE, a unified evaluation framework that combines energetic and topological perspectives. By combining data-driven embeddings with theoretical insight, CODE and CONFIDE outperform existing metrics across a wide range of biomolecular systems.
arXiv Detail & Related papers (2025-11-20T03:38:46Z)
- Diagnosing Hallucination Risk in AI Surgical Decision-Support: A Sequential Framework for Sequential Validation [5.469454486414467]
Large language models (LLMs) offer transformative potential for clinical decision support in spine surgery. LLMs pose significant risks through hallucinations, which are factually inconsistent or contextually misaligned outputs. This study introduces a clinician-centered framework to quantify hallucination risks by evaluating diagnostic precision, recommendation quality, reasoning robustness, output coherence, and knowledge alignment.
arXiv Detail & Related papers (2025-11-01T15:25:55Z)
- Efficient Epistemic Uncertainty Estimation in Cerebrovascular Segmentation [1.3980986259786223]
We introduce an efficient ensemble model combining the advantages of Bayesian approximation and Deep Ensembles. Areas of high model uncertainty and erroneous predictions are aligned, which demonstrates the effectiveness and reliability of the approach.
arXiv Detail & Related papers (2025-03-28T09:39:37Z)
- CRTRE: Causal Rule Generation with Target Trial Emulation Framework [47.2836994469923]
We introduce a novel method called causal rule generation with target trial emulation framework (CRTRE).
CRTRE applies randomized trial design principles to estimate the causal effect of association rules.
We then incorporate such association rules into downstream applications such as prediction of disease onsets.
arXiv Detail & Related papers (2024-11-10T02:40:06Z)
- Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z)
- Towards Reliable Medical Image Segmentation by Modeling Evidential Calibrated Uncertainty [57.023423137202485]
Concerns regarding the reliability of medical image segmentation persist among clinicians. We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks. By leveraging subjective logic theory, we explicitly model probability and uncertainty for medical image segmentation.
arXiv Detail & Related papers (2023-01-01T05:02:46Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.