Related papers: Context-Aware Asymmetric Ensembling for Interpretable Retinopathy of Prematurity Screening via Active Query and Vascular Attention

URL: http://arxiv.org/abs/2602.05208v1
Date: Thu, 05 Feb 2026 02:06:26 GMT
Title: Context-Aware Asymmetric Ensembling for Interpretable Retinopathy of Prematurity Screening via Active Query and Vascular Attention
Authors: Md. Mehedi Hassan, Taufiq Hasan
Abstract summary: Retinopathy of Prematurity (ROP) is among the major causes of preventable childhood blindness. Current deep learning models depend heavily on large private datasets and passive multimodal fusion. We propose the Context-Aware Asymmetric Ensemble Model (CAA Ensemble) that simulates clinical reasoning through two specialized streams.
Score: 1.8420107091891775
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retinopathy of Prematurity (ROP) is among the major causes of preventable childhood blindness. Automated screening remains challenging, primarily because data are limited and the condition is complex, involving both structural staging and microvascular abnormalities. Current deep learning models depend heavily on large private datasets and passive multimodal fusion, which commonly fail to generalize to small, imbalanced public cohorts. We thus propose the Context-Aware Asymmetric Ensemble Model (CAA Ensemble), which simulates clinical reasoning through two specialized streams. First, the Multi-Scale Active Query Network (MS-AQNet) serves as a structure specialist, using clinical contexts as dynamic query vectors to spatially control visual feature extraction for localization of the fibrovascular ridge. Second, VascuMIL encodes Vascular Topology Maps (VMAP) within a gated Multiple Instance Learning (MIL) network to precisely identify vascular tortuosity. A synergistic meta-learner ensembles these orthogonal signals to resolve diagnostic discordance across multiple objectives. Tested on a highly imbalanced cohort of 188 infants (6,004 images), the framework attained state-of-the-art performance on two distinct clinical tasks: a Macro F1-Score of 0.93 for Broad ROP staging and an AUC of 0.996 for Plus Disease detection. Crucially, the system features "Glass Box" transparency through counterfactual attention heatmaps and vascular threat maps, proving that clinical metadata dictates the model's visual search. Additionally, this study demonstrates that architectural inductive bias can serve as an effective bridge for the medical AI data gap.
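
The abstract mentions a gated Multiple Instance Learning network but does not give its equations. As a rough illustration only, the sketch below shows the generic gated-attention pooling commonly used in the MIL literature: each instance gets a score from a tanh branch multiplied by a sigmoid gate, the scores are softmax-normalized, and the bag embedding is the weighted sum of instances. This is not the VascuMIL implementation; the function name `gated_attention_pool`, the parameters `V`, `U`, `w`, and all dimensions are assumptions made for the example.

```python
import math
import random


def gated_attention_pool(instances, V, U, w):
    """Pool K instance vectors into one bag vector (hypothetical helper).

    instances: K lists of length D (e.g. patch features; an assumption here)
    V, U:      L lists of length D (tanh branch and sigmoid gate projections)
    w:         list of length L (maps the gated hidden vector to a scalar score)
    Returns (bag_vector, attention_weights).
    """
    scores = []
    for h in instances:
        score = 0.0
        for v_row, u_row, w_l in zip(V, U, w):
            tanh_part = math.tanh(sum(a * b for a, b in zip(v_row, h)))
            gate_part = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(u_row, h))))
            score += w_l * tanh_part * gate_part
        scores.append(score)

    # Softmax over instances, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding: attention-weighted sum of the instance vectors.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Toy usage with random values; sizes are arbitrary illustration choices.
random.seed(0)
K, D, L = 6, 8, 4
instances = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
V = [[random.gauss(0, 1) for _ in range(D)] for _ in range(L)]
U = [[random.gauss(0, 1) for _ in range(D)] for _ in range(L)]
w = [random.gauss(0, 1) for _ in range(L)]
bag, weights = gated_attention_pool(instances, V, U, w)
```

In a setup like this, the attention weights are what makes the pooling inspectable: they indicate how much each instance contributed to the bag-level prediction, which is the kind of signal a per-region heatmap could be built from.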

Related papers

Multi-View Stenosis Classification Leveraging Transformer-Based Multiple-Instance Learning Using Real-World Clinical Data [76.89269238957593]
Coronary artery stenosis is a leading cause of cardiovascular disease, diagnosed by analyzing the coronary arteries from multiple angiography views. We propose SegmentMIL, a transformer-based multi-view multiple-instance learning framework for patient-level stenosis classification.
arXiv Detail & Related papers (2026-02-02T13:07:52Z)
MedAD-R1: Eliciting Consistent Reasoning in Interpretible Medical Anomaly Detection via Consistency-Reinforced Policy Optimization [46.65200216642429]
We introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%.
arXiv Detail & Related papers (2026-02-01T07:56:10Z)
MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks but struggle with the complex clinical reasoning required in real-world scenarios. We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z)
A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
An Explainable Hybrid AI Framework for Enhanced Tuberculosis and Symptom Detection [55.35661671061754]
Tuberculosis remains a critical global health issue, particularly in resource-limited and remote areas. We propose a framework which enhances disease and symptom detection on chest X-rays by integrating two supervised heads and a self-supervised head. Our model achieves an accuracy of 98.85% for distinguishing between COVID-19, tuberculosis, and normal cases, and a macro-F1 score of 90.09% for multilabel symptom detection.
arXiv Detail & Related papers (2025-10-21T17:18:55Z)
A Novel Multi-branch ConvNeXt Architecture for Identifying Subtle Pathological Features in CT Scans [1.2461503242570642]
This paper introduces a novel multi-branch ConvNeXt architecture designed specifically for the nuanced challenges of medical image analysis. The proposed model incorporates a rigorous end-to-end pipeline, from meticulous data preprocessing and augmentation to a disciplined two-phase training strategy. Experimental results demonstrate superior performance on the validation set, achieving a final ROC-AUC of 0.9937, a validation accuracy of 0.9757, and an F1-score of 0.9825 for COVID-19 cases.
arXiv Detail & Related papers (2025-10-10T08:00:46Z)
RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks. RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z)
OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection [1.0190194769786831]
Unsupervised anomaly detection (UAD) aims to detect anomalies without labeled data. We propose a novel method that tightly couples representation learning with an analytically solvable one-class SVM. The model is evaluated on two tasks: a new benchmark based on MNIST-C, and a challenging brain MRI subtle lesion detection task.
arXiv Detail & Related papers (2025-07-25T13:00:40Z)
WMKA-Net: A Weighted Multi-Kernel Attention Network for Retinal Vessel Segmentation [0.48536814705421105]
This study proposes a dual-stage solution to address the issues of insufficient multi-scale feature fusion, disruption of contextual continuity, and noise interference. The first stage employs a Multi-Scale Fusion Module (RMS) that uses hierarchical adaptive convolution to dynamically merge cross-scale features from capillaries to main vessels. The second stage introduces a Vascular-Oriented Attention Mechanism, which models long-distance vascular continuity through an axial pathway.
arXiv Detail & Related papers (2025-04-21T06:32:25Z)
RURANET++: An Unsupervised Learning Method for Diabetic Macular Edema Based on SCSE Attention Mechanisms and Dynamic Multi-Projection Head Clustering [13.423253964156117]
RURANET++ is an unsupervised learning-based automated diagnostic system for Diabetic Macular Edema (DME). During feature processing, a pre-trained GoogLeNet model extracts deep features from retinal images, followed by PCA-based dimensionality reduction to 50 dimensions for computational efficiency. Experimental results demonstrate superior performance across multiple metrics, achieving maximum accuracy (0.8411), precision (0.8593), recall (0.8411), and F1-score, with exceptional clustering quality.
arXiv Detail & Related papers (2025-02-27T16:06:57Z)
Adversarial Vessel-Unveiling Semi-Supervised Segmentation for Retinopathy of Prematurity Diagnosis [9.683492465191241]
We propose a semi-supervised segmentation framework designed to advance ROP studies without the need for extensive manual vessel annotation. Unlike previous methods that rely solely on limited labeled data, our approach integrates an uncertainty-weighted vessel-unveiling module and domain adversarial learning. We validate our approach on public datasets and an in-house ROP dataset, demonstrating its superior performance across multiple evaluation metrics.
arXiv Detail & Related papers (2024-11-14T02:40:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.