Related papers: Beyond Conventional Transformers: The Medical X-ray Attention (MXA) Block for Improved Multi-Label Diagnosis Using Knowledge Distillation

URL: http://arxiv.org/abs/2504.02277v2
Date: Sun, 18 May 2025 05:07:13 GMT
Title: Beyond Conventional Transformers: The Medical X-ray Attention (MXA) Block for Improved Multi-Label Diagnosis Using Knowledge Distillation
Authors: Amit Rand, Hadi Ibrahim
Abstract summary: We present the Medical X-ray Attention (MXA) block, a novel attention mechanism tailored specifically to address the challenges of X-ray abnormality detection. Our approach achieves an area under the curve (AUC) of 0.85, an absolute improvement of 0.19 compared to our baseline model's AUC of 0.66.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Medical imaging, particularly X-ray analysis, often involves detecting multiple conditions simultaneously within a single scan, making multi-label classification crucial for real-world clinical applications. We present the Medical X-ray Attention (MXA) block, a novel attention mechanism tailored specifically to address the unique challenges of X-ray abnormality detection. The MXA block enhances traditional Multi-Head Self-Attention (MHSA) by integrating a specialized module that efficiently captures both detailed local information and broader global context. To the best of our knowledge, this is the first work to propose a task-specific attention mechanism for diagnosing chest X-rays, as well as to attempt multi-label classification using an Efficient Vision Transformer (EfficientViT). By embedding the MXA block within the EfficientViT architecture and employing knowledge distillation, our proposed model significantly improves performance on the CheXpert dataset, a widely used benchmark for multi-label chest X-ray abnormality detection. Our approach achieves an area under the curve (AUC) of 0.85, an absolute improvement of 0.19 compared to our baseline model's AUC of 0.66, corresponding to a substantial, approximately 233% relative improvement over random guessing (AUC = 0.5).
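
The abstract reports a single AUC for a multi-label task but does not say how that number is aggregated across findings. A common convention is to compute the ROC AUC separately for each finding and then average the per-finding values (macro-averaging). The pure-Python sketch below assumes that convention. The function names `binary_auc` and `macro_auc` are hypothetical. In practice, scikit-learn's `roc_auc_score` with `average="macro"` computes the same quantity for multi-label indicator targets.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for one finding via the Mann-Whitney U statistic.

    The result is the probability that a randomly chosen positive image
    scores higher than a randomly chosen negative one, with ties counted
    as 1/2. The pairwise loop is O(P * N), which is fine for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative cases")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(label_matrix: List[List[int]], score_matrix: List[List[float]]) -> float:
    """Average the per-finding AUCs (rows are images, columns are findings).

    Assumption: macro-averaging; the source does not state its aggregation.
    """
    n_findings = len(label_matrix[0])
    per_finding = []
    for j in range(n_findings):
        y = [row[j] for row in label_matrix]
        s = [row[j] for row in score_matrix]
        per_finding.append(binary_auc(y, s))
    return sum(per_finding) / n_findings


# Toy example: 4 images, 2 findings (values are made up for illustration).
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
print(macro_auc(labels, scores))  # finding 0 -> 1.0, finding 1 -> 0.75, mean 0.875
```

A predictor that outputs the same score for every image produces only ties, so it scores exactly 0.5 on every finding. That is the chance level the abstract uses as its reference point.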

Related papers

A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice
Janus-Pro-CXR (1B) is a chest X-ray interpretation system based on the DeepSeek Janus-Pro model. Our system outperforms state-of-the-art X-ray report generation models in automated report generation.
arXiv, 2025-12-23
Radiology Report Generation with Layer-Wise Anatomical Attention
We introduce a compact image-to-text architecture that generates the Findings section of chest X-ray reports. The model combines a frozen Self-Distillation with No Labels v3 (DINOv3) Vision Transformer (ViT) encoder with a Generative Pre-trained Transformer 2 (GPT-2) decoder.
arXiv, 2025-12-18
Multi-pathology Chest X-ray Classification with Rejection Mechanisms
Overconfidence in deep learning models poses a significant risk in high-stakes medical imaging tasks. This study introduces an uncertainty-aware framework for chest X-ray diagnosis based on a DenseNet-121 backbone.
arXiv, 2025-09-12
A Deep Learning-Based Ensemble System for Automated Shoulder Fracture Detection in Clinical Radiographs
Shoulder fractures are often underdiagnosed, especially in emergency and high-volume clinical settings. We developed a multi-model deep learning system using 10,000 annotated shoulder X-rays. The ensemble-based AI can reliably detect shoulder fractures in radiographs with high clinical relevance.
arXiv, 2025-07-17
RadFabric: Agentic AI System with Reasoning Capability for Radiology
RadFabric is a multi-agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation. The system employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent that maps visual findings to precise anatomical structures, and a Reasoning Agent, powered by large multimodal reasoning models, that synthesizes visual, anatomical, and clinical data into transparent, evidence-based diagnoses.
arXiv, 2025-06-17
Harnessing EHRs for Diffusion-based Anomaly Detection on Chest X-rays
Unsupervised anomaly detection (UAD) in medical imaging is crucial for identifying pathological abnormalities without requiring extensive labeled data. We propose Diff3M, a multimodal diffusion-based framework that integrates chest X-rays and structured Electronic Health Records (EHRs) for enhanced anomaly detection.
arXiv, 2025-05-22
Advancing Chronic Tuberculosis Diagnostics Using Vision-Language Models: A Multi modal Framework for Precision Analysis
This study proposes a Vision-Language Model (VLM) to enhance automated chronic tuberculosis (TB) screening. By integrating chest X-ray images with clinical data, the model addresses the challenges of manual interpretation. The model demonstrated high precision (94%) and recall (94%) for detecting key chronic TB pathologies.
arXiv, 2025-03-17
VerteNet -- A Multi-Context Hybrid CNN Transformer for Accurate Vertebral Landmark Localization in Lateral Spine DXA Images
VerteNet is a hybrid CNN-Transformer model featuring a novel dual-resolution attention mechanism in both the self-attention and cross-attention domains. We train VerteNet on 620 DXA lateral spine images (LSIs) from various machines and achieve superior results compared to existing methods.
arXiv, 2025-02-04
MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement
We propose the Multi-view perception Knowledge-enhanced TransfoRmer (MvKeTR). An MVPA with view-aware attention is proposed to effectively synthesize diagnostic information from multiple anatomical views. A Cross-Modal Knowledge Enhancer (CMKE) is devised to retrieve the most similar reports based on the query volume.
arXiv, 2024-11-27
A foundation model for generalizable disease diagnosis in chest X-ray images
We introduce CXRBase, a foundation model designed to learn versatile representations from unlabeled CXR images. CXRBase is trained on a substantial dataset of 1.04 million unlabeled CXR images and is then fine-tuned with labeled data to enhance its performance in disease detection.
arXiv, 2024-10-11
Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning
This research addresses the challenges of diagnosing chest X-rays (CXRs) at low resolutions. High-resolution CXR imaging is crucial for identifying small but critical anomalies, such as nodules or opacities. This paper presents the Multilevel Collaborative Attention Knowledge (MLCAK) method.
arXiv, 2024-05-22
Domain Transfer Through Image-to-Image Translation for Uncertainty-Aware Prostate Cancer Classification
We present a novel approach for unpaired image-to-image translation of prostate MRIs and an uncertainty-aware training approach for classifying clinically significant prostate cancer (PCa). Our approach involves a novel pipeline for translating unpaired 3.0T multi-parametric prostate MRIs to 1.5T, thereby augmenting the available training data. Our experiments demonstrate that the proposed method significantly improves the Area Under the ROC Curve (AUC), by over 20% compared to previous work.
arXiv, 2023-07-02
Multi-Scale Feature Fusion using Parallel-Attention Block for COVID-19 Chest X-ray Diagnosis
During the global COVID-19 crisis, accurate diagnosis of COVID-19 from chest X-ray (CXR) images is critical. We propose a novel multi-feature fusion network that uses parallel attention blocks to fuse the original CXR images and local-phase feature-enhanced CXR images at multiple scales.
arXiv, 2023-04-25
Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification
Efficient analysis of large numbers of chest radiographs can aid physicians and radiologists. We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
arXiv, 2022-05-08
Generative Residual Attention Network for Disease Detection
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning. We generate a corresponding radiology image in a target domain while preserving the identity of the patient. We then use the generated X-ray image in the target domain to augment our training data and improve detection performance.
arXiv, 2021-10-25
Contrastive Attention for Automatic Chest X-ray Report Generation
In most cases, normal regions dominate the entire chest X-ray image, and the corresponding descriptions of these normal regions dominate the final report. We propose the Contrastive Attention (CA) model, which compares the current input image with normal images to distill the contrastive information. We achieve state-of-the-art results on two public datasets.
arXiv, 2021-06-13
Cross-Modal Contrastive Learning for Abnormality Classification and Localization in Chest X-rays with Radiomics using a Feedback Loop
We propose an end-to-end semi-supervised cross-modal contrastive learning framework for medical images. We first apply an image encoder to classify the chest X-rays and generate the image features. The radiomic features are then passed through another dedicated encoder to act as the positive sample for the image features generated from the same chest X-ray.
arXiv, 2021-04-11
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv, 2021-02-26
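
Knowledge distillation appears twice on this page. The MXA abstract combines it with EfficientViT, and the MLCAK entry applies it to low-resolution CXRs. Neither summary specifies the loss, so the sketch below is an assumption rather than either paper's method. It is a generic, per-finding sigmoid variant of temperature-softened distillation, which suits multi-label targets because each finding is an independent yes/no decision. The names `multilabel_kd_loss`, `temperature`, and `alpha`, as well as their default values, are hypothetical and illustrative only.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted probability and a (possibly soft) target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def multilabel_kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    labels: Sequence[int],
    temperature: float = 2.0,  # hypothetical default
    alpha: float = 0.5,        # hypothetical weight on the hard-label term
) -> float:
    """Hypothetical multi-label distillation loss for a single image.

    hard term: student probabilities vs. ground-truth 0/1 findings
    soft term: temperature-softened student vs. temperature-softened teacher
    """
    n = len(labels)
    hard = sum(bce(sigmoid(s), y) for s, y in zip(student_logits, labels)) / n
    soft = sum(
        bce(sigmoid(s / temperature), sigmoid(t / temperature))
        for s, t in zip(student_logits, teacher_logits)
    ) / n
    return alpha * hard + (1.0 - alpha) * soft
```

Setting `alpha=1.0` recovers plain per-finding binary cross-entropy. With `alpha=0.0`, the loss is smallest when the student's softened probabilities match the teacher's, which is the behavior a distillation term is meant to reward.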

This list is automatically generated from the titles and abstracts of the papers on this site.