Radiology Report Generation with Layer-Wise Anatomical Attention
- URL: http://arxiv.org/abs/2512.16841v1
- Date: Thu, 18 Dec 2025 18:17:57 GMT
- Title: Radiology Report Generation with Layer-Wise Anatomical Attention
- Authors: Emmanuel D. Muñiz-De-León, Jorge A. Rosales-de-Golferichs, Ana S. Muñoz-Rodríguez, Alejandro I. Trejo-Castro, Eduardo de Avila-Armenta, Antonio Martínez-Torteya,
- Abstract summary: We introduce a compact image-to-text architecture that generates Findings section of chest X-ray reports.<n>The model combines a frozen Self-Distillation with No Labels v3 (DINOv3) Vision Transformer (ViT) encoder with a Generative Pre-trained Transformer 2 (GPT-2) decoder.
- Score: 35.18016233072556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic radiology report generation is a promising application of multimodal deep learning, aiming to reduce reporting workload and improve consistency. However, current state-of-the-art (SOTA) systems - such as Multimodal AI for Radiology Applications (MAIRA-2) and Medical Pathways Language Model-Multimodal (MedPaLM-M) - depend on large-scale multimodal training, clinical metadata, and multiple imaging views, making them resource-intensive and inaccessible for most settings. We introduce a compact image-to-text architecture that generates the Findings section of chest X-ray reports from a single frontal image. The model combines a frozen Self-Distillation with No Labels v3 (DINOv3) Vision Transformer (ViT) encoder with a Generative Pre-trained Transformer 2 (GPT-2) decoder enhanced by layer-wise anatomical attention. This mechanism integrates lung and heart segmentation masks through hierarchical Gaussian smoothing, biasing attention toward clinically relevant regions without adding trainable parameters. Evaluated on the official Medical Information Mart for Intensive Care-Chest X-ray (MIMIC-CXR) dataset using Chest Radiograph Expert (CheXpert) and Radiology Graph (RadGraph) metrics, our approach achieved substantial gains: CheXpert Macro-F1 for five key pathologies increased by 168% (0.083 -> 0.238) and Micro-F1 by 146% (0.137 -> 0.337), while broader performance across 14 observations improved by 86% (0.170 -> 0.316). Structural coherence also improved, with RadGraph F1 rising by 9.7%. Despite its small size and purely image-conditioned design, the model demonstrates that decoder-level anatomical guidance improves spatial grounding and enhances coherence in clinically relevant regions. The source code is publicly available at: https://github.com/devMuniz02/UDEM-CXR-Reporting-Thesis-2025.
Related papers
- A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice [83.11942224668127]
Janus-Pro-CXR (1B) is a chest X-ray interpretation system based on DeepSeek Janus-Pro model.<n>Our system outperforms state-of-the-art X-ray report generation models in automated report generation.
arXiv Detail & Related papers (2025-12-23T13:26:13Z) - CT-GRAPH: Hierarchical Graph Attention Network for Anatomy-Guided CT Report Generation [4.376648893167674]
We propose CT-GRAPH, a hierarchical graph attention network that explicitly models radiological knowledge.<n>Our method leverages pretrained 3D medical feature encoders to obtain global and organ-level features.<n>We show that our method achieves a substantial improvement of absolute 7.9% in F1 score over current state-of-the-art methods.
arXiv Detail & Related papers (2025-08-07T13:18:03Z) - X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography [89.84588038174721]
Computed Tomography serves as an indispensable tool in clinical, providing non-invasive visualization of internal anatomical structures.<n>Existing CT reconstruction works are limited to small-capacity model architecture and inflexible volume representation.<n>We present X-GRM, a large feedforward model for reconstructing 3D CT volumes from sparse-view 2D X-ray projections.
arXiv Detail & Related papers (2025-05-21T08:14:10Z) - Beyond Conventional Transformers: The Medical X-ray Attention (MXA) Block for Improved Multi-Label Diagnosis Using Knowledge Distillation [0.0]
We present the Medical X-ray Attention (MXA) block, a novel attention mechanism tailored specifically to address the challenges of X-ray abnormality detection.<n>Our approach achieves an area under the curve (AUC) of 0.85, an absolute improvement of 0.19 compared to our baseline model's AUC of 0.66.
arXiv Detail & Related papers (2025-04-03T04:55:42Z) - X-LRM: X-ray Large Reconstruction Model for Extremely Sparse-View Computed Tomography Recovery in One Second [52.11676689269379]
Sparse-view 3D CT reconstruction aims to recover structures from a limited number of 2D X-ray projections.<n>Existing feedforward methods are constrained by the limited capacity of CNN-based architectures and the scarcity of large-scale training datasets.<n>We propose an X-ray Large Reconstruction Model (X-LRM) for extremely sparse-view (10 views) CT reconstruction.
arXiv Detail & Related papers (2025-03-09T01:39:59Z) - Complex Organ Mask Guided Radiology Report Generation [13.96983438709763]
We propose the Complex Organ Mask Guided (termed as COMG) report generation model.
We leverage prior knowledge of the disease corresponding to each organ in the fusion process to enhance the disease identification phase.
Results on two public datasets show that COMG achieves a 11.4% and 9.7% improvement in terms of BLEU@4 scores over the SOTA model KiUT.
arXiv Detail & Related papers (2023-11-04T05:34:24Z) - 3D Medical Image Segmentation based on multi-scale MPU-Net [5.393743755706745]
This paper proposes a tumor segmentation model MPU-Net for patient volume CT images.
It is inspired by Transformer with a global attention mechanism.
Compared with the benchmark model U-Net, MPU-Net shows excellent segmentation results.
arXiv Detail & Related papers (2023-07-11T20:46:19Z) - Self adaptive global-local feature enhancement for radiology report
generation [10.958641951927817]
We propose a novel framework AGFNet to dynamically fuse the global and anatomy region feature to generate multi-grained radiology report.
Firstly, we extract important anatomy region features and global features of input Chest X-ray (CXR)
Then, with the region features and the global features as input, our proposed self-adaptive fusion gate module could dynamically fuse multi-granularity information.
Finally, the captioning generator generates the radiology reports through multi-granularity features.
arXiv Detail & Related papers (2022-11-21T11:50:42Z) - Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, the Open-I, MIMIC-CXR, and the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z) - Radiomics-Guided Global-Local Transformer for Weakly Supervised
Pathology Localization in Chest X-Rays [65.88435151891369]
Radiomics-Guided Transformer (RGT) fuses textitglobal image information with textitlocal knowledge-guided radiomics information.
RGT consists of an image Transformer branch, a radiomics Transformer branch, and fusion layers that aggregate image and radiomic information.
arXiv Detail & Related papers (2022-07-10T06:32:56Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.