Anatomical Attention Alignment representation for Radiology Report Generation
- URL: http://arxiv.org/abs/2505.07689v1
- Date: Mon, 12 May 2025 15:54:50 GMT
- Title: Anatomical Attention Alignment representation for Radiology Report Generation
- Authors: Quang Vinh Nguyen, Minh Duc Nguyen, Thanh Hoang Son Vo, Hyung-Jeong Yang, Soo-Hyung Kim,
- Abstract summary: Anatomical Attention Alignment Network (A3Net) is a framework that enhance visual-textual understanding by constructing hyper-visual representations.<n>Our approach integrates a knowledge dictionary of anatomical structures with patch-level visual features, enabling the model to effectively associate image regions with their corresponding anatomical entities.<n> Experimental results on IU X-Ray and MIMIC-CXR datasets demonstrate that A3Net significantly improves both visual perception and text generation quality.
- Score: 10.887845017050477
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automated Radiology report generation (RRG) aims at producing detailed descriptions of medical images, reducing radiologists' workload and improving access to high-quality diagnostic services. Existing encoder-decoder models only rely on visual features extracted from raw input images, which can limit the understanding of spatial structures and semantic relationships, often resulting in suboptimal text generation. To address this, we propose Anatomical Attention Alignment Network (A3Net), a framework that enhance visual-textual understanding by constructing hyper-visual representations. Our approach integrates a knowledge dictionary of anatomical structures with patch-level visual features, enabling the model to effectively associate image regions with their corresponding anatomical entities. This structured representation improves semantic reasoning, interpretability, and cross-modal alignment, ultimately enhancing the accuracy and clinical relevance of generated reports. Experimental results on IU X-Ray and MIMIC-CXR datasets demonstrate that A3Net significantly improves both visual perception and text generation quality. Our code is available at \href{https://github.com/Vinh-AI/A3Net}{GitHub}.
Related papers
- Text-driven Multiplanar Visual Interaction for Semi-supervised Medical Image Segmentation [48.76848912120607]
Semi-supervised medical image segmentation is a crucial technique for alleviating the high cost of data annotation.<n>We propose a novel text-driven multiplanar visual interaction framework for semi-supervised medical image segmentation (termed Text-SemiSeg)<n>Our framework consists of three main modules: Text-enhanced Multiplanar Representation (TMR), Category-aware Semantic Alignment (CSA), and Dynamic Cognitive Augmentation (DCA)
arXiv Detail & Related papers (2025-07-16T16:29:30Z) - Learning Generalized Medical Image Representations through Image-Graph Contrastive Pretraining [11.520404630575749]
We develop an Image-Graph Contrastive Learning framework that pairs chest X-rays with structured report knowledge graphs automatically extracted from radiology notes.
Our approach uniquely encodes the disconnected graph components via a relational graph convolution network and transformer attention.
arXiv Detail & Related papers (2024-05-15T12:27:38Z) - Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings.
We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features.
Our approach achieves favorable performance against existing methods in literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z) - Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z) - Anatomical Structure-Guided Medical Vision-Language Pre-training [21.68719061251635]
We propose an Anatomical Structure-Guided (ASG) framework for learning medical visual representations.
For anatomical region, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists.
For finding and existence, we regard them as image tags, applying an image-tag recognition decoder to associate image features with their respective tags within each sample.
arXiv Detail & Related papers (2024-03-14T11:29:47Z) - Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - Finding-Aware Anatomical Tokens for Chest X-Ray Automated Reporting [13.151444796296868]
We introduce a novel adaptation of Faster R-CNN in which finding detection is performed for the candidate bounding boxes extracted during anatomical structure localisation.
We use the resulting bounding box feature representations as our set of finding-aware anatomical tokens.
We show that task-aware anatomical tokens give state-of-the-art performance when integrated into an automated reporting pipeline.
arXiv Detail & Related papers (2023-08-30T11:35:21Z) - Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report
Generation [92.73584302508907]
We propose a knowledge graph with Dynamic structure and nodes to facilitate medical report generation with Contrastive Learning.
In detail, the fundamental structure of our graph is pre-constructed from general knowledge.
Each image feature is integrated with its very own updated graph before being fed into the decoder module for report generation.
arXiv Detail & Related papers (2023-03-18T03:53:43Z) - Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM)
arXiv Detail & Related papers (2023-03-16T07:23:55Z) - Cross-modal Memory Networks for Radiology Report Generation [30.13916304931662]
Cross-modal memory networks (CMN) are proposed to enhance the encoder-decoder framework for radiology report generation.
Our model is able to better align information from radiology images and texts so as to help generating more accurate reports in terms of clinical indicators.
arXiv Detail & Related papers (2022-04-28T02:32:53Z) - Structured Landmark Detection via Topology-Adapting Deep Graph Learning [75.20602712947016]
We present a new topology-adapting deep graph learning approach for accurate anatomical facial and medical landmark detection.
The proposed method constructs graph signals leveraging both local image features and global shape features.
Experiments are conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as well as three real-world X-ray medical datasets (Cephalometric (public), Hand and Pelvis)
arXiv Detail & Related papers (2020-04-17T11:55:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.