AlignTransformer: Hierarchical Alignment of Visual Regions and Disease
Tags for Medical Report Generation
- URL: http://arxiv.org/abs/2203.10095v1
- Date: Fri, 18 Mar 2022 13:43:53 GMT
- Title: AlignTransformer: Hierarchical Alignment of Visual Regions and Disease
Tags for Medical Report Generation
- Authors: Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu
- Abstract summary: We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
- Score: 50.21065317817769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, medical report generation, which aims to automatically generate a
long and coherent descriptive paragraph for a given medical image, has received
growing research interest. Unlike general image captioning tasks, medical report
generation is more challenging for data-driven neural models. This is mainly due
to 1) serious data bias: normal visual regions dominate the dataset over abnormal
visual regions, and 2) the very long sequences that must be generated. To alleviate
these two problems, we propose an AlignTransformer framework, which includes the
Align Hierarchical Attention (AHA) and Multi-Grained Transformer (MGT) modules:
1) the AHA module first predicts disease tags from the input image and then learns
multi-grained visual features by hierarchically aligning the visual regions and
disease tags. The acquired disease-grounded visual features can better represent
the abnormal regions of the input image, which can alleviate the data bias problem;
2) the MGT module effectively uses the multi-grained features and the Transformer
framework to generate the long medical report. Experiments on the public IU-Xray
and MIMIC-CXR datasets show that the AlignTransformer can achieve results
competitive with state-of-the-art methods on both datasets. Moreover, a human
evaluation conducted by professional radiologists further confirms the
effectiveness of our approach.
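To make the described pipeline concrete, below is a minimal PyTorch sketch of how the two modules could be wired together. Every class name, dimension, and wiring choice here (the top-k tag selection, per-level cross-attention, and a vanilla Transformer decoder) is an illustrative assumption based only on the abstract, not the authors' released implementation.

```python
# A minimal sketch of the AHA + MGT pipeline described in the abstract.
# All internals are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn

class AlignHierarchicalAttention(nn.Module):
    """AHA sketch: predict disease tags, then hierarchically align visual
    regions with tag embeddings to obtain multi-grained, disease-grounded
    visual features (assumed design)."""
    def __init__(self, d_model=512, num_tags=100, num_heads=8, levels=3):
        super().__init__()
        self.tag_classifier = nn.Linear(d_model, num_tags)  # multi-label tag head
        self.tag_embed = nn.Embedding(num_tags, d_model)
        self.align_layers = nn.ModuleList(
            nn.MultiheadAttention(d_model, num_heads, batch_first=True)
            for _ in range(levels)
        )

    def forward(self, region_feats, top_k=10):
        # region_feats: (batch, num_regions, d_model) from a visual backbone
        tag_logits = self.tag_classifier(region_feats.mean(dim=1))
        top_tags = tag_logits.topk(top_k, dim=-1).indices    # likely disease tags
        queries = self.tag_embed(top_tags)                   # (batch, top_k, d_model)
        multi_grained = []
        for layer in self.align_layers:
            # Tag-conditioned queries attend over the visual regions;
            # each level refines the previous one (one "grain" per level).
            queries, _ = layer(queries, region_feats, region_feats)
            multi_grained.append(queries)
        return tag_logits, multi_grained

class MultiGrainedTransformer(nn.Module):
    """MGT sketch: a Transformer decoder that treats the concatenated
    multi-grained features as memory to generate the long report."""
    def __init__(self, vocab_size=10000, d_model=512, num_heads=8, num_layers=3):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, report_tokens, multi_grained):
        memory = torch.cat(multi_grained, dim=1)   # stack all granularity levels
        tgt = self.token_embed(report_tokens)
        seq_len = tgt.size(1)
        # standard causal mask so each position only sees earlier tokens
        mask = torch.triu(torch.full((seq_len, seq_len), float('-inf')), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(hidden)                    # per-position next-token logits

# Toy forward pass with random features standing in for backbone output.
aha, mgt = AlignHierarchicalAttention(), MultiGrainedTransformer()
regions = torch.randn(2, 49, 512)            # e.g. a 7x7 feature grid
tokens = torch.randint(0, 10000, (2, 60))    # partial report token ids
tag_logits, grained = aha(regions)
logits = mgt(tokens, grained)                # shape: (2, 60, 10000)
```

The design point mirrored here is that the predicted disease tags act as attention queries over the visual regions, so the features handed to the decoder are already grounded in the likely abnormalities rather than dominated by normal regions.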
Related papers
- Advancing Medical Image Segmentation: Morphology-Driven Learning with Diffusion Transformer [4.672688418357066]
We propose a novel Transformer Diffusion (DTS) model for robust segmentation in the presence of noise.
Our model, which analyzes the morphological representation of images, shows better results than previous models across various medical imaging modalities.
arXiv Detail & Related papers (2024-08-01T07:35:54Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study evaluated the performance of Gemini, GPT-4, and four other popular large models in an exhaustive evaluation across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z) - RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
arXiv Detail & Related papers (2024-04-25T17:11:37Z) - TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation [0.7381551917607596]
We propose TiBiX: Leveraging Temporal information for Bidirectional X-ray and Report Generation.
arXiv Detail & Related papers (2024-03-20T07:00:03Z) - Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- GAN-GA: A Generative Model based on Genetic Algorithm for Medical Image Generation [0.0]
Generative models offer a promising solution for addressing medical image shortage problems.
This paper proposes the GAN-GA, a generative model optimized by embedding a genetic algorithm.
The proposed model enhances image fidelity and diversity while preserving distinctive features.
arXiv Detail & Related papers (2023-12-30T20:16:45Z)
- C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation framework with a domain transfer network (C^2M-DoT).
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z)
- Dynamic Multi-Domain Knowledge Networks for Chest X-ray Report Generation [0.5939858158928474]
We propose a Dynamic Multi-Domain Knowledge (DMDK) network for radiology diagnostic report generation.
The DMDK network consists of four modules: a Chest Feature Extractor (CFE), a Dynamic Knowledge Extractor (DKE), a Specific Knowledge Extractor (SKE), and a Multi-knowledge Integrator (MKI).
We performed extensive experiments on two widely used datasets, IU X-Ray and MIMIC-CXR.
arXiv Detail & Related papers (2023-10-08T11:20:02Z)
- IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer [4.376565880192482]
We propose an image-to-indicator hierarchical transformer (IIHT) framework for medical report generation.
The proposed IIHT method also makes it feasible for radiologists to modify the disease indicators in real-world scenarios.
arXiv Detail & Related papers (2023-08-10T15:22:11Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the list (including all information) and is not responsible for any consequences.