A Medical Semantic-Assisted Transformer for Radiographic Report
Generation
- URL: http://arxiv.org/abs/2208.10358v1
- Date: Mon, 22 Aug 2022 14:38:19 GMT
- Title: A Medical Semantic-Assisted Transformer for Radiographic Report
Generation
- Authors: Zhanyu Wang, Mingkang Tang, Lei Wang, Xiu Li, Luping Zhou
- Abstract summary: We propose a memory-augmented sparse attention block to capture the higher-order interactions between the input fine-grained image features.
We also introduce a novel Medical Concepts Generation Network (MCGN) to predict fine-grained semantic concepts and incorporate them into the report generation process as guidance.
- Score: 39.99216295697047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated radiographic report generation is a challenging cross-domain task
that aims to automatically generate accurate and semantically coherent reports
describing medical images. Despite recent progress in this field, challenges
remain in at least the following respects. First, radiographic images are very
similar to one another, so it is difficult to capture their fine-grained visual
differences using a CNN as the visual feature extractor, as many existing
methods do. Second, although semantic information has been widely used to boost
the performance of generation tasks (e.g. image captioning), existing methods
often fail to provide effective medical semantic features. To address these
problems, we propose a memory-augmented sparse attention block that uses
bilinear pooling to capture higher-order interactions between the input
fine-grained image features while producing sparse attention. We also introduce
a novel Medical Concepts Generation Network (MCGN) to predict fine-grained
semantic concepts and incorporate them into the report generation process as
guidance. Our method shows promising performance on MIMIC-CXR, the largest
recently released benchmark, outperforming multiple state-of-the-art methods
in image captioning and medical report generation.
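The abstract gives only a high-level description of the proposed attention block. As an illustrative sketch only (not the paper's actual architecture), a memory-augmented sparse attention layer built on low-rank bilinear pooling could combine three ingredients: learnable memory slots appended to the keys/values, an elementwise-product (bilinear) interaction between projected queries and keys, and top-k masking of the attention logits. All names, shapes, and the top-k sparsification choice below are assumptions for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_augmented_sparse_attention(X, M, U, V, w, top_k=4):
    """Hypothetical sketch of memory-augmented sparse attention.

    X: (n, d) fine-grained image features (used as queries and keys/values)
    M: (m, d) learnable memory slots appended to the keys/values
    U, V: (d, r) low-rank projections for bilinear pooling
    w: (r,) projection turning the pooled interaction into a scalar logit
    Returns the attended features (n, d) and the attention map (n, n+m).
    """
    KV = np.concatenate([X, M], axis=0)             # (n+m, d): keys/values + memory
    # Low-rank bilinear pooling: elementwise product of nonlinear projections
    Q = np.tanh(X @ U)                               # (n, r)
    K = np.tanh(KV @ V)                              # (n+m, r)
    logits = (Q[:, None, :] * K[None, :, :]) @ w     # (n, n+m) higher-order scores
    # Sparse attention: keep only the top-k logits per query, mask the rest
    kth = np.partition(logits, -top_k, axis=-1)[:, -top_k][:, None]
    logits = np.where(logits >= kth, logits, -np.inf)
    A = softmax(logits, axis=-1)                     # rows have at most k nonzeros
    return A @ KV, A
```

The memory slots give every query a shared, input-independent set of extra keys to attend over, while the top-k mask forces each query to commit to a few salient regions; both are common devices in the captioning literature, but the exact formulation here is a guess, not the authors' design.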
Related papers
- Medical Report Generation Is A Multi-label Classification Problem [38.64929236412092]
We propose rethinking medical report generation as a multi-label classification problem.
We introduce a novel report generation framework based on BLIP integrated with classified key nodes.
Our experiments demonstrate that leveraging key nodes can achieve state-of-the-art (SOTA) performance, surpassing existing approaches across two benchmark datasets.
arXiv Detail & Related papers (2024-08-30T20:43:35Z) - MedRAT: Unpaired Medical Report Generation via Auxiliary Tasks [11.190146577567548]
We propose a novel model that leverages the available information in two distinct datasets.
Our model, named MedRAT, surpasses previous state-of-the-art methods.
arXiv Detail & Related papers (2024-07-04T13:31:47Z) - MedCycle: Unpaired Medical Report Generation via Cycle-Consistency [11.190146577567548]
We introduce an innovative approach that eliminates the need for consistent labeling schemas.
This approach is based on cycle-consistent mapping functions that transform image embeddings into report embeddings.
It outperforms state-of-the-art results in unpaired chest X-ray report generation, demonstrating improvements in both language and clinical metrics.
arXiv Detail & Related papers (2024-03-20T09:40:11Z) - C^2M-DoT: Cross-modal consistent multi-view medical report generation
with domain transfer network [67.97926983664676]
We propose C^2M-DoT, a cross-modal consistent multi-view medical report generation framework with a domain transfer network.
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z) - Learning to Exploit Temporal Structure for Biomedical Vision-Language
Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z) - Variational Topic Inference for Chest X-Ray Report Generation [102.04931207504173]
Report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice.
Recent work has shown that deep learning models can successfully caption natural images.
We propose variational topic inference for automatic report generation.
arXiv Detail & Related papers (2021-07-15T13:34:38Z) - Longer Version for "Deep Context-Encoding Network for Retinal Image
Captioning" [21.558908631487405]
We propose a new context-driven encoding network to automatically generate medical reports for retinal images.
The proposed model is mainly composed of a multi-modal input encoder and a fused-feature decoder.
arXiv Detail & Related papers (2021-05-30T13:37:03Z) - Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z) - Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report
Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.