Cross-modal Memory Networks for Radiology Report Generation
- URL: http://arxiv.org/abs/2204.13258v1
- Date: Thu, 28 Apr 2022 02:32:53 GMT
- Title: Cross-modal Memory Networks for Radiology Report Generation
- Authors: Zhihong Chen, Yaling Shen, Yan Song, Xiang Wan
- Abstract summary: Cross-modal memory networks (CMN) are proposed to enhance the encoder-decoder framework for radiology report generation.
Our model is able to better align information from radiology images and texts so as to help generate more accurate reports in terms of clinical indicators.
- Score: 30.13916304931662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical imaging plays a significant role in the clinical practice of
medical diagnosis, where the text reports of the images are essential in
understanding them and facilitating later treatments. Generating these reports
automatically helps lighten the burden on radiologists and significantly
promotes clinical automation, which has already attracted much attention in
applying artificial intelligence to the medical domain. Previous studies mainly
follow the encoder-decoder paradigm and focus on the aspect of text generation,
with few considering the importance of cross-modal mappings or explicitly
exploiting such mappings to facilitate radiology report generation. In this
paper, we propose cross-modal memory networks (CMN) to enhance the
encoder-decoder framework for radiology report generation, where a shared
memory is designed to record the alignment between images and texts so as to
facilitate the interaction and generation across modalities. Experimental
results illustrate the effectiveness of our proposed model, which achieves
state-of-the-art performance on two widely used benchmark datasets, i.e., IU
X-Ray and MIMIC-CXR. Further analyses also show that our model better aligns
information from radiology images and texts, helping generate more accurate
reports in terms of clinical indicators.
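The shared-memory mechanism described in the abstract can be approximated in a few lines of code. The following is a minimal PyTorch sketch, not the authors' released implementation: the `CrossModalMemory` class name, slot count, feature dimension, and top-k value are illustrative assumptions. Both visual and textual features query the same learnable memory matrix, and a weighted readout of the closest slots is added back to the input features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalMemory(nn.Module):
    """Shared memory queried by both visual and textual features.

    Each query attends over a learnable memory matrix; the memory readout
    is added to the original features, so image- and text-side
    representations are mediated by the same memory slots.
    """

    def __init__(self, num_slots: int = 2048, dim: int = 512, top_k: int = 32):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.top_k = top_k

    def forward(self, queries: torch.Tensor) -> torch.Tensor:
        # queries: (batch, seq_len, dim) -- visual patch features or word embeddings
        scores = queries @ self.memory.t()                  # (batch, seq, num_slots)
        top_val, top_idx = scores.topk(self.top_k, dim=-1)  # keep the k closest slots
        weights = F.softmax(top_val, dim=-1)                # (batch, seq, k)
        slots = self.memory[top_idx]                        # (batch, seq, k, dim)
        readout = (weights.unsqueeze(-1) * slots).sum(dim=-2)
        return queries + readout                            # memory-augmented features

# Usage: the same module responds to both modalities before encoding/decoding.
memory = CrossModalMemory()
visual = torch.randn(2, 49, 512)    # e.g. pooled CNN patch features
textual = torch.randn(2, 30, 512)   # e.g. report token embeddings
visual_aug, textual_aug = memory(visual), memory(textual)
```

In the full model these memory-augmented features would feed a transformer encoder-decoder; the usage lines only demonstrate that a single module serves both modalities.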
Related papers
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
It is evaluated on downstream image classification and image-text retrieval tasks across four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
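As a rough illustration of this kind of conditioning, the sketch below (assuming PyTorch and torchvision; the `DemographicConditionedEncoder` name, ResNet-18 backbone, embedding vocabulary, and dimensions are assumptions rather than the paper's exact design) extracts a spatial feature map with a CNN, projects it to token embeddings, and prepends embedded demographic identifiers before a transformer encoder whose output could serve as memory for a report decoder.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DemographicConditionedEncoder(nn.Module):
    """Fuse CNN patch features with embeddings of non-imaging data."""

    def __init__(self, d_model: int = 256, num_demo_tokens: int = 8):
        super().__init__()
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # keep the spatial map
        self.proj = nn.Linear(512, d_model)                        # 512 = ResNet-18 channels
        self.demo_embed = nn.Embedding(1000, d_model)              # ids for age bin, sex, view, ...
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, image: torch.Tensor, demo_ids: torch.Tensor) -> torch.Tensor:
        feat = self.cnn(image)                               # (B, 512, H', W')
        feat = feat.flatten(2).transpose(1, 2)               # (B, H'*W', 512)
        visual = self.proj(feat)                             # (B, N, d_model)
        demo = self.demo_embed(demo_ids)                     # (B, num_demo_tokens, d_model)
        tokens = torch.cat([demo, visual], dim=1)            # demographics condition the image tokens
        return self.encoder(tokens)                          # memory for a report decoder

encoder = DemographicConditionedEncoder()
memory = encoder(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 8)))
```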
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
- KiUT: Knowledge-injected U-Transformer for Radiology Report Generation [10.139767157037829]
Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from the X-ray image.
We propose a Knowledge-injected U-Transformer (KiUT) to learn multi-level visual representation and adaptively distill the information.
arXiv Detail & Related papers (2023-06-20T07:27:28Z)
- Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [92.73584302508907]
We propose a knowledge graph with dynamic structure and nodes to facilitate medical report generation with contrastive learning.
In detail, the fundamental structure of our graph is pre-constructed from general knowledge.
Each image feature is integrated with its very own updated graph before being fed into the decoder module for report generation.
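A loose sketch of the "image feature integrated with its own updated graph" idea, assuming PyTorch; the `GraphEnhancedFeature` class, the single linear message-passing step, the node count, and the identity adjacency in the usage line are all illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphEnhancedFeature(nn.Module):
    """Propagate over a concept graph, then attend to it from the image feature."""

    def __init__(self, num_nodes: int = 20, dim: int = 512):
        super().__init__()
        self.nodes = nn.Parameter(torch.randn(num_nodes, dim) * 0.02)   # pre-defined concept nodes
        self.gcn = nn.Linear(dim, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, image_feat: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # image_feat: (B, dim); adj: (num_nodes, num_nodes), row-normalized
        nodes = F.relu(self.gcn(adj @ self.nodes))                       # one message-passing step
        nodes = nodes.unsqueeze(0).expand(image_feat.size(0), -1, -1)
        query = image_feat.unsqueeze(1)                                  # (B, 1, dim)
        enhanced, _ = self.attn(query, nodes, nodes)                     # read out graph context
        return image_feat + enhanced.squeeze(1)                          # graph-enhanced image feature

module = GraphEnhancedFeature()
adj = torch.eye(20)                                                      # placeholder adjacency
out = module(torch.randn(2, 512), adj)
```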
arXiv Detail & Related papers (2023-03-18T03:53:43Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, as well as the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment [27.111857943935725]
We present an automatic, multi-modal approach for report generation from chest X-ray images.
Our approach features two distinct modules: (i) a learned knowledge base and (ii) multi-modal alignment.
With the aid of both modules, our approach clearly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-12-30T10:43:56Z)
- Generating Radiology Reports via Memory-driven Transformer [38.30011851429407]
We propose to generate radiology reports with a memory-driven Transformer.
Experiments are conducted on two prevailing radiology report datasets, IU X-Ray and MIMIC-CXR.
arXiv Detail & Related papers (2020-10-30T04:08:03Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)