Complex Organ Mask Guided Radiology Report Generation
- URL: http://arxiv.org/abs/2311.02329v2
- Date: Fri, 10 Nov 2023 04:24:35 GMT
- Title: Complex Organ Mask Guided Radiology Report Generation
- Authors: Tiancheng Gu, Dongnan Liu, Zhiyuan Li, Weidong Cai
- Abstract summary: We propose the Complex Organ Mask Guided (termed as COMG) report generation model.
We leverage prior knowledge of the disease corresponding to each organ in the fusion process to enhance the disease identification phase.
Results on two public datasets show that COMG achieves a 11.4% and 9.7% improvement in terms of BLEU@4 scores over the SOTA model KiUT.
- Score: 13.96983438709763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of automatic report generation is to generate a clinically accurate
and coherent phrase from a single given X-ray image, which could alleviate the
workload of traditional radiology reporting. However, in a real-world scenario,
radiologists frequently face the challenge of producing extensive reports
derived from numerous medical images, thereby medical report generation from
multi-image perspective is needed. In this paper, we propose the Complex Organ
Mask Guided (termed as COMG) report generation model, which incorporates masks
from multiple organs (e.g., bones, lungs, heart, and mediastinum), to provide
more detailed information and guide the model's attention to these crucial body
regions. Specifically, we leverage prior knowledge of the disease corresponding
to each organ in the fusion process to enhance the disease identification phase
during the report generation process. Additionally, cosine similarity loss is
introduced as target function to ensure the convergence of cross-modal
consistency and facilitate model optimization.Experimental results on two
public datasets show that COMG achieves a 11.4% and 9.7% improvement in terms
of BLEU@4 scores over the SOTA model KiUT on IU-Xray and MIMIC, respectively.
The code is publicly available at https://github.com/GaryGuTC/COMG_model.
Related papers
- RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
arXiv Detail & Related papers (2024-04-25T17:11:37Z) - MAIRA-1: A specialised large multimodal model for radiology report generation [41.69727330319648]
We present a radiology-specific multimodal model for generating radiological reports from chest X-rays (CXRs)
Our work builds on the idea that large language model(s) can be equipped with multimodal capabilities through alignment with pre-trained vision encoders.
Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned large language model based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality.
arXiv Detail & Related papers (2023-11-22T19:45:40Z) - Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report
Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-rays reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - Self adaptive global-local feature enhancement for radiology report
generation [10.958641951927817]
We propose a novel framework AGFNet to dynamically fuse the global and anatomy region feature to generate multi-grained radiology report.
Firstly, we extract important anatomy region features and global features of input Chest X-ray (CXR)
Then, with the region features and the global features as input, our proposed self-adaptive fusion gate module could dynamically fuse multi-granularity information.
Finally, the captioning generator generates the radiology reports through multi-granularity features.
arXiv Detail & Related papers (2022-11-21T11:50:42Z) - Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, the Open-I, MIMIC-CXR, and the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z) - AlignTransformer: Hierarchical Alignment of Visual Regions and Disease
Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report
Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z) - Show, Describe and Conclude: On Exploiting the Structure Information of
Chest X-Ray Reports [5.6070625920019825]
Chest X-Ray (CXR) images are commonly used for clinical screening and diagnosis.
The complex structures between and within sections of the reports pose a great challenge to the automatic report generation.
We propose a novel framework that exploits the structure information between and within report sections for generating CXR imaging reports.
arXiv Detail & Related papers (2020-04-26T02:29:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.