Medical Report Generation based on Segment-Enhanced Contrastive
Representation Learning
- URL: http://arxiv.org/abs/2312.15869v1
- Date: Tue, 26 Dec 2023 03:33:48 GMT
- Authors: Ruoqing Zhao, Xi Wang, Hongliang Dai, Pan Gao, Piji Li
- Abstract summary: We propose MSCL (Medical image Segmentation with Contrastive Learning) to segment organs, abnormalities, bones, etc.
We introduce a supervised contrastive loss that assigns more weight to reports that are semantically similar to the target while training.
Experimental results demonstrate the effectiveness of our proposed model, where we achieve state-of-the-art performance on the IU X-Ray public dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated radiology report generation has the potential to improve radiology
reporting and alleviate the workload of radiologists. However, the medical
report generation task poses unique challenges due to the limited availability
of medical data and the presence of data bias. To maximize the utility of
available data and reduce data bias, we propose MSCL (Medical image
Segmentation with Contrastive Learning), a framework that utilizes the Segment
Anything Model (SAM) to segment organs, abnormalities, bones, etc., and can pay
more attention to the meaningful ROIs in the image to get better visual
representations. Then we introduce a supervised contrastive loss that assigns
more weight to reports that are semantically similar to the target while
training. The design of this loss function aims to mitigate the impact of data
bias and encourage the model to capture the essential features of a medical
image and generate high-quality reports. Experimental results demonstrate the
effectiveness of our proposed model, where we achieve state-of-the-art
performance on the IU X-Ray public dataset.
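The weighted supervised contrastive loss described in the abstract can be sketched as follows. This is a minimal, dependency-free illustration under assumed names and shapes, not the paper's actual implementation: each candidate report embedding carries a precomputed semantic-similarity weight to the target, and semantically closer reports contribute more to the loss.

```python
import math

def weighted_supcon_loss(anchor, candidates, weights, temperature=0.1):
    """Weighted supervised contrastive loss (hypothetical formulation,
    illustrating the idea rather than the paper's exact loss). `anchor` is
    an image embedding; `candidates` are report embeddings; `weights` give
    each candidate's semantic similarity to the target report."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def norm(a):
        return math.sqrt(dot(a, a))

    # Temperature-scaled cosine similarity between the anchor image
    # embedding and each candidate report embedding.
    sims = [dot(anchor, c) / (norm(anchor) * norm(c) + 1e-8) / temperature
            for c in candidates]
    denom = sum(math.exp(s) for s in sims)  # softmax normaliser
    # Weighted negative log-likelihood: candidates more similar to the
    # target report (larger weight) pull the anchor harder.
    total_w = sum(weights)
    return -sum(w * (s - math.log(denom))
                for w, s in zip(weights, sims)) / total_w

# Illustrative usage: one matching and one unrelated report embedding.
anchor = [1.0, 0.0]
reports = [[1.0, 0.0], [0.0, 1.0]]
loss = weighted_supcon_loss(anchor, reports, weights=[1.0, 0.1])
```

Putting the larger weight on the semantically similar report yields a lower loss than the reverse assignment, which is the bias-mitigating behaviour the abstract describes.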
Related papers
- Resource-Efficient Medical Report Generation using Large Language Models [3.2627279988912194]
Medical report generation is the task of automatically writing radiology reports for chest X-ray images.
We propose a new framework leveraging vision-enabled Large Language Models (LLM) for the task of medical report generation.
arXiv Detail & Related papers (2024-10-21T05:08:18Z)
- TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model [22.305034251561835]
We propose a truthful radiology report generation framework, namely TRRG, based on stage-wise training for cross-modal disease clue injection into large language models.
Our proposed framework achieves state-of-the-art performance in radiology report generation on datasets such as IU-Xray and MIMIC-CXR.
arXiv Detail & Related papers (2024-08-22T05:52:27Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [92.73584302508907]
We propose a knowledge graph with Dynamic structure and nodes to facilitate medical report generation with Contrastive Learning.
In detail, the fundamental structure of our graph is pre-constructed from general knowledge.
Each image feature is integrated with its very own updated graph before being fed into the decoder module for report generation.
arXiv Detail & Related papers (2023-03-18T03:53:43Z)
- Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
- Cross-modal Memory Networks for Radiology Report Generation [30.13916304931662]
Cross-modal memory networks (CMN) are proposed to enhance the encoder-decoder framework for radiology report generation.
Our model better aligns information from radiology images and texts, helping to generate more accurate reports in terms of clinical indicators.
arXiv Detail & Related papers (2022-04-28T02:32:53Z)
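Several of the related papers above condition report generation on non-imaging patient data (e.g., demographics, vital signs) alongside visual features. As a purely illustrative sketch of that fusion idea (the function name and feature shapes are assumptions, not any paper's actual API), the simplest form appends a shared patient-data embedding to each per-region visual feature before decoding:

```python
# Hypothetical fusion of CNN region features with a patient-data embedding.
# Names and dimensions are illustrative, not taken from any of the papers.
def fuse_features(visual_feats, demo_embedding):
    """Append the same demographic embedding to every per-region visual
    feature, yielding one fused token per image region for the decoder."""
    return [region + demo_embedding for region in visual_feats]

# Example: 3 image regions with 4-dim features, 2-dim demographic embedding.
visual = [[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.6, 0.7, 0.8],
          [0.9, 1.0, 1.1, 1.2]]
demo = [1.0, 0.0]  # e.g., encoded age and sex
fused = fuse_features(visual, demo)
# Each fused token now has 6 dims: 4 visual + 2 demographic.
```

Real systems typically project the two modalities into a shared space and fuse via attention rather than raw concatenation, but the data flow is the same: every visual token is enriched with the same patient context.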
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.