R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation
- URL: http://arxiv.org/abs/2408.09743v1
- Date: Mon, 19 Aug 2024 07:15:11 GMT
- Title: R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation
- Authors: Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang,
- Abstract summary: This paper proposes a novel context-guided efficient X-ray medical report generation framework.
Specifically, we introduce the Mamba as the vision backbone with linear complexity, and the performance obtained is comparable to that of the strong Transformer model.
- Score: 7.4871243017824165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the tremendous success of Large Language Models (LLMs), existing X-ray medical report generation methods attempt to leverage large models to achieve better performance. They usually adopt a Transformer to extract the visual features of a given X-ray image, and then, feed them into the LLM for text generation. How to extract more effective information for the LLMs to help them improve final results is an urgent problem that needs to be solved. Additionally, the use of visual Transformer models also brings high computational complexity. To address these issues, this paper proposes a novel context-guided efficient X-ray medical report generation framework. Specifically, we introduce the Mamba as the vision backbone with linear complexity, and the performance obtained is comparable to that of the strong Transformer model. More importantly, we perform context retrieval from the training set for samples within each mini-batch during the training phase, utilizing both positively and negatively related samples to enhance feature representation and discriminative learning. Subsequently, we feed the vision tokens, context information, and prompt statements to invoke the LLM for generating high-quality medical reports. Extensive experiments on three X-ray report generation datasets (i.e., IU-Xray, MIMIC-CXR, CheXpert Plus) fully validated the effectiveness of our proposed model. The source code of this work will be released on \url{https://github.com/Event-AHU/Medical_Image_Analysis}.
Related papers
- Resource-Efficient Medical Report Generation using Large Language Models [3.2627279988912194]
Medical report generation is the task of automatically writing radiology reports for chest X-ray images.
We propose a new framework leveraging vision-enabled Large Language Models (LLM) for the task of medical report generation.
arXiv Detail & Related papers (2024-10-21T05:08:18Z) - CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset [14.911363203907008]
X-ray image-based medical report generation can significantly reduce diagnostic burdens and patient wait times.
We conduct a comprehensive benchmarking of existing mainstream X-ray report generation models and large language models (LLMs) on the CheXpert Plus dataset.
We propose a large model for the X-ray image report generation using a multi-stage pre-training strategy, including self-supervised autoregressive generation and Xray-report contrastive learning.
arXiv Detail & Related papers (2024-10-01T04:07:01Z) - Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to integrate French reports to shape the embedding space devoted to bone X-Rays representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z) - SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models [9.390882250428305]
Radiology Report Generation (R2Gen) demonstrates how Multi-modal Large Language Models (MLLMs) can automate the creation of accurate and coherent radiological reports.
Existing methods often hallucinate details in text-based reports that don't accurately reflect the image content.
We introduce a novel strategy, which improves the R2Gen task by integrating a self-refining mechanism into the MLLM framework.
arXiv Detail & Related papers (2024-04-27T13:46:23Z) - MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis [1.2903829793534272]
Chest X-ray images are commonly used for predicting acute and chronic cardiopulmonary conditions.
Efforts to integrate them with structured clinical data face challenges due to incomplete electronic health records.
This paper introduces MedPromptX, the first model to integrate multimodal large language models (MLLMs), few-shot prompting (FP) and visual grounding (VG)
Results demonstrate the SOTA performance of MedPromptX, achieving an 11% improvement in F1-score compared to the baselines.
arXiv Detail & Related papers (2024-03-22T19:19:51Z) - MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation [28.497591315598402]
Multimodal Large Language Models (MLLMs) have shown success in various general image processing tasks.
This study investigates the potential of MLLMs in improving the understanding and generation of Chest X-Rays (CXRs)
arXiv Detail & Related papers (2023-12-04T06:40:12Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language
Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - Customizing General-Purpose Foundation Models for Medical Report
Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z) - Vision-Language Modelling For Radiological Imaging and Reports In The
Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Competence-based Multimodal Curriculum Learning for Medical Report
Generation [98.10763792453925]
We propose a Competence-based Multimodal Curriculum Learning framework ( CMCL) to alleviate the data bias and make best use of available data.
Specifically, CMCL simulates the learning process of radiologists and optimize the model in a step by step manner.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that CMCL can be incorporated into existing models to improve their performance.
arXiv Detail & Related papers (2022-06-24T08:16:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.