Improving Medical Report Generation with Adapter Tuning and Knowledge
Enhancement in Vision-Language Foundation Models
- URL: http://arxiv.org/abs/2312.03970v1
- Date: Thu, 7 Dec 2023 01:01:45 GMT
- Title: Improving Medical Report Generation with Adapter Tuning and Knowledge
Enhancement in Vision-Language Foundation Models
- Authors: Shibin Wu, Bang Yang, Zhiyu Ye, Haoqian Wang, Hairong Zheng, Tong
Zhang
- Abstract summary: This study builds upon the state-of-the-art vision-language pre-training and fine-tuning approach, BLIP-2, to customize general large-scale foundation models.
Validation on the ImageCLEFmedical 2023 dataset demonstrates the model's effectiveness, achieving the best average results among several state-of-the-art methods.
- Score: 26.146579369491718
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Medical report generation demands automatic creation of coherent and precise
descriptions for medical images. However, the scarcity of labelled medical
image-report pairs poses formidable challenges in developing large-scale neural
networks capable of harnessing the potential of artificial intelligence,
exemplified by large language models. This study builds upon the
state-of-the-art vision-language pre-training and fine-tuning approach, BLIP-2,
to customize general large-scale foundation models. Integrating adapter tuning
and a medical knowledge enhancement loss, our model significantly improves
accuracy and coherence. Validation on the ImageCLEFmedical 2023 dataset
demonstrates our model's effectiveness, achieving the best average results
among several state-of-the-art methods. Significant improvements in ROUGE and
CIDEr underscore our method's efficacy, highlighting the promise of rapid
medical-domain adaptation of vision-language foundation models in addressing
the challenges posed by data scarcity.
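The abstract names two ingredients, adapter tuning of a frozen foundation model and a medical knowledge enhancement loss, without implementation details. Below is a minimal, hypothetical PyTorch sketch of those two ideas in isolation: a bottleneck adapter appended to a frozen transformer block, plus an auxiliary loss that up-weights tokens flagged as medical keywords. All names here (Adapter, AdapterBlock, knowledge_enhancement_loss, the keyword mask, the 0.5 weight) are illustrative assumptions, not the authors' actual BLIP-2 integration.
```python
# Hypothetical sketch of adapter tuning plus a knowledge-weighted caption loss.
# Not the paper's implementation; names and weighting are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # zero-init: adapter starts as identity
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.gelu(self.down(x)))

class AdapterBlock(nn.Module):
    """Wraps a frozen transformer block and appends a trainable adapter."""
    def __init__(self, frozen_block: nn.Module, dim: int):
        super().__init__()
        self.block = frozen_block
        for p in self.block.parameters():
            p.requires_grad = False  # keep the foundation model frozen
        self.adapter = Adapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))

def knowledge_enhancement_loss(logits, labels, keyword_mask):
    """Toy auxiliary term: extra cross-entropy weight on medical-keyword tokens."""
    token_ce = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")
    return (token_ce * keyword_mask).sum() / keyword_mask.sum().clamp(min=1)

# Example: wrap a stand-in "frozen" block and combine the two loss terms.
dim, vocab = 256, 1000
frozen = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
tuned = AdapterBlock(frozen, dim)
head = nn.Linear(dim, vocab)                      # trainable caption head

feats = torch.randn(2, 16, dim)                   # stand-in multimodal features
labels = torch.randint(0, vocab, (2, 16))         # stand-in report tokens
keyword_mask = (torch.rand(2, 16) > 0.7).float()  # stand-in medical-term mask

logits = head(tuned(feats))
caption_loss = F.cross_entropy(logits.reshape(-1, vocab), labels.reshape(-1))
loss = caption_loss + 0.5 * knowledge_enhancement_loss(logits, labels, keyword_mask)
loss.backward()                                   # only adapter and head receive gradients
```
Zero-initializing the adapter's up-projection is a common trick in adapter tuning: the wrapped block initially behaves exactly like the frozen backbone, so the pre-trained model's behavior is preserved at the start of fine-tuning and only gradually adjusted.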
Related papers
- LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training.
LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions.
Our results show LoGra-Med matches LLaVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
arXiv Detail & Related papers (2024-10-03T15:52:03Z) - TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model [22.305034251561835]
We propose a truthful radiology report generation framework, namely TRRG, based on stage-wise training for cross-modal disease clue injection into large language models.
Our proposed framework achieves state-of-the-art performance in radiology report generation on datasets such as IU-Xray and MIMIC-CXR.
arXiv Detail & Related papers (2024-08-22T05:52:27Z) - MRC-based Nested Medical NER with Co-prediction and Adaptive Pre-training [0.38498367961730184]
We propose a medical NER model based on Machine Reading Comprehension (MRC), which uses a task-adaptive pre-training strategy to improve the model's capability in the medical field.
Our proposed model outperforms the compared state-of-the-art (SOTA) models.
arXiv Detail & Related papers (2024-03-23T11:14:02Z) - Enhancing and Adapting in the Clinic: Source-free Unsupervised Domain
Adaptation for Medical Image Enhancement [34.11633495477596]
We propose an algorithm for source-free unsupervised domain adaptive medical image enhancement (SAME).
A structure-preserving enhancement network is first constructed to learn a robust source model from synthesized training data.
A pseudo-label picker is developed to boost the knowledge distillation of enhancement tasks.
arXiv Detail & Related papers (2023-12-03T10:01:59Z) - Robust and Interpretable Medical Image Classifiers via Concept
Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z) - PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Customizing General-Purpose Foundation Models for Medical Report
Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - Adapting Pretrained Vision-Language Foundational Models to Medical
Imaging Domains [3.8137985834223502]
Building generative models for medical images that faithfully depict clinical context may help alleviate the paucity of healthcare datasets.
We explore the sub-components of the Stable Diffusion pipeline to fine-tune the model to generate medical images.
Our best-performing model improves upon the Stable Diffusion baseline and can be conditioned to insert a realistic-looking abnormality on a synthetic radiology image.
arXiv Detail & Related papers (2022-10-09T01:43:08Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records
Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.