MedRG: Medical Report Grounding with Multi-modal Large Language Model
- URL: http://arxiv.org/abs/2404.06798v1
- Date: Wed, 10 Apr 2024 07:41:35 GMT
- Title: MedRG: Medical Report Grounding with Multi-modal Large Language Model
- Authors: Ke Zou, Yang Bai, Zhihao Chen, Yang Zhou, Yidi Chen, Kai Ren, Meng Wang, Xuedong Yuan, Xiaojing Shen, Huazhu Fu
- Abstract summary: Medical Report Grounding (MedRG) is an end-to-end solution that uses a multi-modal Large Language Model to predict key phrases.
The experimental results validate the effectiveness of MedRG, surpassing the performance of the existing state-of-the-art medical phrase grounding methods.
- Score: 42.04042642085121
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical Report Grounding is pivotal in identifying the most relevant regions in medical images based on a given phrase query, a critical aspect in medical image analysis and radiological diagnosis. However, prevailing visual grounding approaches necessitate the manual extraction of key phrases from medical reports, imposing substantial burdens on both system efficiency and physicians. In this paper, we introduce a novel framework, Medical Report Grounding (MedRG), an end-to-end solution for utilizing a multi-modal Large Language Model to predict key phrases by incorporating a unique token, BOX, into the vocabulary to serve as an embedding for unlocking detection capabilities. Subsequently, the vision encoder-decoder jointly decodes the hidden embedding and the input medical image, generating the corresponding grounding box. The experimental results validate the effectiveness of MedRG, surpassing the performance of the existing state-of-the-art medical phrase grounding methods. This study represents a pioneering exploration of the medical report grounding task, marking the first-ever endeavor in this domain.
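The data flow described in the abstract (add a BOX token to the vocabulary, read out its hidden embedding, decode that embedding together with image features into a box) can be illustrated with a minimal toy sketch. This is not the authors' implementation. The token spelling `<BOX>`, the helper names, the random vectors standing in for LLM hidden states and image features, and the single linear layer standing in for the vision encoder-decoder are all assumptions made for illustration.

```python
import math
import random

BOX_TOKEN = "<BOX>"  # assumed spelling; the abstract only names the token "BOX"


def extend_vocab(vocab, token=BOX_TOKEN):
    """Return a copy of the vocabulary with the special token appended if missing."""
    vocab = dict(vocab)
    if token not in vocab:
        vocab[token] = len(vocab)
    return vocab


def box_hidden_state(token_ids, hidden_states, box_id):
    """Return the hidden vector at the last position holding the BOX token, or None."""
    positions = [i for i, t in enumerate(token_ids) if t == box_id]
    if not positions:
        return None  # the model emitted no grounding token
    return hidden_states[positions[-1]]


def decode_box(hidden, image_features, weights, bias):
    """Toy stand-in for the vision decoder: linear layer + sigmoid on the joint vector.

    Returns four values in (0, 1), read here as a normalized (cx, cy, w, h) box.
    """
    joint = list(hidden) + list(image_features)
    box = []
    for row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(row, joint)) + b
        box.append(1.0 / (1.0 + math.exp(-z)))
    return tuple(box)


# Random placeholders for what a real LLM and image encoder would produce.
rng = random.Random(0)
vocab = extend_vocab({"left": 0, "lung": 1, "opacity": 2})
box_id = vocab[BOX_TOKEN]
token_ids = [vocab[t] for t in ("left", "lung", "opacity", BOX_TOKEN)]
hidden_states = [[rng.uniform(-1, 1) for _ in range(8)] for _ in token_ids]
image_features = [rng.uniform(-1, 1) for _ in range(8)]
weights = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(4)]
bias = [0.0, 0.0, 0.0, 0.0]

hidden = box_hidden_state(token_ids, hidden_states, box_id)
box = decode_box(hidden, image_features, weights, bias)
print(box)
```

In this sketch the predicted phrase and the box come from one forward pass: the text before `<BOX>` plays the role of the key phrase, and the embedding at `<BOX>` is the only signal handed to the box decoder.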
Related papers
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning [9.913879680322042]
The lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models.
We present a diagnosis-guided bootstrapping strategy that exploits both image and label information to construct vision-language datasets.
arXiv Detail & Related papers (2024-04-23T15:27:19Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- Improving Medical Dialogue Generation with Abstract Meaning Representations [26.97253577302195]
Medical Dialogue Generation serves a critical role in telemedicine by facilitating the dissemination of medical expertise to patients.
Existing studies focus on incorporating textual representations, which have limited their ability to represent the semantics of text.
We introduce the use of Abstract Meaning Representations (AMR) to construct graphical representations that delineate the roles of language constituents and medical entities.
arXiv Detail & Related papers (2023-09-19T13:31:49Z)
- Building RadiologyNET: Unsupervised annotation of a large-scale multimodal medical database [0.4915744683251151]
The usage of machine learning in medical diagnosis and treatment has witnessed significant growth in recent years.
However, the availability of large annotated image datasets remains a major obstacle since the process of annotation is time-consuming and costly.
This paper explores how to automatically annotate a database of medical radiology images with regard to their semantic similarity.
arXiv Detail & Related papers (2023-07-27T13:00:33Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment [35.56193044201645]
Medical phrase grounding aims to locate the most relevant region in a medical image, given a phrase query describing certain medical findings.
In this paper, we propose MedRPG, an end-to-end approach for medical phrase grounding (MPG).
To enable MedRPG to locate nuanced medical findings with better region-phrase correspondences, we further propose Tri-attention Context contrastive alignment (TaCo).
arXiv Detail & Related papers (2023-03-14T03:57:16Z)
- MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology [40.52487429030841]
We consider enhancing medical visual-language pre-training with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.
First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information.
Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field.
Third, we propose to use a Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level, enabling medical diagnosis.
arXiv Detail & Related papers (2023-01-05T18:55:09Z)
- MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation [86.38736781043109]
We build and release MedDG, a large-scale, high-quality medical dialogue dataset covering 12 types of common gastrointestinal diseases.
We propose two kinds of medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation.
Experimental results show that pre-trained language models and other baselines perform poorly on both tasks on our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.