MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis
- URL: http://arxiv.org/abs/2407.04106v1
- Date: Thu, 4 Jul 2024 18:21:10 GMT
- Title: MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis
- Authors: Asma Alkhaldi, Raneem Alnajim, Layan Alabdullatef, Rawan Alyahya, Jun Chen, Deyao Zhu, Ahmed Alsinan, Mohamed Elhoseiny
- Abstract summary: MiniGPT-Med is a vision-language model derived from large-scale language models and tailored for medical applications.
It is capable of performing tasks such as medical report generation, visual question answering (VQA), and disease identification within medical imagery.
It achieves state-of-the-art performance on medical report generation, higher than the previous best model by 19% accuracy.
- Score: 28.421857904824627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in artificial intelligence (AI) have precipitated significant breakthroughs in healthcare, particularly in refining diagnostic procedures. However, previous studies have often been constrained to limited functionalities. This study introduces MiniGPT-Med, a vision-language model derived from large-scale language models and tailored for medical applications. MiniGPT-Med demonstrates remarkable versatility across various imaging modalities, including X-rays, CT scans, and MRIs, enhancing its utility. The model is capable of performing tasks such as medical report generation, visual question answering (VQA), and disease identification within medical imagery. Its integrated processing of both image and textual clinical data markedly improves diagnostic accuracy. Our empirical assessments confirm MiniGPT-Med's superior performance in disease grounding, medical report generation, and VQA benchmarks, representing a significant step towards reducing the gap in assisting radiology practice. Furthermore, it achieves state-of-the-art performance on medical report generation, higher than the previous best model by 19% accuracy. MiniGPT-Med promises to become a general interface for radiology diagnoses, enhancing diagnostic efficiency across a wide range of medical imaging applications.
Related papers
- MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot [47.77948063906033]
Retrieval-augmented generation (RAG) is a well-suited technique for retrieving privacy-sensitive Electronic Health Records.
This paper proposes MedRAG, a RAG model enhanced by knowledge graph (KG)-elicited reasoning for the medical domain.
Tests show MedRAG provides more specific diagnostic insights and outperforms state-of-the-art models in reducing misdiagnosis rates.
arXiv Detail & Related papers (2025-02-06T12:27:35Z)
- Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding [17.783231335173486]
We propose a fine-grained vision-language model (fVLM) for anatomy-level CT image interpretation.
Fine-grained alignment, however, faces considerable false-negative challenges.
We curated the largest CT dataset to date, comprising imaging and report data from 69,086 patients.
arXiv Detail & Related papers (2025-01-24T14:50:48Z)
- GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation [2.8900715468305767]
We have designed and evaluated an end-to-end transformer-based method to generate accurate and factually complete radiology reports for X-ray images.
The experiments have been conducted using the MIMIC-CXR-JPG database, the largest available chest X-ray dataset.
arXiv Detail & Related papers (2025-01-05T16:45:49Z)
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- The Era of Foundation Models in Medical Imaging is Approaching: A Scoping Review of the Clinical Value of Large-Scale Generative AI Applications in Radiology [0.0]
Social problems stemming from the shortage of radiologists are intensifying, and artificial intelligence is being highlighted as a potential solution.
Recently emerging large-scale generative AI has expanded from large language models (LLMs) to multi-modal models.
This scoping review systematically organizes existing literature on the clinical value of large-scale generative AI applications.
arXiv Detail & Related papers (2024-09-03T00:48:50Z)
- OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography [2.004909615444003]
Multimodal large language models (MLLMs) have achieved significant success in the general field of image processing.
We developed OrthoDoc, an MLLM designed for Computed Tomography (CT) diagnostics.
In extensive experiments, OrthoDoc outperforms commercial models led by GPT-4, demonstrating superior diagnostic capabilities and accuracy.
arXiv Detail & Related papers (2024-08-30T13:31:32Z)
- MGH Radiology Llama: A Llama 3 70B Model for Radiology [50.42811030970618]
This paper presents an advanced radiology-focused large language model: MGH Radiology Llama.
It is developed using the Llama 3 70B model, building upon previous domain-specific models like Radiology-GPT and Radiology-Llama2.
Our evaluation, incorporating both traditional metrics and a GPT-4-based assessment, highlights the enhanced performance of this work over general-purpose LLMs.
arXiv Detail & Related papers (2024-08-13T01:30:03Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study exhaustively evaluated the performance of Gemini, GPT-4, and four other popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis [59.35504779947686]
GPT-4V is OpenAI's newest model for multimodal medical diagnosis.
Our evaluation encompasses 17 human body systems.
GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy.
It faces significant challenges in disease diagnosis and generating comprehensive reports.
arXiv Detail & Related papers (2023-10-15T18:32:27Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.