CephGPT-4: An Interactive Multimodal Cephalometric Measurement and
Diagnostic System with Visual Large Language Model
- URL: http://arxiv.org/abs/2307.07518v1
- Date: Sat, 1 Jul 2023 15:41:12 GMT
- Title: CephGPT-4: An Interactive Multimodal Cephalometric Measurement and
Diagnostic System with Visual Large Language Model
- Authors: Lei Ma, Jincong Han, Zhaoxin Wang, Dian Zhang
- Abstract summary: The CephGPT-4 model exhibits excellent performance and has the potential to revolutionize orthodontic measurement and diagnostic applications.
- Score: 4.64641334287597
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-scale multimodal language models (LMMs) have achieved remarkable
success in general domains. However, the exploration of diagnostic language
models based on multimodal cephalometric medical data remains limited. In this
paper, we propose a novel multimodal cephalometric analysis and diagnostic
dialogue model. Firstly, a multimodal orthodontic medical dataset is
constructed, comprising cephalometric images and doctor-patient dialogue data,
with automatic analysis of cephalometric landmarks using U-Net and generation
of diagnostic reports. Then, MiniGPT-4 and VisualGLM are separately fine-tuned
on the cephalometric dataset and the generated diagnostic reports. Results
demonstrate that the resulting CephGPT-4 model exhibits excellent performance,
holding revolutionary application potential for orthodontic measurement and
diagnosis.
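As an illustration of the measurement stage only, the following sketch (not the authors' code) decodes landmark coordinates from U-Net-style heatmaps and computes one standard cephalometric measure, the SNA angle. The heatmap decoding scheme, the landmark set, and all coordinates are assumptions.

```python
# Minimal sketch: decode landmark coordinates from U-Net heatmaps and
# compute one cephalometric angle (SNA). All values are hypothetical.
import numpy as np

def decode_landmarks(heatmaps: np.ndarray) -> np.ndarray:
    """heatmaps: (num_landmarks, H, W) -> (num_landmarks, 2) as (x, y)."""
    coords = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        coords.append((x, y))
    return np.asarray(coords, dtype=float)

def angle_at(vertex: np.ndarray, p1: np.ndarray, p2: np.ndarray) -> float:
    """Angle p1-vertex-p2 in degrees."""
    v1, v2 = p1 - vertex, p2 - vertex
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Toy example: three synthetic heatmaps for Sella, Nasion, and A-point.
H, W = 64, 64
hms = np.zeros((3, H, W))
for i, (x, y) in enumerate([(20, 15), (45, 12), (42, 40)]):  # hypothetical peaks
    hms[i, y, x] = 1.0

sella, nasion, a_point = decode_landmarks(hms)
sna = angle_at(nasion, sella, a_point)  # SNA: sagittal position of the maxilla
print(f"SNA angle: {sna:.1f} degrees")  # clinical norm is around 82 degrees
```

In a real pipeline the heatmaps would come from the trained U-Net, and many such angular and linear measurements would feed the generated diagnostic report.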
Related papers
- MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools.
Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses.
We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
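As a hedged illustration of the general retrieval-augmented pattern such systems build on (not MMed-RAG's actual retriever, encoder, or prompt format), the sketch below retrieves reference reports for a query and assembles a grounded prompt; the embedding function and corpus are toy placeholders.

```python
# Generic RAG pattern: retrieve reference reports, then ground the prompt.
# embed() is a stand-in for a real domain-specific encoder.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy deterministic text embedding (hashing trick); placeholder only."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

corpus = [  # hypothetical reference reports
    "Lateral cephalogram shows Class II skeletal pattern with retrognathic mandible.",
    "Chest X-ray: no focal consolidation, normal cardiac silhouette.",
    "Panoramic radiograph shows impacted lower third molars.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

question = "Does this cephalogram suggest a Class II skeletal pattern?"
context = "\n".join(retrieve(question))
prompt = f"Context reports:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # this prompt would go to the Med-LVLM together with the image
```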
arXiv Detail & Related papers (2024-10-16T23:03:27Z)
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- EEG-Language Modeling for Pathology Detection [0.0]
This study pioneers EEG-language models trained on clinical reports and 15,000 EEGs.
Our results indicate that the models learn richer representations when exposed to a variety of report segments, and that representations of EEG-language models can significantly improve pathology detection compared to those of EEG-only models.
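The summary does not specify the training objective; a CLIP-style symmetric contrastive (InfoNCE) loss is one common way to align paired modalities, sketched below under that assumption, with random embeddings in place of real EEG and report encoders.

```python
# Symmetric InfoNCE over a batch of paired (EEG, report-segment) embeddings.
# The actual EEG-language training loss is an assumption here.
import numpy as np

def info_nce(eeg_emb: np.ndarray, text_emb: np.ndarray, temp: float = 0.07) -> float:
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = eeg @ txt.T / temp           # (B, B) similarity matrix
    labels = np.arange(len(logits))       # matching pairs sit on the diagonal

    def xent(l: np.ndarray) -> float:     # row-wise cross-entropy
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
B, D = 8, 128                             # batch size, embedding dimension
loss = info_nce(rng.normal(size=(B, D)), rng.normal(size=(B, D)))
print(f"contrastive loss: {loss:.3f}")
```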
arXiv Detail & Related papers (2024-09-02T10:03:03Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study exhaustively evaluated the performance of Gemini, GPT-4, and four other popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
The authors train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
LLaVA-Rad inference is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
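CheXprompt's exact prompt and scoring rubric are not reproduced here; the sketch below shows the generic LLM-as-judge pattern it instantiates, with a stub standing in for a real GPT-4 client.

```python
# Hedged sketch of an LLM-as-judge factuality check in the spirit of CheXprompt.
# call_llm is a placeholder for whatever GPT-4 client you use.

def build_factuality_prompt(reference: str, candidate: str) -> str:
    return (
        "You are a radiology report evaluator.\n"
        "Count the clinically significant factual errors in the CANDIDATE report\n"
        "relative to the REFERENCE report. Reply with a single integer.\n\n"
        f"REFERENCE:\n{reference}\n\nCANDIDATE:\n{candidate}\n"
    )

def factuality_score(reference: str, candidate: str, call_llm) -> int:
    """Lower is better: number of significant errors judged by the LLM."""
    reply = call_llm(build_factuality_prompt(reference, candidate))
    return int(reply.strip())

# Usage with a stub in place of a real GPT-4 call:
fake_llm = lambda prompt: "1"
print(factuality_score("No acute cardiopulmonary process.",
                       "Right lower lobe consolidation.", fake_llm))
```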
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis [59.35504779947686]
GPT-4V is OpenAI's newest multimodal model; this work evaluates it for multimodal medical diagnosis.
Our evaluation encompasses 17 human body systems.
GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy.
It faces significant challenges in disease diagnosis and generating comprehensive reports.
arXiv Detail & Related papers (2023-10-15T18:32:27Z)
- Multi-modal Graph Neural Network for Early Diagnosis of Alzheimer's Disease from sMRI and PET Scans [11.420077093805382]
We propose to use graph neural networks (GNNs), which are designed for problems in non-Euclidean domains.
In this study, we demonstrate how brain networks can be created from sMRI or PET images.
We then present a multi-modal GNN framework in which each modality has its own GNN branch, together with a proposed technique for combining the multi-modal data.
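A minimal NumPy sketch of that two-branch idea follows: one graph-convolution step per modality over a hypothetical brain graph, mean pooling, and fusion by concatenation. The paper's actual GNN layers and combination technique may differ; all graphs and weights below are random placeholders.

```python
# Two-branch multimodal GNN sketch: one GCN step per modality, then fusion.
import numpy as np

def gcn_layer(adj: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution step: normalized-adjacency message passing + ReLU."""
    a_hat = adj + np.eye(len(adj))                   # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
    return np.maximum(a_norm @ x @ w, 0.0)

rng = np.random.default_rng(0)
n_regions, d_in, d_hid = 90, 16, 32  # e.g., 90 brain regions (AAL atlas)

def random_brain_graph() -> np.ndarray:
    a = (rng.random((n_regions, n_regions)) > 0.9).astype(float)
    return np.maximum(a, a.T)                        # undirected toy graph

adj_smri, adj_pet = random_brain_graph(), random_brain_graph()
x_smri, x_pet = rng.normal(size=(n_regions, d_in)), rng.normal(size=(n_regions, d_in))
w_smri, w_pet = rng.normal(size=(d_in, d_hid)), rng.normal(size=(d_in, d_hid))

h_smri = gcn_layer(adj_smri, x_smri, w_smri).mean(axis=0)  # pooled sMRI branch
h_pet = gcn_layer(adj_pet, x_pet, w_pet).mean(axis=0)      # pooled PET branch
fused = np.concatenate([h_smri, h_pet])            # fused subject representation
print(fused.shape)                                  # (64,) -> feeds a classifier
```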
arXiv Detail & Related papers (2023-07-31T02:04:05Z)
- OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue [7.140551103766788]
We introduce visual capability into a large language model to build OphGLM, an ophthalmic large language-and-vision assistant.
Our experimental results demonstrate that the OphGLM model performs exceptionally well, and it has the potential to revolutionize clinical applications in ophthalmology.
arXiv Detail & Related papers (2023-06-21T11:09:48Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO dataset.
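A schematic of that two-stage pipeline (not the authors' code): the captioner and the GPT-3 call are stubs, but the prompt-building step shows how short visual findings would be handed to the language model for expansion.

```python
# Two-stage report generation: visual captioner -> language-model expansion.
# Both model calls are hypothetical stand-ins.

def show_attend_tell(image_path: str) -> str:
    """Stand-in for the attention-based visual captioner."""
    return "cardiomegaly, lungs clear"  # hypothetical short findings

def build_expansion_prompt(findings: str) -> str:
    """Prompt asking the language model to expand findings into a full report."""
    return ("Expand the following radiology findings into a complete, "
            f"well-structured report.\nFindings: {findings}\nReport:")

def gpt3_expand(findings: str) -> str:
    """Stand-in for a GPT-3 completion call; returns a canned report here."""
    _prompt = build_expansion_prompt(findings)  # would be sent to the model
    return (f"Findings: {findings}. Impression: enlarged cardiac silhouette "
            "without acute pulmonary disease.")

def generate_report(image_path: str) -> str:
    return gpt3_expand(show_attend_tell(image_path))

print(generate_report("chest_xray_001.png"))
```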
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- MMLN: Leveraging Domain Knowledge for Multimodal Diagnosis [10.133715767542386]
We propose a knowledge-driven and data-driven framework for lung disease diagnosis.
We formulate diagnosis rules according to authoritative clinical medicine guidelines and learn the weights of rules from text data.
A multimodal fusion of text and image data is designed to infer the marginal probability of lung disease.
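One simple way to combine guideline-derived rules, learned rule weights, and an image branch is a logistic (sigmoid) combination, sketched below; this is an assumption for illustration, not MMLN's exact formulation, and all rules and weights are invented.

```python
# Weighted clinical rules + image-branch score -> marginal disease probability.
import math

RULES = {  # hypothetical guideline-derived rules with learned weights
    "fever": 0.9,
    "productive cough": 0.7,
    "consolidation on imaging": 1.8,
}
BIAS = -2.0

def disease_probability(findings: set[str], image_score: float) -> float:
    """Combine fired rule weights with an image-branch score via a sigmoid."""
    logit = BIAS + image_score + sum(w for f, w in RULES.items() if f in findings)
    return 1.0 / (1.0 + math.exp(-logit))

p = disease_probability({"fever", "consolidation on imaging"}, image_score=0.8)
print(f"P(pneumonia) = {p:.2f}")  # sigmoid(1.5) ~ 0.82 with these toy numbers
```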
arXiv Detail & Related papers (2022-02-09T04:12:30Z)