MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models
- URL: http://arxiv.org/abs/2506.07400v2
- Date: Wed, 11 Jun 2025 04:43:27 GMT
- Title: MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models
- Authors: Philip R. Liu, Sparsh Bansal, Jimmy Dinh, Aditya Pawar, Ramani Satishkumar, Shail Desai, Neeraj Gupta, Xin Wang, Shu Hu
- Abstract summary: Integrating glaucoma detection with large language models (LLMs) presents an automated strategy to mitigate ophthalmologist shortages. Applying general LLMs to medical imaging remains challenging due to hallucinations, limited interpretability, and insufficient domain-specific medical knowledge. We propose MedChat, a multi-agent diagnostic framework and platform that combines specialized vision models with multiple role-specific LLM agents.
- Score: 9.411749481805355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of deep learning-based glaucoma detection with large language models (LLMs) presents an automated strategy to mitigate ophthalmologist shortages and improve clinical reporting efficiency. However, applying general LLMs to medical imaging remains challenging due to hallucinations, limited interpretability, and insufficient domain-specific medical knowledge, which can potentially reduce clinical accuracy. Although recent approaches combining imaging models with LLM reasoning have improved reporting, they typically rely on a single generalist agent, restricting their capacity to emulate the diverse and complex reasoning found in multidisciplinary medical teams. To address these limitations, we propose MedChat, a multi-agent diagnostic framework and platform that combines specialized vision models with multiple role-specific LLM agents, all coordinated by a director agent. This design enhances reliability, reduces hallucination risk, and enables interactive diagnostic reporting through an interface tailored for clinical review and educational use. Code available at https://github.com/Purdue-M2/MedChat.
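To make the coordination pattern described in the abstract concrete, here is a minimal sketch of a director-coordinated multi-agent loop. Every name in it (`llm_call`, `run_vision_model`, the role list) is an illustrative placeholder, not the MedChat API; the released system pairs trained glaucoma-detection vision models with role-specific LLM agents.

```python
# Minimal sketch of a director-coordinated multi-agent diagnostic loop,
# in the spirit of MedChat. All names are hypothetical stand-ins.

def llm_call(role: str, prompt: str) -> str:
    """Placeholder for a call to an LLM conditioned on a role persona."""
    return f"[{role}] response to: {prompt[:40]}..."

def run_vision_model(image_path: str) -> dict:
    """Placeholder for a specialized vision model (e.g., a glaucoma classifier)."""
    return {"finding": "suspected glaucoma", "confidence": 0.87}

ROLES = ["ophthalmologist", "radiologist", "report_writer"]  # illustrative roles

def director(image_path: str, question: str) -> str:
    findings = run_vision_model(image_path)
    # The director fans the structured findings out to each role-specific agent...
    opinions = {
        role: llm_call(role, f"Findings: {findings}. Question: {question}")
        for role in ROLES
    }
    # ...then synthesizes the role opinions into a single diagnostic report.
    return llm_call("director", f"Combine these opinions into one report: {opinions}")

print(director("fundus_001.png", "Is glaucoma present?"))
```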
Related papers
- Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation. MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z)
- MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration [57.98393950821579]
We introduce the Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis (MAM). Inspired by our empirical findings, MAM decomposes the medical diagnostic process into specialized roles: a General Practitioner, Specialist Team, Radiologist, Medical Assistant, and Director. This modular and collaborative framework enables efficient knowledge updates and leverages existing medical LLMs and knowledge bases.
arXiv Detail & Related papers (2025-06-24T17:52:43Z)
- MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning [63.63542462400175]
We propose MMedAgent-RL, a reinforcement learning-based multi-agent framework that enables dynamic, optimized collaboration among medical agents. Specifically, we train two GP agents based on Qwen2.5-VL via RL: the triage doctor learns to assign patients to appropriate specialties, while the attending physician integrates the judgments from multiple specialists. Experiments on five medical VQA benchmarks demonstrate that MMedAgent-RL not only outperforms both open-source and proprietary Med-LVLMs, but also exhibits human-like reasoning patterns. (A minimal inference-time sketch of this flow follows this entry.)
arXiv Detail & Related papers (2025-05-31T13:22:55Z)
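The triage-then-integrate flow above can be sketched at inference time as follows. This is a hypothetical illustration only: the paper trains both GP agents with RL on Qwen2.5-VL, which is omitted here, and every function and role name is an assumption rather than the paper's code.

```python
# Hypothetical sketch of MMedAgent-RL's inference-time flow: a triage agent
# routes a case to a specialty, specialist agents give judgments, and an
# attending-physician agent integrates them. RL training is omitted.

SPECIALTIES = ["ophthalmology", "cardiology", "dermatology"]  # assumed set

def agent(role: str, prompt: str) -> str:
    """Placeholder for a medical LLM agent (RL-tuned in the actual paper)."""
    return f"[{role}] judgment on: {prompt[:40]}..."

def triage(question: str) -> str:
    # The triage doctor learns this routing via RL; a keyword heuristic
    # stands in for the learned policy here.
    return "ophthalmology" if "eye" in question.lower() else SPECIALTIES[1]

def diagnose(question: str) -> str:
    specialty = triage(question)
    views = [agent(f"{specialty}_specialist_{i}", question) for i in range(2)]
    # The attending physician integrates the multi-specialist judgments.
    return agent("attending_physician", f"Integrate {views} for: {question}")

print(diagnose("Does this eye scan show glaucoma?"))
```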
- DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue [14.95390953068765]
Large language models (LLMs) have demonstrated excellent capabilities in the field of biomedical question answering, but their application in real-world clinical consultations still faces core challenges. We propose DoctorAgent-RL, a reinforcement learning (RL)-based multi-agent collaborative framework that models medical consultations as a dynamic decision-making process under uncertainty. Our approach shows immense practical value by reducing misdiagnosis risks in time-pressured settings, freeing clinicians for complex cases, and pioneering a strategy to optimize medical resource allocation and alleviate workforce shortages.
arXiv Detail & Related papers (2025-05-26T07:48:14Z)
- A Multimodal Multi-Agent Framework for Radiology Report Generation [2.1477122604204433]
Radiology report generation (RRG) aims to automatically produce diagnostic reports from medical images. We propose a multimodal multi-agent framework for RRG that aligns with the stepwise clinical reasoning workflow.
arXiv Detail & Related papers (2025-05-14T20:28:04Z)
- RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models [17.579521693647383]
We introduce RetinalGPT, a multimodal conversational assistant for clinically preferred quantitative analysis of retinal images. In particular, RetinalGPT outperforms generic-domain MLLMs by a large margin in diagnosing retinal diseases.
arXiv Detail & Related papers (2025-03-06T00:19:54Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models. (An illustrative sketch of such structured output follows this entry.)
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
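As a rough illustration of the structured-reasoning idea, the sketch below constrains a stubbed LLM to a fixed JSON answer schema and parses the result. The field names are assumptions for illustration, not the paper's published template.

```python
# Illustrative only: constrain an LLM to a structured medical-reasoning schema
# and parse the result. The schema fields are assumed, not taken from the paper.
import json

SCHEMA_PROMPT = (
    'Respond only with JSON containing the keys '
    '"key_findings", "differential_diagnosis", "reasoning", and "final_answer".'
)

def llm(prompt: str) -> str:
    """Placeholder LLM; a real system would call a general-purpose model here."""
    return json.dumps({
        "key_findings": ["intraocular pressure 28 mmHg", "optic disc cupping"],
        "differential_diagnosis": ["open-angle glaucoma", "ocular hypertension"],
        "reasoning": "Elevated pressure with cupping favors glaucoma.",
        "final_answer": "open-angle glaucoma",
    })

def structured_answer(question: str) -> dict:
    raw = llm(f"{SCHEMA_PROMPT}\n\nQuestion: {question}")
    answer = json.loads(raw)  # fails loudly if the model breaks the schema
    assert {"key_findings", "differential_diagnosis",
            "reasoning", "final_answer"} <= answer.keys()
    return answer

print(structured_answer("IOP of 28 mmHg with disc cupping; likely diagnosis?"))
```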
- Knowledge-Augmented Multimodal Clinical Rationale Generation for Disease Diagnosis with Small Language Models [14.136585695164426]
Small language models (SLMs) are efficient but lack advanced reasoning for integrating multimodal medical data. We propose ClinRaGen, enhancing SLMs by leveraging LLM-derived reasoning ability via rationale distillation and domain knowledge injection. Experiments on real-world medical datasets show that ClinRaGen achieves state-of-the-art performance in disease diagnosis and rationale generation.
arXiv Detail & Related papers (2024-11-12T07:34:56Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between the Doctor (as the player) and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue [7.140551103766788]
We introduce visual ability into a large language model to build OphGLM, an ophthalmic large language-and-vision assistant.
Our experimental results demonstrate that the OphGLM model performs exceptionally well, and it has the potential to revolutionize clinical applications in ophthalmology.
arXiv Detail & Related papers (2023-06-21T11:09:48Z)
- ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs [48.11532667875847]
ChatCAD+ is a tool to generate high-quality medical reports and provide reliable medical advice.
The Reliable Report Generation module is capable of interpreting medical images and generating high-quality medical reports.
The Reliable Interaction module leverages up-to-date information from reputable medical websites to provide reliable medical advice.
arXiv Detail & Related papers (2023-05-25T12:03:31Z)
- ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [53.73049253535025]
Large language models (LLMs) have recently demonstrated their potential in clinical applications.
This paper presents a method for integrating LLMs into medical-image CAD networks.
The goal is to merge the strengths of LLMs' medical domain knowledge and logical reasoning with the vision understanding capability of existing medical-image CAD models.
arXiv Detail & Related papers (2023-02-14T18:54:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.