Related papers: MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

URL: http://arxiv.org/abs/2407.02483v1
Date: Tue, 2 Jul 2024 17:58:23 GMT
Title: MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
Authors: Binxu Li, Tiankai Yan, Yuanting Pan, Zhe Xu, Jie Luo, Ruiyang Ji, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang,
Abstract summary: This paper introduces the first agent explicitly designed for the medical field, named textbfMulti-modal textbfMedical textbfAgent (MMedAgent) We curate an instruction-tuning dataset comprising six medical tools solving seven tasks, enabling the agent to choose the most suitable tools for a given task. MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model, GPT-4o.
Score: 26.315786330786676
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To bridge this gap, this paper introduces the first agent explicitly designed for the medical field, named \textbf{M}ulti-modal \textbf{Med}ical \textbf{Agent} (MMedAgent). We curate an instruction-tuning dataset comprising six medical tools solving seven tasks, enabling the agent to choose the most suitable tools for a given task. Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model, GPT-4o. Furthermore, MMedAgent exhibits efficiency in updating and integrating new medical tools.

Related papers

MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration [57.98393950821579]
We introduce the Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis (MAM)<n>Inspired by our empirical findings, MAM decomposes the medical diagnostic process into specialized roles: a General Practitioner, Specialist Team, Radiologist, Medical Assistant, and Director.<n>This modular and collaborative framework enables efficient knowledge updates and leverages existing medical LLMs and knowledge bases.
arXiv Detail & Related papers (2025-06-24T17:52:43Z)
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning [63.63542462400175]
We propose MMedAgent-RL, a reinforcement learning-based multi-agent framework that enables dynamic, optimized collaboration among medical agents.<n> Specifically, we train two GP agents based on Qwen2.5-VL via RL: the triage doctor learns to assign patients to appropriate specialties, while the attending physician integrates the judgments from multi-specialists.<n>Experiments on five medical VQA benchmarks demonstrate that MMedAgent-RL not only outperforms both open-source and proprietary Med-LVLMs, but also exhibits human-like reasoning patterns.
arXiv Detail & Related papers (2025-05-31T13:22:55Z)
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks [17.567786780266353]
We introduce MedAgentBoard, a comprehensive benchmark for the systematic evaluation of multi-agent collaboration, single-LLM, and conventional approaches.<n> MedAgentBoard encompasses four diverse medical task categories: medical (visual) question answering, lay summary generation, structured Electronic Health Record (EHR) predictive modeling, and clinical workflow automation.<n>Our extensive experiments reveal a nuanced landscape: while multi-agent collaboration demonstrates benefits in specific scenarios, it does not consistently outperform advanced single LLMs.
arXiv Detail & Related papers (2025-05-18T11:28:17Z)
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding [40.52017994491893]
MDocAgent is a novel RAG and multi-agent framework that leverages both text and image. Our system employs five specialized agents: a general agent, a critical agent, a text agent, an image agent and a summarizing agent. Preliminary experiments on five benchmarks demonstrate the effectiveness of our MDocAgent, achieve an average improvement of 12.1%.
arXiv Detail & Related papers (2025-03-18T06:57:21Z)
M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging [54.40890979694209]
We present M3Builder, a novel multi-agent system designed to automate machine learning (ML) in medical imaging. At its core, M3Builder employs four specialized agents that collaborate to tackle complex, multi-step medical ML. Compared to existing ML agentic designs, M3Builder shows superior performance on completing ML tasks in medical imaging.
arXiv Detail & Related papers (2025-02-27T17:29:46Z)
Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding [9.144030136201476]
Multimodal large language models (MLLMs) inherit the superior text understanding capabilities of LLMs and extend these capabilities to multimodal scenarios. These models achieve excellent results in the general domain of multimodal tasks. However, in the medical domain, the substantial training costs and the requirement for extensive medical data pose challenges to the development of medical MLLMs.
arXiv Detail & Related papers (2024-10-31T11:07:26Z)
FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models [54.09244105445476]
This study introduces a novel knowledge injection approach, FedKIM, to scale the medical foundation model within a federated learning framework. FedKIM leverages lightweight local models to extract healthcare knowledge from private data and integrates this knowledge into a centralized foundation model. Our experiments across twelve tasks in seven modalities demonstrate the effectiveness of FedKIM in various settings.
arXiv Detail & Related papers (2024-08-17T15:42:29Z)
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms [55.77492625524141]
EvoAgent is a generic method to automatically extend expert agents to multi-agent systems via the evolutionary algorithm. We show that EvoAgent can automatically generate multiple expert agents and significantly enhance the task-solving capabilities of LLM-based agents.
arXiv Detail & Related papers (2024-06-20T11:49:23Z)
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation to build such agents. We take the first step towards building generally-capable LLM-based agents with self-evolution ability. We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z)
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents [19.721008909326024]
Large language models (LLMs) have sparked a new wave of technological revolution in medical artificial intelligence (AI) We introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. Within the simulacrum, doctor agents are able to evolve by treating a large number of patient agents without the need to label training data manually.
arXiv Detail & Related papers (2024-05-05T14:53:51Z)
MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making [45.74980058831342]
We introduce a novel multi-agent framework, named Medical Decision-making Agents (MDAgents) The assigned solo or group collaboration structure is tailored to the medical task at hand, emulating real-world medical decision-making processes. MDAgents achieved the best performance in seven out of ten benchmarks on tasks requiring an understanding of medical knowledge.
arXiv Detail & Related papers (2024-04-22T06:30:05Z)
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models [17.643421997037514]
We propose a novel framework that tackles both discriminative and generative multimodal medical tasks. The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning. Our model can achieve performance superior to or on par with state-of-the-art baselines.
arXiv Detail & Related papers (2024-04-16T02:35:17Z)
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities. When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z)
EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records [47.5632532642591]
Large language models (LLMs) have demonstrated exceptional capabilities in planning and tool utilization. We propose EHRAgent, an LLM agent empowered with a code interface, to autonomously generate and execute code for multi-tabular reasoning.
arXiv Detail & Related papers (2024-01-13T18:09:05Z)
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning [35.804520192679874]
Large language models (LLMs) encounter significant barriers in medicine and healthcare. We propose MedAgents, a novel multi-disciplinary collaboration framework for the medical domain. Our work focuses on the zero-shot setting, which is applicable in real-world scenarios.
arXiv Detail & Related papers (2023-11-16T11:47:58Z)
Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks. We propose a new paradigm called Medical-knedge-enhanced mulTimOdal pretRaining (MOTOR)
arXiv Detail & Related papers (2023-04-26T01:26:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.