MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
- URL: http://arxiv.org/abs/2407.02483v2
- Date: Sat, 05 Oct 2024 06:36:33 GMT
- Title: MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
- Authors: Binxu Li, Tiankai Yan, Yuanting Pan, Jie Luo, Ruiyang Ji, Jiayuan Ding, Zhe Xu, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang,
- Abstract summary: This paper introduces the first agent explicitly designed for the medical field, named textbfMulti-modal textbfMedical textbfAgent (MMedAgent)
Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model, GPT-4o.
- Score: 27.314055140281432
- License:
- Abstract: Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To bridge this gap, this paper introduces the first agent explicitly designed for the medical field, named \textbf{M}ulti-modal \textbf{Med}ical \textbf{Agent} (MMedAgent). We curate an instruction-tuning dataset comprising six medical tools solving seven tasks across five modalities, enabling the agent to choose the most suitable tools for a given task. Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model, GPT-4o. Furthermore, MMedAgent exhibits efficiency in updating and integrating new medical tools. Codes and models are all available.
Related papers
- MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents [20.96732566767587]
Recent large language models (LLMs) have demonstrated significant advancements, particularly in their ability to serve as agents.
We introduce MedAgentBench, a broad evaluation suite designed to assess the agent capabilities of large language models within medical records contexts.
The environment uses the standard APIs and communication infrastructure used in modern EMR systems, so it can be easily migrated into live EMR systems.
arXiv Detail & Related papers (2025-01-24T17:21:01Z) - FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models [54.09244105445476]
This study introduces a novel knowledge injection approach, FedKIM, to scale the medical foundation model within a federated learning framework.
FedKIM leverages lightweight local models to extract healthcare knowledge from private data and integrates this knowledge into a centralized foundation model.
Our experiments across twelve tasks in seven modalities demonstrate the effectiveness of FedKIM in various settings.
arXiv Detail & Related papers (2024-08-17T15:42:29Z) - EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms [55.77492625524141]
EvoAgent is a generic method to automatically extend expert agents to multi-agent systems via the evolutionary algorithm.
We show that EvoAgent can automatically generate multiple expert agents and significantly enhance the task-solving capabilities of LLM-based agents.
arXiv Detail & Related papers (2024-06-20T11:49:23Z) - AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation to build such agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z) - Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents [19.721008909326024]
Large language models (LLMs) have sparked a new wave of technological revolution in medical artificial intelligence (AI)
We introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness.
Within the simulacrum, doctor agents are able to evolve by treating a large number of patient agents without the need to label training data manually.
arXiv Detail & Related papers (2024-05-05T14:53:51Z) - MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making [45.74980058831342]
We introduce a novel multi-agent framework, named Medical Decision-making Agents (MDAgents)
The assigned solo or group collaboration structure is tailored to the medical task at hand, emulating real-world medical decision-making processes.
MDAgents achieved the best performance in seven out of ten benchmarks on tasks requiring an understanding of medical knowledge.
arXiv Detail & Related papers (2024-04-22T06:30:05Z) - Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models [17.643421997037514]
We propose a novel framework that tackles both discriminative and generative multimodal medical tasks.
The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning.
Our model can achieve performance superior to or on par with state-of-the-art baselines.
arXiv Detail & Related papers (2024-04-16T02:35:17Z) - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z) - EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records [47.5632532642591]
Large language models (LLMs) have demonstrated exceptional capabilities in planning and tool utilization.
We propose EHRAgent, an LLM agent empowered with a code interface, to autonomously generate and execute code for multi-tabular reasoning.
arXiv Detail & Related papers (2024-01-13T18:09:05Z) - Towards Medical Artificial General Intelligence via Knowledge-Enhanced
Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knedge-enhanced mulTimOdal pretRaining (MOTOR)
arXiv Detail & Related papers (2023-04-26T01:26:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.