MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
- URL: http://arxiv.org/abs/2407.02483v1
- Date: Tue, 2 Jul 2024 17:58:23 GMT
- Title: MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
- Authors: Binxu Li, Tiankai Yan, Yuanting Pan, Zhe Xu, Jie Luo, Ruiyang Ji, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang,
- Abstract summary: This paper introduces the first agent explicitly designed for the medical field, named textbfMulti-modal textbfMedical textbfAgent (MMedAgent)
We curate an instruction-tuning dataset comprising six medical tools solving seven tasks, enabling the agent to choose the most suitable tools for a given task.
MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model, GPT-4o.
- Score: 26.315786330786676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To bridge this gap, this paper introduces the first agent explicitly designed for the medical field, named \textbf{M}ulti-modal \textbf{Med}ical \textbf{Agent} (MMedAgent). We curate an instruction-tuning dataset comprising six medical tools solving seven tasks, enabling the agent to choose the most suitable tools for a given task. Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model, GPT-4o. Furthermore, MMedAgent exhibits efficiency in updating and integrating new medical tools.
Related papers
- EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms [55.77492625524141]
EvoAgent is a generic method to automatically extend expert agents to multi-agent systems via the evolutionary algorithm.
We show that EvoAgent can automatically generate multiple expert agents and significantly enhance the task-solving capabilities of LLM-based agents.
arXiv Detail & Related papers (2024-06-20T11:49:23Z) - AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation to build such agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z) - Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning [21.562034852024272]
The adoption of large language models (LLMs) in healthcare has attracted significant research interest.
Most state-of-the-art LLMs are unimodal, text-only models that cannot directly process multimodal inputs.
We propose a multimodal medical collaborative reasoning framework textbfMultiMedRes to solve medical multimodal reasoning problems.
arXiv Detail & Related papers (2024-05-19T18:26:11Z) - Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models [17.643421997037514]
We propose a novel framework that tackles both discriminative and generative multimodal medical tasks.
The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning.
Our model can achieve performance superior to or on par with state-of-the-art baselines.
arXiv Detail & Related papers (2024-04-16T02:35:17Z) - Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology [0.6397820821509177]
We introduce an alternative approach to multimodal medical AI that utilizes the generalist capabilities of a large language model (LLM) as a central reasoning engine.
This engine autonomously coordinates and deploys a set of specialized medical AI tools.
We show that the system has a high capability in employing appropriate tools (97%), drawing correct conclusions (93.6%), and providing complete (94%), and helpful (89.2%) recommendations for individual patient cases.
arXiv Detail & Related papers (2024-04-06T15:50:19Z) - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs)
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - EHRAgent: Code Empowers Large Language Models for Few-shot Complex
Tabular Reasoning on Electronic Health Records [48.9788858020623]
Large language models (LLMs) have demonstrated exceptional capabilities in planning and tool utilization.
We propose EHRAgent, an LLM agent empowered with a code interface, to autonomously generate and execute code for multi-tabular reasoning.
arXiv Detail & Related papers (2024-01-13T18:09:05Z) - MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning [35.804520192679874]
Large language models (LLMs) encounter significant barriers in medicine and healthcare.
We propose MedAgents, a novel multi-disciplinary collaboration framework for the medical domain.
Our work focuses on the zero-shot setting, which is applicable in real-world scenarios.
arXiv Detail & Related papers (2023-11-16T11:47:58Z) - Towards Medical Artificial General Intelligence via Knowledge-Enhanced
Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knedge-enhanced mulTimOdal pretRaining (MOTOR)
arXiv Detail & Related papers (2023-04-26T01:26:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.