MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
- URL: http://arxiv.org/abs/2311.10537v4
- Date: Tue, 4 Jun 2024 23:47:43 GMT
- Title: MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
- Authors: Xiangru Tang, Anni Zou, Zhuosheng Zhang, Ziming Li, Yilun Zhao, Xingyao Zhang, Arman Cohan, Mark Gerstein,
- Abstract summary: Large language models (LLMs) encounter significant barriers in medicine and healthcare.
We propose MedAgents, a novel multi-disciplinary collaboration framework for the medical domain.
Our work focuses on the zero-shot setting, which is applicable in real-world scenarios.
- Score: 35.804520192679874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs), despite their remarkable progress across various general domains, encounter significant barriers in medicine and healthcare. This field faces unique challenges such as domain-specific terminologies and reasoning over specialized knowledge. To address these issues, we propose MedAgents, a novel multi-disciplinary collaboration framework for the medical domain. MedAgents leverages LLM-based agents in a role-playing setting that participate in a collaborative multi-round discussion, thereby enhancing LLM proficiency and reasoning capabilities. This training-free framework encompasses five critical steps: gathering domain experts, proposing individual analyses, summarising these analyses into a report, iterating over discussions until a consensus is reached, and ultimately making a decision. Our work focuses on the zero-shot setting, which is applicable in real-world scenarios. Experimental results on nine datasets (MedQA, MedMCQA, PubMedQA, and six subtasks from MMLU) establish that our proposed MedAgents framework excels at mining and harnessing the medical expertise within LLMs, as well as extending its reasoning abilities. Our code can be found at https://github.com/gersteinlab/MedAgents.
Related papers
- MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration [16.062646854608094]
Large Language Model (LLM)-driven interactive systems currently show potential promise in healthcare domains.
This paper proposes MedAide, an omni medical multi-agent collaboration framework for specialized healthcare services.
arXiv Detail & Related papers (2024-10-16T13:10:27Z) - Towards Evaluating and Building Versatile Large Language Models for Medicine [57.49547766838095]
We present MedS-Bench, a benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts.
MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation.
MedS-Ins comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks.
arXiv Detail & Related papers (2024-08-22T17:01:34Z) - TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models [22.76485170022542]
We introduce a new medical question-answering (QA) dataset that contains massive manual instruction for solving Traditional Chinese Medicine examination tasks.
Our TCMD collects massive questions across diverse domains with their annotated medical subjects.
arXiv Detail & Related papers (2024-06-07T13:48:15Z) - Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning [21.562034852024272]
The adoption of large language models (LLMs) in healthcare has attracted significant research interest.
Most state-of-the-art LLMs are unimodal, text-only models that cannot directly process multimodal inputs.
We propose a multimodal medical collaborative reasoning framework textbfMultiMedRes to solve medical multimodal reasoning problems.
arXiv Detail & Related papers (2024-05-19T18:26:11Z) - COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain [1.6752458252726457]
Large Language Models (LLMs) constitute a breakthrough state-of-the-art Artificial Intelligence (AI) technology.
We outline Cognitive Network Evaluation Toolkit for Medical Domains (COGNET-MD)
We propose a scoring-framework with increased difficulty to assess the ability of LLMs in interpreting medical text.
arXiv Detail & Related papers (2024-05-17T16:31:56Z) - MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making [45.74980058831342]
We introduce a novel multi-agent framework, named Medical Decision-making Agents (MDAgents)
The assigned solo or group collaboration structure is tailored to the medical task at hand, emulating real-world medical decision-making processes.
MDAgents achieved the best performance in seven out of ten benchmarks on tasks requiring an understanding of medical knowledge.
arXiv Detail & Related papers (2024-04-22T06:30:05Z) - Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z) - Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large
Language Models [59.60384461302662]
We introduce Asclepius, a novel benchmark for evaluating Medical Multi-Modal Large Language Models (Med-MLLMs)
Asclepius rigorously and comprehensively assesses model capability in terms of distinct medical specialties and different diagnostic capacities.
We also provide an in-depth analysis of 6 Med-MLLMs and compare them with 5 human specialists.
arXiv Detail & Related papers (2024-02-17T08:04:23Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce textbfAI Hospital, a framework simulating dynamic medical interactions between emphDoctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - A Survey of Large Language Models in Medicine: Progress, Application, and Challenge [85.09998659355038]
Large language models (LLMs) have received substantial attention due to their capabilities for understanding and generating human language.
This review aims to provide a detailed overview of the development and deployment of LLMs in medicine.
arXiv Detail & Related papers (2023-11-09T02:55:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.