AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale
Clinical Tool Learning
- URL: http://arxiv.org/abs/2402.13225v1
- Date: Tue, 20 Feb 2024 18:37:19 GMT
- Title: AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale
Clinical Tool Learning
- Authors: Qiao Jin, Zhizheng Wang, Yifan Yang, Qingqing Zhu, Donald Wright,
Thomas Huang, W John Wilbur, Zhe He, Andrew Taylor, Qingyu Chen, Zhiyong Lu
- Abstract summary: We introduce AgentMD, a novel language agent capable of curating and applying clinical calculators across various clinical contexts.
AgentMD has automatically curated a collection of 2,164 diverse clinical calculators with executable functions and structured documentation, collectively named RiskCalcs.
Manual evaluations show that RiskCalcs tools achieve an accuracy of over 80% on three quality metrics.
- Score: 11.8292941452582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical calculators play a vital role in healthcare by offering accurate
evidence-based predictions for various purposes such as prognosis.
Nevertheless, their widespread utilization is frequently hindered by usability
challenges, poor dissemination, and restricted functionality. Augmenting large
language models with extensive collections of clinical calculators presents an
opportunity to overcome these obstacles and improve workflow efficiency, but
the scalability of the manual curation process poses a significant challenge.
In response, we introduce AgentMD, a novel language agent capable of curating
and applying clinical calculators across various clinical contexts. Using the
published literature, AgentMD has automatically curated a collection of 2,164
diverse clinical calculators with executable functions and structured
documentation, collectively named RiskCalcs. Manual evaluations show that
RiskCalcs tools achieve an accuracy of over 80% on three quality metrics. At
inference time, AgentMD can automatically select and apply the relevant
RiskCalcs tools given any patient description. On the newly established RiskQA
benchmark, AgentMD significantly outperforms chain-of-thought prompting with
GPT-4 (87.7% vs. 40.9% in accuracy). Additionally, we applied AgentMD to
real-world clinical notes for analyzing both population-level and risk-level
patient characteristics. In summary, our study illustrates the utility of
language agents augmented with clinical calculators for healthcare analytics
and patient care.
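To make the select-and-apply pattern from the abstract concrete, below is a minimal Python sketch under stated assumptions: a tiny registry of executable calculators with documentation (standing in for RiskCalcs), a naive keyword-overlap retrieval step (standing in for the agent's LLM-based tool selection), and execution on structured patient variables. The CHADS2 scoring rules are the standard published ones; everything else (names, fields, retrieval heuristic) is illustrative and not the paper's implementation.

```python
# Minimal sketch of an AgentMD-style select-then-apply loop.
# Assumptions: the registry layout, patient dict fields, and keyword-overlap
# retrieval are illustrative stand-ins, not the paper's RiskCalcs pipeline
# (AgentMD selects tools with an LLM agent, not keyword matching).
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RiskCalc:
    name: str
    doc: str                           # structured documentation for retrieval
    func: Callable[[dict], float]      # executable function over patient data
    keywords: set = field(default_factory=set)

def chads2(p: dict) -> int:
    """CHADS2 stroke-risk score for atrial fibrillation (standard rules)."""
    score = 0
    score += 1 if p.get("congestive_heart_failure") else 0
    score += 1 if p.get("hypertension") else 0
    score += 1 if p.get("age", 0) >= 75 else 0
    score += 1 if p.get("diabetes") else 0
    score += 2 if p.get("prior_stroke_or_tia") else 0
    return score

REGISTRY = [
    RiskCalc("CHADS2", "Stroke risk in atrial fibrillation", chads2,
             {"stroke", "atrial", "fibrillation", "af"}),
    RiskCalc("BMI", "Body mass index from weight (kg) and height (m)",
             lambda p: p["weight_kg"] / p["height_m"] ** 2,
             {"bmi", "weight", "obesity", "height"}),
]

def select_tool(description: str) -> RiskCalc:
    """Pick the calculator whose keywords best overlap the description
    (a naive stand-in for the agent's LLM-based tool selection)."""
    tokens = set(description.lower().split())
    return max(REGISTRY, key=lambda c: len(c.keywords & tokens))

patient = {"age": 78, "hypertension": True, "diabetes": False,
           "congestive_heart_failure": False, "prior_stroke_or_tia": True}
query = "78-year-old with atrial fibrillation and hypertension, assess stroke risk"
tool = select_tool(query)
print(tool.name, "=", tool.func(patient))  # CHADS2 = 4
```

In the paper itself, selection happens over the 2,164 curated calculators and the agent also extracts the required variables from the free-text patient description; both steps are elided here.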
Related papers
- ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents [22.596827147978598]
Large Language Models (LLMs) have shown promising potential in the medical domain.
ClinicalAgent Bench (CAB) is a comprehensive medical agent benchmark consisting of 18 tasks across five key realistic clinical dimensions.
ReflecTool is a novel framework that excels at utilizing domain-specific tools in two stages.
arXiv Detail & Related papers (2024-10-23T08:19:18Z)
- AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments [2.567146936147657]
We introduce AgentClinic, a multimodal agent benchmark for evaluating large language models (LLM) in simulated clinical environments.
We find that solving MedQA problems in the sequential decision-making format of AgentClinic is considerably more challenging, resulting in diagnostic accuracy that can drop below a tenth of the original.
arXiv Detail & Related papers (2024-05-13T17:38:53Z)
- Advancing Healthcare Automation: Multi-Agent System for Medical Necessity Justification [0.0]
This paper explores the application of a Multi-Agent System (MAS) that utilizes specialized LLM agents to automate the Prior Authorization task.
We demonstrate that the GPT-4 checklist achieves an accuracy of 86.2% in predicting item-level judgments with evidence, and 95.6% in determining the overall checklist judgment.
arXiv Detail & Related papers (2024-04-27T18:40:05Z) - ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning [16.04933261211837]
Large Language Models (LLMs) and multi-agent systems have shown impressive capabilities in natural language tasks but face challenges in clinical trial applications.
We introduce Clinical Agent System (ClinicalAgent), a clinical multi-agent system designed for clinical trial tasks.
arXiv Detail & Related papers (2024-04-23T06:30:53Z)
- Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology [0.6397820821509177]
We introduce an alternative approach to multimodal medical AI that utilizes the generalist capabilities of a large language model (LLM) as a central reasoning engine.
This engine autonomously coordinates and deploys a set of specialized medical AI tools.
We show that the system has a high capability in employing appropriate tools (97%), drawing correct conclusions (93.6%), and providing complete (94%) and helpful (89.2%) recommendations for individual patient cases.
arXiv Detail & Related papers (2024-04-06T15:50:19Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor (as the player) and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
However, they still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
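As a rough sketch of that mitigation idea, the model first extracts a value, then is asked to quote its provenance, and unsupported answers are discarded. The call_llm helper and the prompts below are hypothetical stand-ins, not the paper's actual pipeline:

```python
# Sketch of extract-then-self-verify; `call_llm` and the prompts are
# hypothetical stand-ins, not the paper's actual implementation.
from typing import Optional

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred LLM client here")

def extract_with_verification(note: str, field_name: str) -> Optional[str]:
    # Step 1: few-shot extraction of the target field from the clinical note.
    value = call_llm(f"Extract the patient's {field_name} from this note:\n{note}")
    # Step 2: ask the model for provenance, i.e. the sentence backing its answer.
    evidence = call_llm(
        f"Quote the exact sentence in the note stating that the {field_name} "
        f"is '{value}'. Reply NONE if no such sentence exists.\n{note}"
    )
    # Step 3: keep the extraction only if the quoted evidence appears verbatim.
    if evidence.strip() != "NONE" and evidence.strip() in note:
        return value
    return None
```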
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation [56.25869366777579]
In recent years, machine learning models have rapidly become better at generating clinical consultation notes.
We present an extensive human evaluation study where 5 clinicians listen to 57 mock consultations, write their own notes, post-edit a number of automatically generated notes, and extract all the errors.
We find that a simple, character-based Levenshtein distance metric performs on par with, if not better than, common model-based metrics like BERTScore.
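For reference, the character-based metric in question is the classic edit-distance dynamic program; the sketch below is the textbook algorithm, not the authors' evaluation code:

```python
# Character-level Levenshtein (edit) distance via the standard two-row DP.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))   # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                   # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```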
arXiv Detail & Related papers (2022-04-01T14:04:16Z)
- Active learning for medical code assignment [55.99831806138029]
We demonstrate the effectiveness of Active Learning (AL) in multi-label text classification in the clinical domain.
We apply a set of well-known AL methods to help automatically assign ICD-9 codes on the MIMIC-III dataset.
Our results show that the selection of informative instances provides satisfactory classification with a significantly reduced training set.
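As one concrete instance of selecting informative instances, here is a least-confidence uncertainty-sampling sketch, shown in single-label form for simplicity (the paper's ICD-9 task is multi-label, and the specific AL methods it compares are not reproduced here):

```python
# Least-confidence uncertainty sampling, a common active-learning heuristic
# (illustrative only; not the paper's exact method set).
import numpy as np

def least_confidence_batch(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k unlabeled examples the model is least sure about.

    probs: (n_examples, n_classes) predicted class probabilities.
    """
    confidence = probs.max(axis=1)     # model's top-class confidence per example
    return np.argsort(confidence)[:k]  # least confident first

# Toy example: 4 unlabeled notes, 3 candidate codes.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.60, 0.30, 0.10],
                  [0.34, 0.33, 0.33]])
print(least_confidence_batch(probs, k=2))  # -> [3 1]
```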
arXiv Detail & Related papers (2021-04-12T18:11:17Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality, and length of stay are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.