Can Modern LLMs Act as Agent Cores in Radiology Environments?
- URL: http://arxiv.org/abs/2412.09529v2
- Date: Thu, 19 Dec 2024 03:05:27 GMT
- Title: Can Modern LLMs Act as Agent Cores in Radiology Environments?
- Authors: Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie,
- Abstract summary: Large language models (LLMs) offer enhanced accuracy and interpretability across various domains.
This paper aims to investigate the pre-requisite question for building concrete radiology agents.
We present RadABench-Data, a comprehensive synthetic evaluation dataset for LLM-based agents.
Second, we propose RadABench-EvalPlat, a novel evaluation platform for agents featuring a prompt-driven workflow.
- Score: 54.36730060680139
- License:
- Abstract: Advancements in large language models (LLMs) have paved the way for LLM-based agent systems that offer enhanced accuracy and interpretability across various domains. Radiology, with its complex analytical requirements, is an ideal field for the application of these agents. This paper aims to investigate the pre-requisite question for building concrete radiology agents which is, `Can modern LLMs act as agent cores in radiology environments?' To investigate it, we introduce RadABench with three-fold contributions: First, we present RadABench-Data, a comprehensive synthetic evaluation dataset for LLM-based agents, generated from an extensive taxonomy encompassing 6 anatomies, 5 imaging modalities, 10 tool categories, and 11 radiology tasks. Second, we propose RadABench-EvalPlat, a novel evaluation platform for agents featuring a prompt-driven workflow and the capability to simulate a wide range of radiology toolsets. Third, we assess the performance of 7 leading LLMs on our benchmark from 5 perspectives with multiple metrics. Our findings indicate that while current LLMs demonstrate strong capabilities in many areas, they are still not sufficiently advanced to serve as the central agent core in a fully operational radiology agent system. Additionally, we identify key factors influencing the performance of LLM-based agent cores, offering insights for clinicians on how to apply agent systems in real-world radiology practices effectively. All of our code and data are open-sourced in https://github.com/MAGIC-AI4Med/RadABench.
Related papers
- MedRAX: Medical Reasoning Agent for Chest X-ray [3.453950193734893]
Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care.
We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework.
arXiv Detail & Related papers (2025-02-04T19:31:00Z) - Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System [10.502391082887568]
"RadCouncil" is a multi-agent Large Language Model (LLM) framework designed to enhance the generation of impressions in radiology reports from the finding section.
RadCouncil comprises three specialized agents: 1) a "Retrieval" Agent that identifies and retrieves similar reports from a vector database, 2) a "Radiologist" Agent that generates impressions based on the finding section of the given report plus the exemplar reports retrieved by the Retrieval Agent, and 3) a "Reviewer" Agent that evaluates the generated impressions and provides feedback.
arXiv Detail & Related papers (2024-12-06T21:33:03Z) - Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking [58.25862290294702]
We present MedChain, a dataset of 12,163 clinical cases that covers five key stages of clinical workflow.
We also propose MedChain-Agent, an AI system that integrates a feedback mechanism and a MCase-RAG module to learn from previous cases and adapt its responses.
arXiv Detail & Related papers (2024-12-02T15:25:02Z) - Best Practices for Large Language Models in Radiology [4.972411560978282]
Nuanced application of language is key for various activities.
The emergence of large language models (LLMs) offers an opportunity to improve the management and interpretation of the vast data in radiology.
arXiv Detail & Related papers (2024-12-02T07:54:55Z) - Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs)
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback [10.826651024680169]
Radiologists play a crucial role by translating medical images into medical reports.
While automated approaches using vision-language models (VLMs) show promise as assistants, they require exceptionally high accuracy.
We propose a scalable automated preference alignment technique for VLMs in radiology, focusing on chest X-ray (CXR) report generation.
arXiv Detail & Related papers (2024-10-09T16:07:11Z) - MGH Radiology Llama: A Llama 3 70B Model for Radiology [50.42811030970618]
This paper presents an advanced radiology-focused large language model: MGH Radiology Llama.
It is developed using the Llama 3 70B model, building upon previous domain-specific models like Radiology-GPT and Radiology-Llama2.
Our evaluation, incorporating both traditional metrics and a GPT-4-based assessment, highlights the enhanced performance of this work over general-purpose LLMs.
arXiv Detail & Related papers (2024-08-13T01:30:03Z) - Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs [22.568925103893182]
We aim to enhance the generalization capabilities of agents in open-ended text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs)
We introduce PharmaSimText, a novel benchmark derived from the PharmaSim virtual pharmacy environment designed for practicing diagnostic conversations.
Our results show that RL-based agents excel in task completion but lack in asking quality diagnostic questions.
arXiv Detail & Related papers (2024-04-29T14:53:48Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.