Agentic Systems in Radiology: Design, Applications, Evaluation, and Challenges
- URL: http://arxiv.org/abs/2510.09404v2
- Date: Mon, 13 Oct 2025 07:11:22 GMT
- Title: Agentic Systems in Radiology: Design, Applications, Evaluation, and Challenges
- Authors: Christian Bluethgen, Dave Van Veen, Daniel Truhn, Jakob Nikolas Kather, Michael Moor, Malgorzata Polacin, Akshay Chaudhari, Thomas Frauenfelder, Curtis P. Langlotz, Michael Krauthammer, Farhad Nooralahzadeh
- Abstract summary: Large language models (LLMs) are capable of using natural language to integrate information, follow instructions, and perform forms of "reasoning" and planning. With its multimodal data streams and orchestrated workflows spanning multiple systems, radiology is uniquely suited to benefit from agents that can adapt to context and automate repetitive yet complex tasks. This review examines the design of such LLM-driven agentic systems, highlights key applications, discusses evaluation methods for planning and tool use, and outlines challenges such as error cascades, tool-use efficiency, and health IT integration.
- Score: 13.53016942028838
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Building agents, systems that perceive and act upon their environment with a degree of autonomy, has long been a focus of AI research. This pursuit has recently become vastly more practical with the emergence of large language models (LLMs) capable of using natural language to integrate information, follow instructions, and perform forms of "reasoning" and planning across a wide range of tasks. With its multimodal data streams and orchestrated workflows spanning multiple systems, radiology is uniquely suited to benefit from agents that can adapt to context and automate repetitive yet complex tasks. In radiology, LLMs and their multimodal variants have already demonstrated promising performance for individual tasks such as information extraction and report summarization. However, using LLMs in isolation underutilizes their potential to support complex, multi-step workflows where decisions depend on evolving context from multiple information sources. Equipping LLMs with external tools and feedback mechanisms enables them to drive systems that exhibit a spectrum of autonomy, ranging from semi-automated workflows to more adaptive agents capable of managing complex processes. This review examines the design of such LLM-driven agentic systems, highlights key applications, discusses evaluation methods for planning and tool use, and outlines challenges such as error cascades, tool-use efficiency, and health IT integration.
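The tool-and-feedback loop the abstract describes can be sketched schematically. Everything below is an illustrative placeholder, not a component of any reviewed system: a stub policy stands in for the LLM planner, and two toy functions stand in for external tools such as a RIS/PACS query and a summarization call.

```python
def lookup_prior_report(patient_id: str) -> str:
    """Toy 'tool': stands in for a RIS/PACS query (hypothetical)."""
    return f"Prior report for {patient_id}: no acute findings."

def summarize(text: str) -> str:
    """Toy 'tool': stands in for an LLM summarization call (hypothetical)."""
    return text.split(":", 1)[-1].strip()

# Dispatch table mapping tool names to callables.
TOOLS = {"lookup_prior_report": lookup_prior_report, "summarize": summarize}

def stub_policy(observations: list) -> tuple:
    """Stand-in for the LLM planner: returns (tool_name, argument)
    or ('finish', answer) based on the observations gathered so far."""
    if not observations:
        return ("lookup_prior_report", "PAT-001")
    if len(observations) == 1:
        return ("summarize", observations[-1])
    return ("finish", observations[-1])

def run_agent(policy, tools, max_steps: int = 5):
    """Core agentic loop: plan -> act -> observe -> repeat."""
    observations = []
    for _ in range(max_steps):  # step cap limits runaway error cascades
        action, arg = policy(observations)
        if action == "finish":
            return arg
        # Tool output is fed back as context for the next planning step.
        observations.append(tools[action](arg))
    return observations[-1]  # fall back to the last observation

print(run_agent(stub_policy, TOOLS))  # → no acute findings.
```

The step cap and the explicit dispatch table reflect two of the review's themes: bounding autonomy so that a single bad tool call cannot cascade indefinitely, and keeping tool use auditable.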
Related papers
- Experience-Driven Multi-Agent Systems Are Training-free Context-aware Earth Observers [27.817039954088315]
We introduce GeoEvolver, a self-evolving multi-agent system for learning tool-level expertise. We show that GeoEvolver consistently improves end-to-end task success, with an average gain of 12% across multiple backbones.
arXiv Detail & Related papers (2026-01-30T15:11:07Z) - QUASAR: A Universal Autonomous System for Atomistic Simulation and a Benchmark of Its Capabilities [0.7519872646378835]
QUASAR is a universal autonomous system for atomistic simulation designed to facilitate production-grade scientific discovery. We benchmark QUASAR against a series of three-tiered tasks, progressing from routine tasks to frontier research challenges such as photocatalyst screening and novel material assessment. Results suggest that QUASAR can function as a general atomistic reasoning system rather than a task-specific automation framework.
arXiv Detail & Related papers (2026-01-30T05:29:44Z) - HELP: Hierarchical Embodied Language Planner for Household Tasks [75.38606213726906]
Embodied agents tasked with complex scenarios rely heavily on robust planning capabilities. Large language models equipped with extensive linguistic knowledge can play this role. We propose a Hierarchical Embodied Language Planner, called HELP, consisting of a set of LLM-based agents.
arXiv Detail & Related papers (2025-12-25T15:54:08Z) - Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems [0.0]
Recent advances in agentic AI have shifted the focus from standalone Large Language Models to integrated systems. We propose an end-to-end Agent Assessment Framework with four evaluation pillars encompassing LLMs, Memory, Tools, and Environment. We validate the framework on a representative Autonomous CloudOps use case, where experiments reveal behavioral deviations not captured by conventional metrics.
arXiv Detail & Related papers (2025-12-14T18:17:40Z) - MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
A Gym-style framework for systematically training, evaluating, and improving autonomous large language model (LLM) agents. MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z) - Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models [75.4890331763196]
Recent breakthroughs in Large Language Models (LLMs) have led to the emergence of agentic AI systems. LLM-based Agentic RS (LLM-ARS) can offer more interactive, context-aware, and proactive recommendations.
arXiv Detail & Related papers (2025-03-20T22:37:15Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems.
This paper systematically sorts out the applications of MLLM in multimodal tasks such as natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z) - BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration [0.0]
We focus on designing a flexible agent engineering framework capable of handling complex use case applications across various domains.
The proposed framework provides reliability in industrial applications and presents techniques to ensure a scalable, flexible, and collaborative workflow for multiple autonomous agents.
arXiv Detail & Related papers (2024-06-28T16:39:20Z) - TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z) - TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage [28.554981886052953]
Large Language Models (LLMs) have emerged as powerful tools for various real-world applications.
Despite their prowess, intrinsic generative abilities of LLMs may prove insufficient for handling complex tasks.
This paper proposes a structured framework tailored for LLM-based AI Agents.
arXiv Detail & Related papers (2023-08-07T09:22:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.