Which Tool Response Should I Trust? Tool-Expertise-Aware Chest X-ray Agent with Multimodal Agentic Learning
- URL: http://arxiv.org/abs/2602.21517v1
- Date: Wed, 25 Feb 2026 03:09:47 GMT
- Title: Which Tool Response Should I Trust? Tool-Expertise-Aware Chest X-ray Agent with Multimodal Agentic Learning
- Authors: Zheang Huai, Honglong Yang, Xiaomeng Li,
- Abstract summary: This paper introduces a framework that enables an agent to interact with tools and empirically learn their practical trustworthiness.<n>As a concrete instantiation, we focus on chest X-ray analysis and present a tool-expertise-aware chest X-ray agent.<n>When tool outputs disagree, the agent experimentally accepts or rejects multimodal tool results, receives rewards, and learns which tool to trust for each query type.
- Score: 11.117796080402044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI agents with tool-use capabilities show promise for integrating the domain expertise of various tools. In the medical field, however, tools are usually AI models that are inherently error-prone and can produce contradictory responses. Existing research on medical agents lacks sufficient understanding of the tools' realistic reliability and thus cannot effectively resolve tool conflicts. To address this gap, this paper introduces a framework that enables an agent to interact with tools and empirically learn their practical trustworthiness across different types of multimodal queries via agentic learning. As a concrete instantiation, we focus on chest X-ray analysis and present a tool-expertise-aware chest X-ray agent (TEA-CXA). When tool outputs disagree, the agent experimentally accepts or rejects multimodal tool results, receives rewards, and learns which tool to trust for each query type. Importantly, TEA-CXA extends existing codebases for reinforcement learning with multi-turn tool-calling that focus on textual inputs, to support multimodal contexts effectively. In addition, we enhance the codebase for medical use scenarios by supporting multiple tool calls in one turn, parallel tool inference, and multi-image accommodation within a single user query. Our code framework is applicable to general medical research on multi-turn tool-calling reinforcement learning in multimodal settings. Experiments show that TEA-CXA outperforms the state-of-the-art methods and a comprehensive set of baselines. Code will be released.
Related papers
- AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning [66.24374176797075]
We introduce textbfAdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior.<n>AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that prioritizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage.
arXiv Detail & Related papers (2026-01-26T16:04:43Z) - MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning [55.221850286246]
We introduce MindWatcher, a tool-integrated reasoning agent with interleaved thinking and multimodal chain-of-thought (CoT) reasoning.<n>MindWatcher can autonomously decide whether and how to invoke diverse tools and coordinate their use.<n>A large-scale, high-quality local image retrieval database, covering eight categories including cars, animals, and plants, endows model with robust object recognition.
arXiv Detail & Related papers (2025-12-29T12:16:12Z) - DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning [84.25936790759484]
We introduce DART, a multi-agent framework that uses disagreements between multiple debating visual agents to identify useful visual tools.<n>These tools allow for fruitful multi-agent discussion by introducing new information.<n>Dart adapts well to new tools in applied domains, with a 1.3% improvement on the M3D medical dataset.
arXiv Detail & Related papers (2025-12-08T03:33:38Z) - DeepAgent: A General Reasoning Agent with Scalable Toolsets [111.6384541877723]
DeepAgent is an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution.<n>To address the challenges of long-horizon interactions, we introduce an autonomous memory folding mechanism that compresses past interactions into structured episodic, working, and tool memories.<n>We develop an end-to-end reinforcement learning strategy, namely ToolPO, that leverages LLM-simulated APIs and applies tool-call advantage attribution to assign fine-grained credit to the tool invocation tokens.
arXiv Detail & Related papers (2025-10-24T16:24:01Z) - PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature [11.804526152911386]
We propose PaperArena, an evaluation benchmark for large language model (LLM) based agents to address real-world research questions.<n>Given a research question, agents should integrate diverse formats across multiple papers through reasoning and interacting with appropriate tools.<n> Experimental results reveal that even the most advanced LLM powering a well-established agent system achieves merely 38.78% average accuracy.
arXiv Detail & Related papers (2025-10-13T02:10:39Z) - Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks.<n>Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses.<n>No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z) - Reducing Cognitive Overhead in Tool Use via Multi-Small-Agent Reinforcement Learning [1.974921946982281]
We present MSARL, a framework that explicitly decouples reasoning from tool use.<n>In MSARL, a Reasoning Agent decomposes problems and plans tool invocations, while multiple Tool Agents specialize in specific external tools.<n>On mathematical problem solving with code execution, MSARL significantly improves reasoning stability and final-answer accuracy over single-agent baselines.
arXiv Detail & Related papers (2025-08-12T12:10:53Z) - T^2Agent A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search [51.91311158085973]
multimodal misinformation often arises from mixed forgery sources, requiring dynamic reasoning and adaptive verification.<n>We propose T2Agent, a novel misinformation detection agent that incorporates a toolkit with Monte Carlo Tree Search.<n>Extensive experiments show that T2Agent consistently outperforms existing baselines on challenging mixed-source multimodal misinformation benchmarks.
arXiv Detail & Related papers (2025-05-26T09:50:55Z) - LLM Agents Making Agent Tools [2.5529148902034637]
Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks.<n>But these tools must be implemented in advance by human developers.<n>We propose ToolMaker, an agentic framework that autonomously transforms papers with code into LLM-compatible tools.
arXiv Detail & Related papers (2025-02-17T11:44:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.