Related papers: AuTAgent: A Reinforcement Learning Framework for Tool-Augmented Audio Reasoning

URL: http://arxiv.org/abs/2602.13685v1
Date: Sat, 14 Feb 2026 09:12:20 GMT
Title: AuTAgent: A Reinforcement Learning Framework for Tool-Augmented Audio Reasoning
Authors: Siqian Tong, Xuan Li, Yiwei Wang, Baolong Bi, Yujun Cai, Shenghua Liu, Yuchen He, Chengpeng Hao
Abstract summary: Large Audio Language Models (LALMs) excel at perception but struggle with complex reasoning requiring precise acoustic measurements. We propose AuTAgent, a reinforcement learning framework that learns when and which tools to invoke.
Score: 36.67330306977483
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Audio Language Models (LALMs) excel at perception but struggle with complex reasoning requiring precise acoustic measurements. While external tools can extract fine-grained features like exact tempo or pitch, effective integration remains challenging: naively using all tools causes information overload, while prompt-based selection fails to assess context-dependent utility. To address this, we propose AuTAgent (Audio Tool Agent), a reinforcement learning framework that learns when and which tools to invoke. By employing a sparse-feedback training strategy with a novel Differential Reward mechanism, the agent learns to filter out irrelevant tools and invokes external assistance only when it yields a net performance gain over the base model. Experimental results confirm that AuTAgent complements the representation bottleneck of LALMs by providing verifiable acoustic evidence. It improves accuracy by 4.20% / 6.20% and 9.80% / 8.00% for open-source and closed-source backbones on the MMAU Test-mini and the MMAR benchmarks, respectively. In addition, further experiments demonstrate exceptional transferability. We highlight the complementary role of external tools in augmenting audio model reasoning.
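
As a rough illustration of the "net performance gain over the base model" idea in the abstract, one possible reading of a differential reward is the difference between answer correctness with tools and without them. This is only a sketch under that assumption: the abstract does not give the formula, and the name `differential_reward` and its boolean inputs are hypothetical, not taken from the paper.

```python
def differential_reward(base_correct: bool, tool_correct: bool) -> float:
    """Hypothetical differential reward (an assumption, not the paper's formula).

    Returns the change in correctness caused by invoking tools:
      +1.0  tools turn a wrong base answer into a right one
       0.0  tools make no difference (both right or both wrong)
      -1.0  tools turn a right base answer into a wrong one
    """
    return float(tool_correct) - float(base_correct)


# Each pair is (base model correct?, tool-augmented model correct?).
cases = [(False, True), (True, True), (True, False), (False, False)]
rewards = [differential_reward(base, tool) for base, tool in cases]
print(rewards)
```

Under this reading, a tool call is rewarded only when it changes the outcome for the better, so calls that leave the answer unchanged earn nothing and harmful calls are penalized.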

Related papers

AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning [29.443084496227026]
Large Audio Language Models (LALMs) have demonstrated strong capabilities in audio understanding and reasoning. We propose AudioRouter, a reinforcement learning framework that enables LALMs to improve audio understanding by learning when and how to use external audio tools.
arXiv Detail & Related papers (2026-02-11T02:30:48Z)
Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning [16.12114923351562]
We propose a training-free framework that transforms agents from tool users to tool creators. This approach harvests reasoning experiences and distills them into reusable assets. We also introduce a memory consolidation mechanism to maintain the tool library.
arXiv Detail & Related papers (2026-02-02T11:37:45Z)
One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning [54.580646706013965]
Reward models (RMs) play a critical role in aligning large language models with human preferences. We introduce ToolRM, a family of lightweight generative RMs tailored for general tool-use scenarios. To build these models, we propose a novel pipeline that constructs pairwise preference data using rule-based scoring and multidimensional sampling.
arXiv Detail & Related papers (2025-10-30T06:08:27Z)
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs). This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools. We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z)
AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning [17.086082843274003]
Large Language Models (LLMs) evolve into powerful Large Reasoning Models (LRMs). Tool-Integrated Reasoning (TIR) further extends their capabilities by incorporating external tools. Inspired by the human ability to adaptively select tools, we introduce AutoTIR, a reinforcement learning framework.
arXiv Detail & Related papers (2025-07-29T14:12:28Z)
Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning [63.2198957755528]
We propose Tool-MVR, a novel Tool-Augmented LLM that achieves comprehensive System 2 reasoning through two key innovations. Specifically, we first introduce Multi-Agent Meta-Verification (MAMV), a systematic pipeline that rigorously validates APIs, queries, and reasoning trajectories. Second, we propose Exploration-based Reflection Learning (EXPLORE), which enhances tool reflection capabilities by leveraging tool feedback.
arXiv Detail & Related papers (2025-06-05T04:35:49Z)
Can We Trust Machine Learning? The Reliability of Features from Open-Source Speech Analysis Tools for Speech Modeling [0.0]
Machine learning-based behavioral models rely on features extracted from audio-visual recordings. Machine learning tools often lack validation to ensure reliability in capturing behaviorally relevant information. We evaluate speech features extracted from two widely used speech analysis tools, OpenSMILE and Praat, to assess their reliability when considering adolescents with autism.
arXiv Detail & Related papers (2025-06-02T18:55:53Z)
Acting Less is Reasoning More! Teaching Model to Act Efficiently [87.28134636548705]
Tool-integrated reasoning augments large language models with the ability to invoke external tools to solve tasks. Current approaches typically optimize only for final correctness without considering the efficiency or necessity of external tool use. We propose a framework that encourages models to produce accurate answers with minimal tool calls. Our approach reduces tool calls by up to 68.3% and improves tool productivity by up to 215.4%, while maintaining comparable answer accuracy.
arXiv Detail & Related papers (2025-04-21T05:40:05Z)
Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use. MeCo quantifies metacognitive scores by capturing high-level cognitive signals in the representation space. MeCo is fine-tuning-free and incurs minimal cost.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.