Related papers: ToolMind Technical Report: A Large-Scale, Reasoning-Enhanced Tool-Use Dataset

ToolMind Technical Report: A Large-Scale, Reasoning-Enhanced Tool-Use Dataset

URL: http://arxiv.org/abs/2511.15718v1
Date: Wed, 12 Nov 2025 13:01:23 GMT
Title: ToolMind Technical Report: A Large-Scale, Reasoning-Enhanced Tool-Use Dataset
Authors: Chen Yang, Ran Le, Yun Xing, Zhenwei An, Zongchao Chen, Wayne Xin Zhao, Yang Song, Tao Zhang,
Abstract summary: We introduce ToolMind, a high-quality tool-agentic dataset with 160k synthetic data instances.<n>We employ fine-grained turn-level filtering to remove erroneous or suboptimal steps.<n>Models fine-tuned on ToolMind show significant improvements over baselines on several benchmarks.
Score: 43.45582911794623
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Model (LLM) agents have developed rapidly in recent years to solve complex real-world problems using external tools. However, the scarcity of high-quality trajectories still hinders the development of stronger LLM agents. Most existing works on multi-turn dialogue synthesis validate correctness only at the trajectory level, which may overlook turn-level errors that can propagate during training and degrade model performance. To address these limitations, we introduce ToolMind, a large-scale, high-quality tool-agentic dataset with 160k synthetic data instances generated using over 20k tools and 200k augmented open-source data instances. Our data synthesis pipeline first constructs a function graph based on parameter correlations and then uses a multi-agent framework to simulate realistic user-assistant-tool interactions. Beyond trajectory-level validation, we employ fine-grained turn-level filtering to remove erroneous or suboptimal steps, ensuring that only high-quality reasoning traces are retained. This approach mitigates error amplification during training while preserving self-corrective reasoning signals essential for robust tool-use learning. Models fine-tuned on ToolMind show significant improvements over baselines on several benchmarks.

Related papers

From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents [23.583947864141162]
EigenData is a hierarchical multi-agent engine that synthesizes tool-grounded dialogues together with executable per-instance checkers.<n>Building on the synthetic data, we develop an RL recipe that first fine-tunes the user model and then applies GRPO-style training.<n>Our results suggest a scalable pathway for bootstrapping complex tool-using behaviors without expensive human annotation.
arXiv Detail & Related papers (2026-01-30T06:01:23Z)
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas [13.919124676472022]
ASTRA is an end-to-end framework for training tool-augmented language model agents.<n>ASTRA integrates scalable data synthesis and verifiable reinforcement learning.<n> Experiments on multiple agentic tool-use benchmarks demonstrate that ASTRA-trained models achieve state-of-the-art performance.
arXiv Detail & Related papers (2026-01-29T11:22:23Z)
Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing [16.839489120513505]
InfTool orchestrates three collaborative agents to generate diverse, verified trajectories spanning single-turn calls to complex multi-step gated calls.<n>We show that InfTool transforms a base 32B model from 19.8% to 70.9% accuracy (+258%), surpassing models 10x larger and rivaling Claude-Opus.
arXiv Detail & Related papers (2025-12-29T17:12:39Z)
One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning [54.580646706013965]
Reward models (RMs) play a critical role in aligning large language models with human preferences.<n>We introduce ToolRM, a family of lightweight generative RMs tailored for general tool-use scenarios.<n>To build these models, we propose a novel pipeline that constructs pairwise preference data using rule-based scoring and multidimensional sampling.
arXiv Detail & Related papers (2025-10-30T06:08:27Z)
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning [25.605096023894834]
Large Language Models (LLMs) have made remarkable progress in mathematical reasoning.<n>Despite recent advances, existing methods struggle with three key challenges.<n>We propose THOR (Tool-Integrated Hierarchical Optimization via RL) to overcome these limitations.<n>Our approach demonstrates strong generalization across diverse models, performing effectively in both reasoning and non-reasoning models.
arXiv Detail & Related papers (2025-09-17T07:16:12Z)
RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use [50.52940111891476]
Large language models excel at basic reasoning but struggle with tasks that require interaction with external tools.<n>We present RLFactory, a plug-and-play reinforcement learning framework for multi-round tool use.
arXiv Detail & Related papers (2025-08-31T16:47:31Z)
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs)<n>This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools.<n>We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z)
Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.<n>MeCo quantifies metacognitive scores by capturing high-level cognitive signals in the representation space.<n>MeCo is fine-tuning-free and incurs minimal cost.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool Use [56.31110409360567]
Augmenting large language models with external tools is a promising approach to enhance their capabilities.<n>We show that training gains significantly decay as synthetic data increases.<n>We propose an iterative reinforced fine-tuning strategy designed to alleviate this limitation.
arXiv Detail & Related papers (2025-01-15T04:52:34Z)
TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use [72.32614703504122]
Large language models (LLMs) achieve remarkable advancements by leveraging tools to interact with environments.<n>Standard supervised fine-tuning approach, which relies on large-scale datasets, often overlooks task-specific characteristics in tool use.<n>We propose TL-Training, a task-feature-based framework that mitigates the effects of suboptimal training data.
arXiv Detail & Related papers (2024-12-20T02:21:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.