TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
- URL: http://arxiv.org/abs/2510.01179v1
- Date: Wed, 01 Oct 2025 17:58:03 GMT
- Authors: Zhangchen Xu, Adriana Meza Soria, Shawn Tan, Anurag Roy, Ashish Sunil Agrawal, Radha Poovendran, Rameswar Panda
- Abstract summary: Toucan is the largest publicly available tool-agentic dataset to date. It generates diverse, realistic, and challenging tasks with trajectories involving real tool execution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Model (LLM) agents are rapidly emerging as powerful systems for automating tasks across domains. Yet progress in the open-source community is constrained by the lack of high-quality, permissively licensed tool-agentic training data. Existing datasets are often limited in diversity, realism, and complexity, particularly regarding multi-tool and multi-turn interactions. To address this gap, we introduce Toucan, the largest publicly available tool-agentic dataset to date, containing 1.5 million trajectories synthesized from nearly 500 real-world Model Context Protocols (MCPs). Unlike prior work, Toucan leverages authentic MCP environments to generate diverse, realistic, and challenging tasks with trajectories involving real tool execution. Our pipeline first produces a broad spectrum of tool-use queries using five distinct models, applies model-based quality filtering, and then generates agentic trajectories with three teacher models using two agentic frameworks. Rigorous rule-based and model-based validation ensures high-quality outputs. We also introduce three extension mechanisms to further diversify tasks and simulate multi-turn conversations. Models fine-tuned on Toucan outperform larger closed-source counterparts on the BFCL V3 benchmark and push the Pareto frontier forward on MCP-Universe Bench.
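The abstract describes a three-stage pipeline: generate candidate queries, filter them with a model-based quality score, then generate trajectories and keep only those passing rule-based validation. A minimal Python sketch of that flow is below; the data types, the `score_fn`/`generate_fn` callables, and the specific validation rules are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """One synthesized agentic trajectory (hypothetical schema)."""
    task: str
    tool_calls: list   # list of (tool_name, arguments) pairs
    final_answer: str

def rule_based_validate(traj: Trajectory, available_tools: set) -> bool:
    """Rule-based check: reject trajectories with no tool calls, no
    final answer, or calls to tools absent from the MCP environment."""
    if not traj.tool_calls or not traj.final_answer.strip():
        return False
    return all(name in available_tools for name, _ in traj.tool_calls)

def synthesize(queries, score_fn, generate_fn, available_tools, threshold=0.7):
    """Pipeline sketch: model-based quality filtering of queries,
    trajectory generation, then rule-based validation of outputs.
    `score_fn` and `generate_fn` stand in for LLM calls."""
    kept = [q for q in queries if score_fn(q) >= threshold]       # filter
    trajectories = [generate_fn(q) for q in kept]                 # generate
    return [t for t in trajectories if rule_based_validate(t, available_tools)]
```

With stub scoring and generation functions, low-quality queries are dropped before any trajectory is generated, mirroring the order of operations the abstract describes (filter first, then generate with teacher models, then validate).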
Related papers
- Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
Large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. We propose Agent World Model (AWM), a fully synthetic environment generation pipeline. We scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets.
arXiv Detail & Related papers (2026-02-10T18:55:41Z) - Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling
We present a comprehensive and fully open-source pipeline for training a high-performance agentic model, named Klear-Qwen3-AgentForge. We design effective supervised fine-tuning (SFT) with synthetic data followed by multi-turn reinforcement learning (RL) to unlock the potential for multiple diverse agentic tasks.
arXiv Detail & Related papers (2025-11-08T09:47:27Z) - FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling
We present FunReason-MT, a novel data synthesis framework for real-world multi-turn tool use. FunReason-MT resolves the complexity barrier in multi-turn FC data by employing Environment-API Graph Interactions. A 4B model built upon FunReason-MT-generated data achieves state-of-the-art performance among comparable-sized models.
arXiv Detail & Related papers (2025-10-28T17:15:26Z) - MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
We develop a vision-centric agent tuning framework that automatically synthesizes multimodal trajectories. We also introduce Pref-X, a set of 11K automatically generated preference pairs. Across three benchmarks (Agent-X, GTA, and GAIA), MATRIX consistently surpasses both open- and closed-source VLMs.
arXiv Detail & Related papers (2025-10-09T17:59:54Z) - APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
APIGen-MT is a framework that generates verifiable and diverse multi-turn agent data. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on the τ-bench and BFCL benchmarks.
arXiv Detail & Related papers (2025-04-04T17:13:57Z) - OmniNova: A General Multimodal Agent Framework
Large Language Models (LLMs) with specialized tools present new opportunities for intelligent automation systems. We present OmniNova, a modular multi-agent automation framework that combines language models with specialized tools such as web search, crawling, and code execution capabilities.
arXiv Detail & Related papers (2025-03-25T19:21:01Z) - mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
Multimodal embedding models have gained significant attention for their ability to map data from different modalities, such as text and images, into a unified representation space. However, the limited labeled multimodal data often hinders embedding performance. Recent approaches have leveraged data synthesis to address this problem, yet the quality of synthetic data remains a critical bottleneck.
arXiv Detail & Related papers (2025-02-12T15:03:33Z) - Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
We propose a multi-modal agent tuning method that automatically generates multi-modal tool-usage data. To preserve data quality, we prompt the GPT-4o mini model to generate queries, files, and trajectories. Evaluations show that the T3-Agent consistently achieves improvements on two popular VLMs.
arXiv Detail & Related papers (2024-12-20T07:00:46Z) - xLAM: A Family of Large Action Models to Empower AI Agent Systems
We release xLAM, a series of large action models designed for AI agent tasks.
xLAM consistently delivers exceptional performance across multiple agent ability benchmarks.
arXiv Detail & Related papers (2024-09-05T03:22:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.