Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
- URL: http://arxiv.org/abs/2601.10355v1
- Date: Thu, 15 Jan 2026 12:58:46 GMT
- Title: Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
- Authors: Zhihao Xu, Rumei Li, Jiahuan Li, Rongxiang Weng, Jingang Wang, Xunliang Cai, Xiting Wang
- Abstract summary: We introduce GEM, a data synthesis pipeline that enables the generation and extraction of multi-turn tool-use trajectories from text corpora. To reduce the computational cost, we further train a specialized Trajectory Synthesizer via supervised fine-tuning. Experiments demonstrate that our GEM-32B achieves a 16.5% improvement on the BFCL V3 Multi-turn benchmark.
- Score: 48.25052564552558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enabling Large Language Models (LLMs) to effectively utilize tools in multi-turn interactions is essential for building capable autonomous agents. However, acquiring diverse and realistic multi-turn tool-use data remains a significant challenge. In this work, we propose a novel text-based paradigm. We observe that textual corpora naturally contain rich, multi-step problem-solving experiences, which can serve as an untapped, scalable, and authentic data source for multi-turn tool-use tasks. Based on this insight, we introduce GEM, a data synthesis pipeline that enables the generation and extraction of multi-turn tool-use trajectories from text corpora through a four-stage process: relevance filtering, workflow & tool extraction, trajectory grounding, and complexity refinement. To reduce the computational cost, we further train a specialized Trajectory Synthesizer via supervised fine-tuning. This model distills the complex generation pipeline into an efficient, end-to-end trajectory generator. Experiments demonstrate that our GEM-32B achieves a 16.5% improvement on the BFCL V3 Multi-turn benchmark. Our models partially surpass the performance of models trained on τ-bench (Airline and Retail) in-domain data, highlighting the superior generalization capability derived from our text-based synthesis paradigm. Notably, our Trajectory Synthesizer matches the quality of the full pipeline while significantly reducing inference latency and costs.
Related papers
- From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents [23.583947864141162]
EigenData is a hierarchical multi-agent engine that synthesizes tool-grounded dialogues together with executable per-instance checkers. Building on the synthetic data, we develop an RL recipe that first fine-tunes the user model and then applies GRPO-style training. Our results suggest a scalable pathway for bootstrapping complex tool-using behaviors without expensive human annotation.
arXiv Detail & Related papers (2026-01-30T06:01:23Z) - ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas [13.919124676472022]
ASTRA is an end-to-end framework for training tool-augmented language model agents. ASTRA integrates scalable data synthesis and verifiable reinforcement learning. Experiments on multiple agentic tool-use benchmarks demonstrate that ASTRA-trained models achieve state-of-the-art performance.
arXiv Detail & Related papers (2026-01-29T11:22:23Z) - Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing [16.839489120513505]
InfTool orchestrates three collaborative agents to generate diverse, verified trajectories spanning single-turn calls to complex multi-step gated calls. We show that InfTool transforms a base 32B model from 19.8% to 70.9% accuracy (+258%), surpassing models 10x larger and rivaling Claude-Opus.
arXiv Detail & Related papers (2025-12-29T17:12:39Z) - FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling [39.45732462111156]
We present FunReason-MT, a novel data synthesis framework for real-world multi-turn tool use. FunReason-MT resolves the complexity barrier in multi-turn FC data by employing Environment-API Graph Interactions. A 4B model built upon FunReason-MT-generated data achieves state-of-the-art performance among comparable-sized models.
arXiv Detail & Related papers (2025-10-28T17:15:26Z) - LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training [55.72784274656801]
We introduce a scalable paradigm that generates structured UI states and transitions to synthesize training trajectories at scale. Our paradigm integrates a digital world simulator for diverse UI states, a guided rollout process for coherent exploration, and a trajectory wrapper. Experiments on WebArena and AndroidWorld show that UI-Simulator rivals or surpasses open-source agents trained on real UIs with significantly better robustness.
arXiv Detail & Related papers (2025-10-16T17:59:38Z) - SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis [94.33978856270268]
Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep-search scenarios. Existing approaches face critical limitations: they lack high-quality training trajectories and suffer from distributional mismatches. This paper introduces SimpleDeepSearcher, a framework that bridges the gap through strategic data engineering rather than complex training paradigms.
arXiv Detail & Related papers (2025-05-22T16:05:02Z) - Procedural Environment Generation for Tool-Use Agents [55.10427063893754]
We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data. We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks.
arXiv Detail & Related papers (2025-05-21T14:10:06Z) - Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning [68.00304954972232]
Multimodal agents, which integrate a controller (e.g., a vision-language model) with external tools, have demonstrated remarkable capabilities in tackling complex multimodal tasks. Existing approaches for training these agents depend on extensive human-annotated task-answer pairs and tool trajectories. We propose SPORT, an iterative tool-usage exploration method for multimodal agents that requires no pre-collected data. SPORT has four iterative components: task synthesis, step sampling, step verification, and preference tuning.
arXiv Detail & Related papers (2025-04-30T12:01:27Z) - ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis [80.34000499166648]
We propose a Graph-based Sampling strategy to sample more relevant tool combinations, and a Planned-generation strategy to create plans that guide the synthesis of coherent dialogues. We apply SFT on LLaMA-3.1-8B using 8,000 synthetic dialogues generated with ToolFlow. Results show that the model achieves tool-calling performance comparable to or even surpassing GPT-4, while maintaining strong general capabilities.
arXiv Detail & Related papers (2024-10-24T05:45:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.