Procedural Environment Generation for Tool-Use Agents
- URL: http://arxiv.org/abs/2506.11045v1
- Date: Wed, 21 May 2025 14:10:06 GMT
- Title: Procedural Environment Generation for Tool-Use Agents
- Authors: Michael Sullivan, Mareike Hartmann, Alexander Koller
- Abstract summary: We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data.
We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks.
- Score: 55.417058694785325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although the power of LLM tool-use agents has ignited a flurry of recent research in this area, the curation of tool-use training data remains an open problem, especially for online RL training. Existing approaches to synthetic tool-use data generation tend to be non-interactive, and/or non-compositional. We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data. We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks, and set the new SoTA for two metrics on the NESTFUL dataset. Further experiments show that downstream performance scales with the amount of RandomWorld-generated training data, opening up the possibility of further improvement through the use of entirely synthetic data.
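The abstract's core idea, procedurally generating typed tools and chaining them into compositional tool-use traces, can be sketched roughly as follows. The tool registry, type tags, and function names below are illustrative assumptions, not RandomWorld's actual API:

```python
import random

# Minimal sketch of procedural tool composition: tools are typed
# functions, and a "compositional" sample chains tools whose output
# type matches the next tool's input type. All names are hypothetical.
TOOLS = {
    "to_upper": {"in": "str", "out": "str", "fn": str.upper},
    "length":   {"in": "str", "out": "int", "fn": len},
    "double":   {"in": "int", "out": "int", "fn": lambda x: x * 2},
}

def sample_chain(start_type: str, depth: int, rng: random.Random):
    """Randomly chain tools with compatible input/output types."""
    chain, t = [], start_type
    for _ in range(depth):
        options = [n for n, s in TOOLS.items() if s["in"] == t]
        if not options:
            break
        name = rng.choice(options)
        chain.append(name)
        t = TOOLS[name]["out"]
    return chain

def run_chain(chain, value):
    """Execute a sampled chain to obtain a ground-truth trace/answer."""
    for name in chain:
        value = TOOLS[name]["fn"](value)
    return value

rng = random.Random(0)
chain = sample_chain("str", depth=3, rng=rng)
result = run_chain(chain, "hello")
```

Because every sampled chain is type-consistent by construction, each one yields an executable trace that can be turned into an SFT example or an RL environment episode.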
Related papers
- ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients" [53.7887350405379]
Prior work synthesizes tool-use LLM datasets by first generating a user query, followed by complex tool-use annotations like DFS.
We introduce ToolGrad, an agentic framework that inverts this paradigm. ToolGrad first constructs valid tool-use chains through an iterative process guided by textual "gradients".
This "answer-first" approach led to ToolGrad-5k, a dataset generated with more complex tool use, lower cost, and a 100% pass rate.
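The "answer-first" inversion described above can be sketched as: grow a valid tool chain first, then derive the user query from it. The tool pool, the trivial scoring critic standing in for the textual "gradient", and all function names below are made up for illustration, not taken from the paper:

```python
# Hypothetical sketch of an answer-first pipeline: build the chain,
# then invert it into a query. A real system would use an LLM critic;
# here a toy length-based score stands in for the textual "gradient".

def propose_extensions(chain):
    """Candidate chains that extend the current one by one tool (toy pool)."""
    pool = ["search", "filter", "summarize", "rank"]
    return [chain + [tool] for tool in pool if tool not in chain]

def critic(chain):
    """Toy stand-in for a textual 'gradient': prefer longer chains."""
    return len(chain)

def grow_chain(steps=3):
    chain = []
    for _ in range(steps):
        candidates = propose_extensions(chain)
        if not candidates:
            break
        chain = max(candidates, key=critic)  # follow the 'gradient'
    return chain

def derive_query(chain):
    """Invert the finished chain into a user query (answer-first step)."""
    return "Please " + ", then ".join(chain) + "."

chain = grow_chain()
query = derive_query(chain)
```

Since the chain is validated as it grows, every generated example is executable by construction, which is one plausible reading of the reported 100% pass rate.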
arXiv Detail & Related papers (2025-08-06T05:04:00Z)
- Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning [63.31585771716123]
Large language models (LLMs) have shown remarkable reasoning capabilities via large-scale reinforcement learning (RL).
We introduce Tool-Star, an RL-based framework designed to empower LLMs to autonomously invoke multiple external tools during stepwise reasoning.
Tool-Star integrates six types of tools and incorporates systematic designs in both data synthesis and training.
arXiv Detail & Related papers (2025-05-22T09:00:19Z)
- Synthline: A Product Line Approach for Synthetic Requirements Engineering Data Generation using Large Language Models [0.5156484100374059]
This paper introduces Synthline, a Product Line (PL) approach that leverages Large Language Models to generate synthetic Requirements Engineering (RE) data.
Our analysis reveals that while synthetic datasets exhibit less diversity than real data, they are good enough to serve as viable training resources.
Our evaluation shows that combining synthetic and real data leads to substantial performance improvements.
arXiv Detail & Related papers (2025-05-06T07:57:16Z)
- ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis [80.34000499166648]
We propose a Graph-based Sampling strategy to sample more relevant tool combinations, and a Planned-generation strategy to create plans that guide the synthesis of coherent dialogues.
We apply SFT on LLaMA-3.1-8B using 8,000 synthetic dialogues generated with ToolFlow.
Results show that the model achieves tool-calling performance comparable to or even surpassing GPT-4, while maintaining strong general capabilities.
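One way to picture the graph-based sampling strategy above: treat tools as nodes in a relevance graph and take a short walk to collect a combination of tools that plausibly co-occur. The graph contents and function names are illustrative assumptions, not the paper's actual data:

```python
import random

# Toy relevance graph: an edge means two tools plausibly co-occur
# in one dialogue (e.g. travel planning). Purely illustrative.
GRAPH = {
    "flight_search": ["hotel_search", "currency_convert"],
    "hotel_search": ["flight_search", "restaurant_search"],
    "currency_convert": ["flight_search"],
    "restaurant_search": ["hotel_search"],
}

def sample_combination(start, k, rng):
    """Walk the relevance graph, collecting up to k distinct tools."""
    combo, node = [start], start
    while len(combo) < k:
        neighbors = [n for n in GRAPH[node] if n not in combo]
        if not neighbors:
            break  # dead end: no unvisited related tool
        node = rng.choice(neighbors)
        combo.append(node)
    return combo

rng = random.Random(0)
combo = sample_combination("flight_search", k=3, rng=rng)
```

Sampling along edges, rather than uniformly from the full tool set, is what keeps the resulting tool combinations mutually relevant and the synthesized dialogues coherent.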
arXiv Detail & Related papers (2024-10-24T05:45:04Z)
- Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples.
An autoencoder with self-attention layers and optimal transport refines distributional consistency.
Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z)
- Are Synthetic Time-series Data Really not as Good as Real Data? [29.852306720544224]
Time-series data presents limitations stemming from data quality issues, bias and vulnerabilities, and generalization problems.
We introduce InfoBoost -- a highly versatile cross-domain data synthesizing framework with time series representation learning capability.
We develop a synthetic-data-based method that enables model training without real data, surpassing the performance of models trained on real data.
arXiv Detail & Related papers (2024-02-01T13:59:04Z)
- Data-driven prediction of tool wear using Bayesian-regularized artificial neural networks [8.21266434543609]
The prediction of tool wear helps minimize costs and enhance product quality in manufacturing.
We propose a new data-driven model that uses Bayesian Regularized Artificial Neural Networks (BRANNs) to precisely predict milling tool wear.
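Bayesian regularization penalizes large network weights in a way closely related to L2 weight decay. As a loose, minimal stand-in for the paper's BRANN (not its actual model), a closed-form one-feature ridge fit on made-up wear data shows the shrinkage effect the regularizer produces:

```python
# Toy illustration of weight shrinkage under an L2-style penalty,
# a simplified stand-in for Bayesian regularization. Data is invented.

def ridge_1d(xs, ys, alpha):
    """Closed-form ridge solution w = sum(x*y) / (sum(x*x) + alpha) for y ~ w*x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

# Hypothetical "cutting time -> flank wear" pairs (illustrative units).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.11, 0.19, 0.31, 0.42]

w_plain = ridge_1d(xs, ys, alpha=0.0)  # ordinary least squares through origin
w_reg = ridge_1d(xs, ys, alpha=5.0)    # the penalty shrinks the slope
```

In a BRANN the penalty strength is inferred from the data rather than fixed by hand, which is what makes the approach robust on the small datasets typical of tool-wear experiments.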
arXiv Detail & Related papers (2023-11-30T15:22:20Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques [7.967995669387532]
Generative models have emerged as a promising solution for generating synthetic datasets that can replace or augment real-world data.
We propose three novel post-processing techniques to improve the quality and diversity of the synthetic dataset.
Experiments show that Gap Filler (GaFi) effectively reduces the gap to real-data accuracy, down to errors of 2.03%, 1.78%, and 3.99% on the Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets, respectively.
arXiv Detail & Related papers (2023-05-17T10:50:38Z)
- Scalable Modular Synthetic Data Generation for Advancing Aerial Autonomy [2.9005223064604078]
We introduce a scalable Aerial Synthetic Data Augmentation (ASDA) framework tailored to aerial autonomy applications.
ASDA extends a central data collection engine with two scriptable pipelines that automatically perform scene and data augmentations.
We demonstrate the effectiveness of our method in automatically generating diverse datasets.
arXiv Detail & Related papers (2022-11-10T04:37:41Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.