Simulating Environments with Reasoning Models for Agent Training
- URL: http://arxiv.org/abs/2511.01824v1
- Date: Mon, 03 Nov 2025 18:29:57 GMT
- Title: Simulating Environments with Reasoning Models for Agent Training
- Authors: Yuetai Li, Huseyin A Inan, Xiang Yue, Wei-Ning Chen, Lukas Wutschitz, Janardhan Kulkarni, Radha Poovendran, Robert Sim, Saravan Rajmohan
- Abstract summary: Building bespoke environments for training is heavy, brittle, and limits progress. We propose two frameworks: Simia-SFT and Simia-RL. Together they enable scalable agent training without environment engineering.
- Score: 55.98861707136674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LLM agents excel in compact environments requiring deep reasoning but remain brittle when operating in broader, more complex contexts that demand robustness across diverse tools and schemas. Building bespoke environments for training is heavy, brittle, and limits progress. In this paper, we demonstrate that LLMs can simulate realistic environment feedback without access to actual testbed data or APIs. Inspired by this capability, we propose two frameworks: Simia-SFT, a pipeline that synthesizes SFT data by amplifying small seed sets into diverse trajectories in an environment-agnostic manner, and Simia-RL, a framework that enables RL training without real environment implementations through LLM-simulated feedback. Fine-tuning open models yields consistent improvements across multiple benchmarks, surpassing GPT-4o and approaching o4-mini on $\tau^2$-Bench. Together, Simia-SFT and Simia-RL enable scalable agent training without environment engineering, replacing heavy and brittle implementations with flexible LLM-based simulation.
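The core idea of Simia-RL, as described in the abstract, is that an LLM stands in for the real environment: the agent's interaction history is fed to the model, which returns the next observation and a reward, so RL training needs no real testbed implementation. The sketch below illustrates that loop in minimal form; all names (`rollout`, `simulate_step`, `Trajectory`) are illustrative assumptions, not the paper's actual API, and the LLM simulator is stubbed by a plain Python function.

```python
# Hypothetical sketch of an LLM-simulated environment loop in the spirit of
# Simia-RL. The "environment" is any callable that, given the interaction
# history and the agent's action, returns (observation, reward, done) --
# in the real framework this would be a prompted reasoning LLM.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Step = Tuple[str, str, float]  # (action, observation, reward)

@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

def rollout(policy: Callable[[List[Step]], str],
            simulate_step: Callable[[List[Step], str], Tuple[str, float, bool]],
            max_turns: int = 5) -> Trajectory:
    """Run one episode where the environment is played by a simulator."""
    traj = Trajectory()
    for _ in range(max_turns):
        action = policy(traj.steps)                            # agent picks a tool call / message
        obs, reward, done = simulate_step(traj.steps, action)  # simulator plays the environment
        traj.steps.append((action, obs, reward))
        if done:
            break
    return traj

# Toy stand-in for the LLM simulator: acknowledges each action and
# terminates with reward 1.0 when the agent submits.
def toy_simulator(history: List[Step], action: str):
    done = action == "submit"
    return f"ok: {action}", (1.0 if done else 0.0), done

# Toy policy: search twice, then submit.
def toy_policy(history: List[Step]) -> str:
    return "search" if len(history) < 2 else "submit"

traj = rollout(toy_policy, toy_simulator)
print(len(traj.steps), traj.steps[-1])
```

Because the simulator is just a callable, swapping the toy function for a prompted LLM changes nothing in the training loop, which is what makes the approach environment-agnostic.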
Related papers
- Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning [62.499592503950026]
Large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. We propose Agent World Model (AWM), a fully synthetic environment generation pipeline. We scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets.
arXiv Detail & Related papers (2026-02-10T18:55:41Z) - SimLLM: Fine-Tuning Code LLMs for SimPy-Based Queueing System Simulation [19.709345153687142]
The Python package SimPy is widely used for modeling queueing systems. Recent advances in large language models (LLMs) have shown a strong ability to generate clear and executable code. We fine-tune two open-source LLMs, Qwen-Coder-7B and DeepSeek-Coder-6.7B, on curated SimPy queueing data.
arXiv Detail & Related papers (2026-01-10T11:53:39Z) - SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning [3.1436750864792375]
We introduce SimuAgent, an LLM-powered modeling and simulation agent tailored for Simulink. SimuAgent replaces XML with a concise, dictionary-style Python representation, dramatically cutting token counts. A lightweight plan-execute architecture, trained in two stages, equips the agent with both low-level tool skills and high-level design reasoning.
arXiv Detail & Related papers (2026-01-08T18:10:35Z) - GEM: A Gym for Agentic LLMs [88.36970707762424]
General Experience Maker (GEM) is an open-source environment simulator designed for the age of large language models (LLMs). GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput. We conduct apples-to-apples benchmarking of PPO, GRPO, and REINFORCE in both single- and multi-turn settings using GEM to shed light on the algorithmic designs.
arXiv Detail & Related papers (2025-10-01T15:55:57Z) - RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use [50.52940111891476]
Large language models excel at basic reasoning but struggle with tasks that require interaction with external tools. We present RLFactory, a plug-and-play reinforcement learning framework for multi-round tool use.
arXiv Detail & Related papers (2025-08-31T16:47:31Z) - ChronoLLM: Customizing Language Models for Physics-Based Simulation Code Generation [8.554484252096913]
We present a framework for refining and customizing open- and closed-source large language models (LLMs). We harness the power of AI in generating scripts that perform PyChrono virtual experiments.
arXiv Detail & Related papers (2025-08-19T16:12:51Z) - G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration [48.948187359727996]
G-Sim is a hybrid framework that automates simulator construction with rigorous empirical calibration. It produces reliable, causally informed simulators, mitigating data inefficiency and enabling robust system-level interventions.
arXiv Detail & Related papers (2025-06-10T22:14:34Z) - MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
MLE-Dojo is a Gym-style framework for systematically training, evaluating, and improving autonomous large language model (LLM) agents via reinforcement learning. MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z) - LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots [20.715834172041763]
We propose LoopSR, a lifelong policy adaptation framework that continuously refines RL policies in the post-deployment stage. LoopSR employs a transformer-based encoder to map real-world trajectories into a latent space. An autoencoder architecture and contrastive learning methods are adopted to enhance feature extraction of real-world dynamics.
arXiv Detail & Related papers (2024-09-26T16:02:25Z) - LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale [17.00936774784349]
There is a lack of simulation infrastructure capable of accurately modeling versatile hardware-software behaviors in large language model (LLM) serving systems.
This paper aims to develop an effective simulation tool, called LLMServingSim, to support future research in LLM serving systems.
arXiv Detail & Related papers (2024-08-10T09:26:15Z) - A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning [3.1515473193934778]
Off-dynamics Reinforcement Learning seeks to transfer a policy from a source environment to a target environment characterized by distinct yet similar dynamics.
We propose an innovative approach inspired by recent advancements in Imitation Learning and conservative RL algorithms.
arXiv Detail & Related papers (2023-12-24T13:09:08Z) - In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.