Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback
- URL: http://arxiv.org/abs/2512.22336v1
- Date: Fri, 26 Dec 2025 18:54:14 GMT
- Title: Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback
- Authors: Mengkang Hu, Bowei Xia, Yuran Wu, Ailing Yu, Yude Zou, Qiguang Chen, Shijian Wang, Jiarui Jin, Kexin Li, Wenxiang Jiao, Yuan Lu, Ping Luo
- Abstract summary: Agent2World is a tool-augmented multi-agent framework that achieves strong inference-time world-model generation. It also serves as a data engine for supervised fine-tuning, by grounding generation in multi-agent feedback.
- Score: 51.22403664895878
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic world models (e.g., PDDL domains or executable simulators) are central to model-based planning, but training LLMs to generate such world models is limited by the lack of large-scale verifiable supervision. Current approaches rely primarily on static validation methods that fail to catch behavior-level errors arising from interactive execution. In this paper, we propose Agent2World, a tool-augmented multi-agent framework that achieves strong inference-time world-model generation and also serves as a data engine for supervised fine-tuning, by grounding generation in multi-agent feedback. Agent2World follows a three-stage pipeline: (i) a Deep Researcher agent performs knowledge synthesis by web searching to address specification gaps; (ii) a Model Developer agent implements executable world models; and (iii) a specialized Testing Team conducts adaptive unit testing and simulation-based validation. Agent2World demonstrates superior inference-time performance across three benchmarks spanning both Planning Domain Definition Language (PDDL) and executable code representations, achieving consistent state-of-the-art results. Beyond inference, the Testing Team serves as an interactive environment for the Model Developer, providing behavior-aware adaptive feedback that yields multi-turn training trajectories. The model fine-tuned on these trajectories substantially improves world-model generation, yielding an average relative gain of 30.95% over the same model before training. Project page: https://agent2world.github.io.
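The developer/tester interaction described in the abstract can be pictured as a simple feedback loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `refine`, `Turn`, `toy_develop`, and `toy_test` are hypothetical, the "developer" and "tester" are toy functions rather than LLM agents, and the Deep Researcher stage is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# All names below are hypothetical; the abstract does not specify any API.


@dataclass
class Turn:
    candidate: str       # world-model text produced by the developer
    feedback: List[str]  # failure messages returned by the tester


def refine(develop: Callable[[str, List[str]], str],
           test: Callable[[str], List[str]],
           spec: str,
           max_turns: int = 3) -> Tuple[str, List[Turn]]:
    """Alternate developer and tester until tests pass or turns run out."""
    trajectory: List[Turn] = []
    feedback: List[str] = []
    candidate = ""
    for _ in range(max_turns):
        candidate = develop(spec, feedback)
        feedback = test(candidate)
        trajectory.append(Turn(candidate, feedback))
        if not feedback:  # empty feedback means every check passed
            break
    return candidate, trajectory


# Toy developer: adds a precondition only after being told it is missing.
def toy_develop(spec: str, feedback: List[str]) -> str:
    model = "(:action move :effect (at ?to))"
    if any("precondition" in msg for msg in feedback):
        model = "(:action move :precondition (at ?from) :effect (at ?to))"
    return model


# Toy tester: a single behavior check standing in for unit tests and simulation.
def toy_test(candidate: str) -> List[str]:
    return [] if ":precondition" in candidate else ["move has no precondition"]


final_model, trajectory = refine(toy_develop, toy_test, "robot moves between rooms")
print(len(trajectory), "turns; passed:", not trajectory[-1].feedback)
```

The recorded `trajectory` list is the part that corresponds to the multi-turn training data mentioned in the abstract: each turn pairs a candidate model with the feedback it received.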
Related papers
- MagicAgent: Towards Generalized Agent Planning [73.21129030631421]
We present MagicAgent, a series of foundation models specifically designed for generalized agent planning. We introduce a lightweight and scalable synthetic data framework that generates high-quality trajectories across diverse planning tasks. We show that MagicAgent-32B and MagicAgent-30B-A3B achieve superior performance across diverse open-source benchmarks.
arXiv Detail & Related papers (2026-02-22T01:39:16Z) - Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning [62.499592503950026]
Large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. We propose Agent World Model (AWM), a fully synthetic environment generation pipeline. We scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets.
arXiv Detail & Related papers (2026-02-10T18:55:41Z) - From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents [23.583947864141162]
EigenData is a hierarchical multi-agent engine that synthesizes tool-grounded dialogues together with executable per-instance checkers. Building on the synthetic data, we develop an RL recipe that first fine-tunes the user model and then applies GRPO-style training. Our results suggest a scalable pathway for bootstrapping complex tool-using behaviors without expensive human annotation.
arXiv Detail & Related papers (2026-01-30T06:01:23Z) - GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning [54.42973725693]
We introduce GenAgent, unifying visual understanding and generation through an agentic multimodal model. GenAgent significantly boosts base generator (FLUX.1-dev) performance on GenEval++ and WISE. Our framework demonstrates three key properties: 1) cross-tool generalization to generators with varying capabilities, 2) test-time scaling with consistent improvements across interaction rounds, and 3) task-adaptive reasoning that automatically adjusts to different tasks.
arXiv Detail & Related papers (2026-01-26T14:49:04Z) - VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents [130.70999337445468]
A key challenge in training Vision-Language Model (VLM) agents, compared to Language Model (LLM) agents, is the shift from textual states to complex visual observations. We ask: Can VLM agents construct internal world models through explicit visual state reasoning? We architecturally enforce and reward the agent's reasoning process via reinforcement learning (RL). We find that structuring the agent's reasoning into State Estimation and Transition Modeling is critical for success.
arXiv Detail & Related papers (2025-10-19T16:05:07Z) - World Model Implanting for Test-time Adaptation of Embodied Agents [29.514831254621438]
In embodied AI, a persistent challenge is enabling agents to robustly adapt to novel domains without requiring extensive data collection or retraining. We present a world model implanting framework (WorMI) that combines the reasoning capabilities of large language models with independently learned, domain-specific world models. We evaluate WorMI on the VirtualHome and ALFWorld benchmarks, demonstrating superior zero-shot and few-shot performance compared to several LLM-based approaches.
arXiv Detail & Related papers (2025-09-04T07:32:16Z) - Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning [2.3964255330849356]
We present the Multi-Agent Transformer World Model (MATWM), a novel transformer-based world model for reinforcement learning. MATWM combines a decentralized imagination framework with a semi-centralized critic and a teammate prediction module. We evaluate MATWM on a broad suite of benchmarks, including the StarCraft Multi-Agent Challenge, PettingZoo, and MeltingPot.
arXiv Detail & Related papers (2025-06-23T11:47:17Z) - WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model [55.276852838877346]
Self-evolving agents are trained on trajectories sampled autonomously based on their own policies. We propose a novel framework that introduces a co-evolving World Model LLM. This world model predicts the next observation based on the current observation and action within the web environment.
arXiv Detail & Related papers (2025-04-23T02:54:31Z) - APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay [86.01901238059261]
APIGen-MT is a framework that generates verifiable and diverse multi-turn agent data. We train a family of models, the xLAM-2-fc-r series, with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks.
arXiv Detail & Related papers (2025-04-04T17:13:57Z) - Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark [72.46357004059661]
Generalist Virtual Agents (GVAs) have shown significant promise in autonomous task execution. To address these challenges, we propose Similar, a Step-Wise Multi-Dimensional Generalist Reward Model. Similar offers fine-grained signals for agent training and can choose better actions for inference-time scaling.
arXiv Detail & Related papers (2025-03-24T13:30:47Z) - TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration [10.000248410171269]
Trajectory modeling has widespread applications in areas such as life services, urban transportation, and public administration. In this paper, we propose TrajAgent, an agent framework powered by large language models, to facilitate robust and efficient trajectory modeling. In experiments on five tasks using four real-world datasets, TrajAgent achieved a performance improvement of 2.38%-69.91% over baseline methods.
arXiv Detail & Related papers (2024-10-27T13:51:09Z) - Automating Traffic Model Enhancement with AI Research Agent [4.420199777075044]
TR-Agent is an AI-powered framework that autonomously develops and refines traffic models. We structure the research pipeline into four key stages: idea generation, theory formulation, theory evaluation, and iterative optimization. Through iterative feedback and refinement, TR-Agent improves both modeling efficiency and effectiveness.
arXiv Detail & Related papers (2024-09-25T12:42:25Z) - Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models [10.053377705165786]
We take a first step towards building interacting generative models (GANs) that reflect the interactions in the real world. We build and analyze a hierarchical set-up where a higher-level GAN is conditioned on the output of multiple lower-level GANs. We present a technique of using feedback from the higher-level GAN to improve the performance of lower-level GANs.
arXiv Detail & Related papers (2022-01-24T13:05:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.