SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model
- URL: http://arxiv.org/abs/2507.23773v1
- Date: Thu, 31 Jul 2025 17:57:20 GMT
- Title: SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model
- Authors: Mingkai Deng, Jinyu Hou, Yilin Shen, Hongxia Jin, Graham Neubig, Zhiting Hu, Eric Xing,
- Abstract summary: We introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning.<n>modelname overcomes the limitations of autoregressive reasoning by introducing a world model for planning via simulation.<n>World-model-based planning, in particular, shows consistent advantage of up to 124% over autoregressive planning.
- Score: 88.04128601981145
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: AI agents built on large language models (LLMs) hold enormous promise, but current practice focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also suffers from the fundamental limitations of autoregressive LLMs. On the other hand, humans are general agents who reason by mentally simulating the outcomes of their actions and plans. Moving towards a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of optimal agent in any environment, \modelname overcomes the limitations of autoregressive reasoning by introducing a world model for planning via simulation. The generalized world model is implemented using LLM, which can flexibly plan in a wide range of environments using the concept-rich latent space of natural language. Experiments on difficult web browsing tasks show that \modelname improves the success of flight search from 0\% to 32.2\%. World-model-based planning, in particular, shows consistent advantage of up to 124\% over autoregressive planning, demonstrating the advantage of world model simulation as a reasoning paradigm. We are excited about the possibility for training a single, general agent model based on LLMs that can act superintelligently in all environments. To start, we make SimuRA, a web-browsing agent built on \modelname with pretrained LLMs, available as a research demo for public testing.
Related papers
- AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability [84.52205243353761]
Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment.<n>We investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation.
arXiv Detail & Related papers (2025-04-06T20:35:44Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.<n>However, they still struggle with problems requiring multi-step decision-making and environmental feedback.<n>We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents [22.608219492706876]
We propose a model-based planning framework for web agents that employs a world model to simulate and deliberate over the outcome of each candidate action before committing to one.<n> Empirical results demonstrate that WebDreamer achieves substantial performance improvements over reactive baselines.<n>Our trained world model, Dreamer-7B, performs comparable to GPT-4o, highlighting the potential of specialized world models for efficient and effective planning in complex web environments.
arXiv Detail & Related papers (2024-11-10T18:50:51Z) - Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation [25.26545170310844]
We present a World-model-augmented (WMA) web agent, which simulates the outcomes of its actions for better decision-making.<n>Experiments on WebArena and Mind2Web show that our world models improve agents' policy selection without training.
arXiv Detail & Related papers (2024-10-17T05:37:00Z) - ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs [0.32141666878560626]
We introduce ReasonPlanner, a novel generalist agent designed for reflective thinking, planning, and interactive reasoning.
ReasonPlanner significantly outperforms previous state-of-the-art prompting-based methods on the ScienceWorld benchmark by more than 1.8 times.
It relies solely on frozen weights thus requiring no gradient updates.
arXiv Detail & Related papers (2024-10-11T20:58:51Z) - On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z) - GenSim: A General Social Simulation Platform with Large Language Model based Agents [111.00666003559324]
We propose a novel large language model (LLMs)-based simulation platform called textitGenSim.<n>Our platform supports one hundred thousand agents to better simulate large-scale populations in real-world contexts.<n>To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform.
arXiv Detail & Related papers (2024-10-06T05:02:23Z) - Sparse Rewards Can Self-Train Dialogue Agents [22.799506097310008]
We introduce a novel self-improvement paradigm that empowers LLM agents to autonomously enhance their performance without external human feedback.<n>We present ToolWOZ, a sparse reward tool-calling simulation environment derived from MultiWOZ.<n>We demonstrate that models trained with JOSH, both small and frontier, significantly improve tool-based interactions while preserving general model capabilities across diverse benchmarks.
arXiv Detail & Related papers (2024-09-06T21:00:57Z) - WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.<n>WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.