Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents
- URL: http://arxiv.org/abs/2506.00320v1
- Date: Sat, 31 May 2025 00:10:18 GMT
- Title: Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents
- Authors: Xiao Yu, Baolin Peng, Ruize Xu, Michel Galley, Hao Cheng, Suman Nath, Jianfeng Gao, Zhou Yu
- Abstract summary: We propose Dyna-Think, a thinking framework that integrates planning with an internal world model alongside reasoning and acting to enhance AI agent performance. DIT reconstructs the thinking process of R1 to focus on performing world model simulation relevant to the proposed (and planned) action, and trains the policy using this reconstructed data. DDT uses a two-stage training process to first improve the agent's world modeling ability via objectives such as state prediction or critique generation, and then improve the agent's acting via policy training.
- Score: 76.86311820866153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in reasoning with large language models (LLMs), such as DeepSeek-R1, demonstrates impressive capabilities in domains like mathematics and coding, exhibiting complex cognitive behaviors such as verification, goal decomposition, and self-reflection. However, it is unclear which of these behaviors are effective, and which are missing, for long-horizon AI agent tasks. In this work, we propose Dyna-Think, a thinking framework that integrates planning with an internal world model alongside reasoning and acting to enhance AI agent performance. To enable Dyna-Think, we propose Dyna-Think Imitation Learning (DIT) and Dyna-Think Dyna Training (DDT). To initialize a policy with Dyna-Think, DIT reconstructs the thinking process of R1 to focus on performing world model simulation relevant to the proposed (and planned) action, and trains the policy using this reconstructed data. To enhance Dyna-Think, DDT uses a two-stage training process to first improve the agent's world modeling ability via objectives such as state prediction or critique generation, and then improve the agent's acting via policy training. We evaluate our methods on OSWorld and demonstrate that Dyna-Think improves the agent's in-domain and out-of-domain performance, achieving best-of-n performance similar to R1's while generating 2x fewer tokens on average. Our extensive empirical studies reveal that 1) using critique generation for world model training is effective at improving policy performance; and 2) better agent performance correlates with better world modeling ability. We believe these results suggest a promising research direction: integrating world model simulation into AI agents to enhance their reasoning, planning, and acting capabilities.
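To make the framework concrete, here is a minimal, hypothetical sketch of a single Dyna-Think step: the agent proposes an action, simulates its outcome with the model's internal world model, and revises before acting. In the paper this simulation happens inside one thinking trace; the sketch unrolls it into separate calls for clarity, and every name here (`dyna_think_step`, the prompt formats, `llm`) is illustrative rather than the paper's API.

```python
# Hypothetical sketch of one Dyna-Think step: reason -> simulate -> act.
# `llm` is a stand-in for any text-completion function.

from typing import Callable

def dyna_think_step(llm: Callable[[str], str], observation: str, goal: str) -> str:
    # 1. Reason: propose a candidate action for the current state.
    proposal = llm(f"Goal: {goal}\nObservation: {observation}\n"
                   "Propose the next action:")
    # 2. Simulate: use the model's internal world model to predict the outcome.
    simulation = llm(f"Observation: {observation}\nProposed action: {proposal}\n"
                     "Predict the resulting state:")
    # 3. Verify: revise the action if the simulated outcome fails the goal.
    action = llm(f"Goal: {goal}\nProposed action: {proposal}\n"
                 f"Simulated outcome: {simulation}\n"
                 "Output the final action (revised if the outcome looks wrong):")
    return action
```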
Related papers
- World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks [55.90051810762702]
We present a comprehensive overview of world models, highlighting their architecture, training paradigms, and applications across prediction, generation, planning, and causal reasoning. We propose Wireless Dreamer, a novel world model-based reinforcement learning framework tailored for wireless edge intelligence optimization.
arXiv Detail & Related papers (2025-05-31T06:43:00Z) - WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model [55.276852838877346]
Self-evolving agents are trained on trajectories sampled autonomously based on their own policies. We propose a novel framework that introduces a co-evolving world model LLM. This world model predicts the next observation based on the current observation and action within the web environment.
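As a rough illustration of the co-evolving world model idea, the sketch below rolls out an imagined trajectory entirely inside a learned next-observation model, which is how such a model can substitute for the real web environment during training. The prompt formats and function names are assumptions, not WebEvolver's actual interface.

```python
# Hypothetical sketch: an LLM world model predicts the next web observation
# from (observation, action), so the agent trains on imagined rollouts.

from typing import Callable, List, Tuple

LLM = Callable[[str], str]

def imagined_rollout(policy: LLM, world_model: LLM,
                     observation: str, steps: int) -> List[Tuple[str, str]]:
    """Roll out a trajectory entirely inside the learned world model."""
    trajectory = []
    for _ in range(steps):
        action = policy(f"Observation:\n{observation}\nNext action:")
        trajectory.append((observation, action))
        # The world model stands in for the real web environment here.
        observation = world_model(
            f"Observation:\n{observation}\nAction: {action}\nNext observation:")
    return trajectory
```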
arXiv Detail & Related papers (2025-04-23T02:54:31Z) - On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models show increasingly strong performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly, by first generating reward models to train an agent with reinforcement learning.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z) - Mastering the Digital Art of War: Developing Intelligent Combat Simulation Agents for Wargaming Using Hierarchical Reinforcement Learning [0.0]
This dissertation proposes a comprehensive approach, including targeted observation abstractions, multi-model integration, a hybrid AI framework, and an overarching hierarchical reinforcement learning framework.
Our localized observation abstraction using piecewise linear spatial decay simplifies the RL problem, enhancing computational efficiency and demonstrating superior efficacy over traditional global observation methods.
Our hybrid AI framework synergizes RL with scripted agents, leveraging RL for high-level decisions and scripted agents for lower-level tasks, enhancing adaptability, reliability, and performance.
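The localized observation abstraction described above can be pictured as a distance-based weighting of the agent's surroundings; below is a small sketch of one plausible piecewise linear spatial decay, with the radii chosen purely for illustration rather than taken from the dissertation.

```python
# Sketch of a localized observation abstraction with piecewise linear spatial
# decay: cells near the agent keep full weight, then decay linearly to zero.

import numpy as np

def piecewise_linear_decay(grid: np.ndarray, agent_rc: tuple[int, int],
                           full_radius: float = 2.0,
                           cutoff_radius: float = 8.0) -> np.ndarray:
    """Weight each cell: 1 within full_radius, linear decay to 0 at cutoff_radius."""
    rows, cols = np.indices(grid.shape)
    dist = np.hypot(rows - agent_rc[0], cols - agent_rc[1])
    weight = np.clip((cutoff_radius - dist) / (cutoff_radius - full_radius), 0.0, 1.0)
    return grid * weight
```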
arXiv Detail & Related papers (2024-08-23T18:50:57Z) - Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL [35.19457580182083]
We present Ego-Foresight, a self-supervised method for disentangling agent and environment based on motion and prediction. Our main finding is that self-supervised agent-awareness, via visuomotor prediction of the agent, improves the sample efficiency and performance of the underlying RL algorithm.
arXiv Detail & Related papers (2024-05-27T13:32:43Z) - Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
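One plausible reading of "world models as tools to measure task relevance" is to score offline trajectories by how well the target task's dynamics model predicts them; the hedged sketch below does exactly that, with the `world_model` signature being an assumption rather than the paper's interface.

```python
# Hypothetical sketch: trajectories whose transitions the target task's world
# model predicts well are treated as more transferable.

import numpy as np
from typing import Callable, List, Tuple

Transition = Tuple[np.ndarray, np.ndarray, np.ndarray]  # (state, action, next_state)

def task_relevance(world_model: Callable[[np.ndarray, np.ndarray], np.ndarray],
                   trajectory: List[Transition]) -> float:
    """Higher score = lower one-step prediction error under the task's model."""
    errors = [np.mean((world_model(s, a) - s_next) ** 2)
              for s, a, s_next in trajectory]
    return -float(np.mean(errors))
```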
arXiv Detail & Related papers (2023-06-06T02:24:41Z) - A Platform-Agnostic Deep Reinforcement Learning Framework for Effective Sim2Real Transfer towards Autonomous Driving [0.0]
Deep Reinforcement Learning (DRL) has shown remarkable success in solving complex tasks.
However, transferring DRL agents to the real world remains challenging due to the significant discrepancies between simulation and reality.
We propose a robust DRL framework that leverages platform-dependent perception modules to extract task-relevant information.
arXiv Detail & Related papers (2023-04-14T07:55:07Z) - On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective [54.38373782121503]
A Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks.
We present a case study demonstrating our FDM implementation, DigitalBrain (DB1), a 1.3-billion-parameter model that achieves human-level performance on 870 tasks.
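As a rough illustration of "decision-making as sequence decoding", the sketch below flattens an observation-action history into one token stream and greedily decodes the next action token by token. The tokenization and model interface are placeholders, not DB1's actual design.

```python
# Hypothetical sketch: trajectories become token sequences o1 a1 o2 a2 ...,
# and the next action is produced by autoregressive next-token prediction.

from typing import Callable, List, Sequence, Tuple

def decode_action(model: Callable[[Sequence[int]], int],
                  history: List[Tuple[List[int], List[int]]],
                  action_len: int) -> List[int]:
    """Greedily decode the next action's tokens from an (obs, act) token history."""
    # Flatten the trajectory into a single token sequence.
    sequence: List[int] = []
    for obs_tokens, act_tokens in history:
        sequence.extend(obs_tokens)
        sequence.extend(act_tokens)
    # Autoregressively decode the next action.
    action: List[int] = []
    for _ in range(action_len):
        next_token = model(sequence + action)  # next-token prediction
        action.append(next_token)
    return action
```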
arXiv Detail & Related papers (2022-12-24T06:16:45Z) - Cloud-Edge Training Architecture for Sim-to-Real Deep Reinforcement Learning [0.8399688944263843]
Deep reinforcement learning (DRL) is a promising approach to solve complex control tasks by learning policies through interactions with the environment.
Sim-to-real approaches leverage simulations to pretrain DRL policies and then deploy them in the real world.
This work proposes a distributed cloud-edge architecture to train DRL agents in the real world in real-time.
arXiv Detail & Related papers (2022-03-04T10:27:01Z) - Dyna-T: Dyna-Q and Upper Confidence Bounds Applied to Trees [0.9137554315375919]
We present a preliminary investigation of a novel algorithm called Dyna-T.
In reinforcement learning (RL), a planning agent has its own representation of the environment as a model.
Experience can be used to learn a better model or to directly improve the value function and policy.
arXiv Detail & Related papers (2022-01-12T15:06:30Z)
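For background on the Dyna-T entry above, a compact textbook Dyna-Q loop is sketched below: each real transition both updates Q directly and trains a tabular model, which is then replayed for extra planning updates. The `env` interface (`reset`, `step`, `actions`) is an assumption; Dyna-T's UCT-based tree search replaces the random planning step and is not shown.

```python
# Textbook Dyna-Q: direct RL update + model learning + model-based planning.

import random
from collections import defaultdict

def dyna_q(env, episodes=100, n_planning=10, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)   # (state, action) -> estimated return
    model = {}               # (state, action) -> (reward, next_state)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # Direct RL update from real experience.
            target = r + gamma * max(Q[(s2, act)] for act in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # Model learning (deterministic tabular model).
            model[(s, a)] = (r, s2)
            # Planning: replay n simulated transitions from the model.
            for _ in range(n_planning):
                ps, pa = random.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                ptarget = pr + gamma * max(Q[(ps2, act)] for act in env.actions)
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q
```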