Thinker: Learning to Plan and Act
- URL: http://arxiv.org/abs/2307.14993v2
- Date: Thu, 26 Oct 2023 23:11:37 GMT
- Title: Thinker: Learning to Plan and Act
- Authors: Stephen Chung, Ivan Anokhin, David Krueger
- Abstract summary: The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model.
We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark.
- Score: 18.425843346728648
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose the Thinker algorithm, a novel approach that enables reinforcement
learning agents to autonomously interact with and utilize a learned world
model. The Thinker algorithm wraps the environment with a world model and
introduces new actions designed for interacting with the world model. These
model-interaction actions enable agents to perform planning by proposing
alternative plans to the world model before selecting a final action to execute
in the environment. This approach eliminates the need for handcrafted planning
algorithms by enabling the agent to learn how to plan autonomously and allows
for easy interpretation of the agent's plan with visualization. We demonstrate
the algorithm's effectiveness through experimental results in the game of
Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves
state-of-the-art performance and competitive results, respectively.
Visualizations of agents trained with the Thinker algorithm demonstrate that
they have learned to plan effectively with the world model to select better
actions. Thinker is the first work showing that an RL agent can learn to plan
with a learned world model in complex environments.
Related papers
- WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning [52.36434784963598]
We introduce WorldPrediction, a video-based benchmark for evaluating world modeling and procedural planning capabilities of different AI models.<n>We show that current frontier models barely achieve 57% accuracy on WorldPrediction-WM and 38% on WorldPrediction-PP whereas humans are able to solve both tasks perfectly.
arXiv Detail & Related papers (2025-06-04T18:22:40Z) - World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks [55.90051810762702]
We present a comprehensive overview of world models, highlighting their architecture, training paradigms, and applications across prediction, generation, planning, and causal reasoning.<n>We propose Wireless Dreamer, a novel world model-based reinforcement learning framework tailored for wireless edge intelligence optimization.
arXiv Detail & Related papers (2025-05-31T06:43:00Z) - Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents [76.86311820866153]
We propose Dyna-Think, a thinking framework that integrates planning with an internal world model with reasoning and acting to enhance AI agent performance.<n>DIT reconstructs the thinking process of R1 to focus on performing world model simulation relevant to the proposed (and planned) action, and trains the policy using this reconstructed data.<n>DDT uses a two-stage training process to first improve the agent's world modeling ability via objectives such as state prediction or critique generation, and then improve the agent's action via policy training.
arXiv Detail & Related papers (2025-05-31T00:10:18Z) - AdaWorld: Learning Adaptable World Models with Latent Actions [76.50869178593733]
We propose AdaWorld, an innovative world model learning approach that enables efficient adaptation.
Key idea is to incorporate action information during the pretraining of world models.
We then develop an autoregressive world model that conditions on these latent actions.
arXiv Detail & Related papers (2025-03-24T17:58:15Z) - Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z) - WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment [11.81398773711566]
We give a model-based agent that builds a Python program representing its knowledge of the world based on its interactions with the environment.
We study our agent on gridworlds, and on task planning, finding our approach is more sample-efficient compared to deep RL, more compute-efficient compared to ReAct-style agents, and that it can transfer its knowledge across environments by editing its code.
arXiv Detail & Related papers (2024-02-19T16:39:18Z) - AI planning in the imagination: High-level planning on learned abstract
search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z) - DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation [107.5934592892763]
We propose DREAMWALKER -- a world model based VLN-CE agent.
The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment.
It can simulate and evaluate possible plans entirely in such internal abstract world, before executing costly actions.
arXiv Detail & Related papers (2023-08-14T23:45:01Z) - An intelligent tutor for planning in large partially observable environments [0.8739101659113157]
We develop and evaluate the first intelligent tutor for planning in partially observable environments.
Compared to previous intelligent tutors for teaching planning strategies, this novel intelligent tutor combines two innovations.
A preregistered experiment with 330 participants demonstrated that the new intelligent tutor is highly effective at improving people's ability to make good decisions in partially observable environments.
arXiv Detail & Related papers (2023-02-06T13:57:08Z) - Affordance Learning from Play for Sample-Efficient Policy Learning [30.701546777177555]
We use a self-supervised visual affordance model from human teleoperated play data to enable efficient policy learning and motion planning.
We combine model-based planning with model-free deep reinforcement learning to learn policies that favor the same object regions favored by people.
We find that our policies train 4x faster than the baselines and generalize better to novel objects because our visual affordance model can anticipate their affordance regions.
arXiv Detail & Related papers (2022-03-01T11:00:35Z) - Procedure Planning in Instructional Videosvia Contextual Modeling and
Model-based Policy Learning [114.1830997893756]
This work focuses on learning a model to plan goal-directed actions in real-life videos.
We propose novel algorithms to model human behaviors through Bayesian Inference and model-based Imitation Learning.
arXiv Detail & Related papers (2021-10-05T01:06:53Z) - Planning from video game descriptions [0.0]
Planners use these action models to get the deliberative behaviour for an agent in many different video games.
benchmarks of the domains have been produced that can be of interest to the international planning community.
arXiv Detail & Related papers (2021-09-01T15:49:09Z) - Human-Level Reinforcement Learning through Theory-Based Modeling,
Exploration, and Planning [27.593497502386143]
Theory-Based Reinforcement Learning uses human-like intuitive theories to explore and model an environment.
We instantiate the approach in a video game playing agent called EMPA.
EMPA matches human learning efficiency on a suite of 90 Atari-style video games.
arXiv Detail & Related papers (2021-07-27T01:38:13Z) - A Consciousness-Inspired Planning Agent for Model-Based Reinforcement
Learning [104.3643447579578]
We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state.
The design allows agents to learn to plan effectively, by attending to the relevant objects, leading to better out-of-distribution generalization.
arXiv Detail & Related papers (2021-06-03T19:35:19Z) - Bridging Imagination and Reality for Model-Based Deep Reinforcement
Learning [72.18725551199842]
We propose a novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD)
It maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories can be easily generalized to real trajectories.
We demonstrate that our approach improves sample efficiency of model-based planning, and achieves state-of-the-art performance on challenging visual control benchmarks.
arXiv Detail & Related papers (2020-10-23T03:22:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.