Thinker: Learning to Plan and Act
- URL: http://arxiv.org/abs/2307.14993v2
- Date: Thu, 26 Oct 2023 23:11:37 GMT
- Title: Thinker: Learning to Plan and Act
- Authors: Stephen Chung, Ivan Anokhin, David Krueger
- Abstract summary: The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model.
We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark.
- Score: 18.425843346728648
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose the Thinker algorithm, a novel approach that enables reinforcement
learning agents to autonomously interact with and utilize a learned world
model. The Thinker algorithm wraps the environment with a world model and
introduces new actions designed for interacting with the world model. These
model-interaction actions enable agents to perform planning by proposing
alternative plans to the world model before selecting a final action to execute
in the environment. This approach eliminates the need for handcrafted planning
algorithms: the agent learns to plan autonomously, and its plans can be easily
interpreted through visualization. We demonstrate
the algorithm's effectiveness through experimental results in the game of
Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves
state-of-the-art performance and competitive results, respectively.
Visualizations of agents trained with the Thinker algorithm demonstrate that
they have learned to plan effectively with the world model to select better
actions. Thinker is the first work showing that an RL agent can learn to plan
with a learned world model in complex environments.
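To make the mechanism concrete, here is a minimal sketch in Gymnasium-style Python. The class name, the `WorldModel.predict` interface, and the forced-commit budget are illustrative assumptions, not the authors' implementation; the paper's augmented MDP is richer than this.

```python
import gymnasium as gym


class ThinkerStyleWrapper(gym.Wrapper):
    """Minimal sketch of a Thinker-style augmented environment.

    The agent sees a doubled action space: the original environment
    actions ("real" actions) plus the same actions applied to a learned
    world model ("imagined" actions). Imagined actions advance an
    internal rollout instead of the real environment, so the agent can
    propose and inspect alternative plans before committing.
    """

    def __init__(self, env, world_model, max_imagined_steps=10):
        super().__init__(env)
        self.world_model = world_model  # assumed interface: predict(state, action) -> (state, reward)
        self.max_imagined_steps = max_imagined_steps
        self.n_real = env.action_space.n
        # Actions [0, n_real) act on the real env; [n_real, 2*n_real) act on the model.
        self.action_space = gym.spaces.Discrete(2 * self.n_real)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.real_state = obs
        self.imagined_state = obs               # rollouts start from the real state
        self.imagined_steps = 0
        return self._augment(obs), info

    def step(self, action):
        over_budget = self.imagined_steps >= self.max_imagined_steps
        if action < self.n_real or over_budget:
            # Real action (or forced commit): step the environment, reset the rollout.
            obs, reward, terminated, truncated, info = self.env.step(action % self.n_real)
            self.real_state = obs
            self.imagined_state = obs
            self.imagined_steps = 0
            return self._augment(obs), reward, terminated, truncated, info
        # Imagined action: advance the world model only; no environment reward.
        self.imagined_state, _ = self.world_model.predict(
            self.imagined_state, action - self.n_real)
        self.imagined_steps += 1
        return self._augment(self.real_state), 0.0, False, False, {"imagined": True}

    def _augment(self, obs):
        # The agent observes both the real state and its current imagined state.
        # (A full implementation would also update observation_space accordingly.)
        return {"real": obs, "imagined": self.imagined_state}
```

Because imagined steps yield no environment reward, a standard model-free learner trained on the wrapped environment must discover for itself when, and for how long, consulting the model is worthwhile, which is the sense in which planning is learned rather than handcrafted.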
Related papers
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that plans best responses to the inferred goals.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
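As a rough sketch of the two-level structure, the opponent modeling module can be read as a Bayesian belief over opponent goals feeding goal-conditioned responses. All names and the best-response rule below are illustrative assumptions, not HOP's actual inference or planning machinery.

```python
import numpy as np

def update_goal_belief(belief, likelihoods):
    """Bayesian update over opponent goals: P(g | obs) is proportional to P(obs | g) P(g)."""
    posterior = belief * likelihoods
    return posterior / posterior.sum()

def respond(belief, goal_policies, state):
    """Plan a best response against the most probable opponent goal."""
    likely_goal = int(np.argmax(belief))
    return goal_policies[likely_goal](state)

# Hypothetical usage: three candidate opponent goals, uniform prior,
# then one observation whose likelihood favors goal 0.
belief = np.ones(3) / 3
belief = update_goal_belief(belief, likelihoods=np.array([0.7, 0.2, 0.1]))
```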
- WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment [11.81398773711566]
We give a model-based agent that builds a Python program representing its knowledge of the world based on its interactions with the environment.
We study our agent on gridworlds, and on task planning, finding our approach is more sample-efficient compared to deep RL, more compute-efficient compared to ReAct-style agents, and that it can transfer its knowledge across environments by editing its code.
arXiv Detail & Related papers (2024-02-19T16:39:18Z)
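A minimal sketch of this build-and-revise loop, assuming hashable, directly comparable gridworld states; `llm_revise` stands in for the paper's LLM-driven program synthesis and is not its actual API.

```python
def learn_world_program(env, llm_revise, episodes=100):
    """Maintain a Python transition model as source code; revise it on surprise.

    Assumes states are directly comparable (e.g., gridworld tuples).
    `llm_revise(program, experience)` is a hypothetical stand-in.
    """
    program = "def transition(state, action):\n    return state, 0.0"  # trivial seed
    scope = {}
    exec(program, scope)
    experience = []
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()
            pred_state, pred_reward = scope["transition"](state, action)
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            experience.append((state, action, next_state, reward))
            if pred_state != next_state or pred_reward != reward:
                # Prediction error: request a revised program consistent
                # with all experience gathered so far.
                program = llm_revise(program, experience)
                scope = {}
                exec(program, scope)
            state = next_state
    return program
```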
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
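One minimal reading of "planning in a learned abstract search space", with hypothetical `encoder`, `dynamics`, and `value` networks; PiZero's actual planner is more sophisticated than this exhaustive short-horizon search.

```python
import itertools

def plan_in_abstract_space(obs, encoder, dynamics, value, actions, horizon=3):
    """Score short action sequences entirely in the learned abstract space."""
    z0 = encoder(obs)                       # map observation to abstract state
    best_first, best_val = None, float("-inf")
    for seq in itertools.product(actions, repeat=horizon):
        z = z0
        for a in seq:
            z = dynamics(z, a)              # learned abstract transition
        v = value(z)                        # learned evaluation of the end state
        if v > best_val:
            best_first, best_val = seq[0], v
    return best_first                       # execute only the first action
```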
- DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation [107.5934592892763]
We propose DREAMWALKER -- a world-model-based VLN-CE agent.
The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment.
It can simulate and evaluate possible plans entirely within this internal abstract world before executing costly actions.
arXiv Detail & Related papers (2023-08-14T23:45:01Z)
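The "simulate before acting" loop can be sketched as follows; `world_model.simulate` and `critic` are hypothetical interfaces, not the paper's API.

```python
def mental_planning_step(state, candidate_plans, world_model, critic):
    """Evaluate candidate plans in imagination; commit only to the winner."""
    def imagined_return(plan):
        s, total = state, 0.0
        for action in plan:
            s, r = world_model.simulate(s, action)   # no real-world cost incurred
            total += r
        return total + critic(s)                     # bootstrap from the final state

    best_plan = max(candidate_plans, key=imagined_return)
    return best_plan[0]                              # first action to execute for real
```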
- An intelligent tutor for planning in large partially observable environments [0.8739101659113157]
We develop and evaluate the first intelligent tutor for planning in partially observable environments.
Unlike previous intelligent tutors for teaching planning strategies, it combines two innovations.
A preregistered experiment with 330 participants demonstrated that the new intelligent tutor is highly effective at improving people's ability to make good decisions in partially observable environments.
arXiv Detail & Related papers (2023-02-06T13:57:08Z)
- Affordance Learning from Play for Sample-Efficient Policy Learning [30.701546777177555]
We use a self-supervised visual affordance model from human teleoperated play data to enable efficient policy learning and motion planning.
We combine model-based planning with model-free deep reinforcement learning to learn policies that favor the same object regions favored by people.
We find that our policies train 4x faster than the baselines and generalize better to novel objects because our visual affordance model can anticipate their affordance regions.
arXiv Detail & Related papers (2022-03-01T11:00:35Z)
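The model-based/model-free split can be caricatured as a simple distance-based switch; the threshold, observation keys, and interfaces below are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def hybrid_policy(obs, affordance_model, planner, rl_policy, switch_radius=0.05):
    """Model-based far from the object, model-free near the affordance region."""
    target = affordance_model.predict_region(obs)        # region people tend to use
    distance = np.linalg.norm(obs["ee_pos"] - target)
    if distance > switch_radius:
        return planner.move_toward(obs, target)          # coarse, safe approach
    return rl_policy(obs)                                # learned fine manipulation
```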
- Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning [114.1830997893756]
This work focuses on learning a model to plan goal-directed actions in real-life videos.
We propose novel algorithms to model human behaviors through Bayesian Inference and model-based Imitation Learning.
arXiv Detail & Related papers (2021-10-05T01:06:53Z)
- Planning from video game descriptions [0.0]
Planners use action models learned from these game descriptions to obtain deliberative behaviour for agents in many different video games.
Benchmarks of the domains have been produced that may be of interest to the international planning community.
arXiv Detail & Related papers (2021-09-01T15:49:09Z)
- Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning [27.593497502386143]
Theory-Based Reinforcement Learning uses human-like intuitive theories to explore and model an environment.
We instantiate the approach in a video game playing agent called EMPA.
EMPA matches human learning efficiency on a suite of 90 Atari-style video games.
arXiv Detail & Related papers (2021-07-27T01:38:13Z)
- A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning [104.3643447579578]
We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state.
The design allows agents to learn to plan effectively, by attending to the relevant objects, leading to better out-of-distribution generalization.
arXiv Detail & Related papers (2021-06-03T19:35:19Z)
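The "attend to relevant parts of the state" bottleneck can be sketched as a learned top-k selection over per-object features; shapes and names below are assumptions, and the paper's agent integrates this with an end-to-end planner.

```python
import torch

def select_relevant_slots(slots, scorer, k=4):
    """Keep only the k object slots the agent currently deems relevant.

    slots:  (num_objects, dim) tensor of per-object features
    scorer: small network mapping each slot to a scalar relevance score
    """
    scores = scorer(slots).squeeze(-1)                   # (num_objects,)
    topk = torch.topk(scores, k=min(k, slots.shape[0])).indices
    return slots[topk]                                   # planning sees this subset only
```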
- Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning [72.18725551199842]
We propose a novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD)
It maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories can be easily generalized to real trajectories.
We demonstrate that our approach improves sample efficiency of model-based planning, and achieves state-of-the-art performance on challenging visual control benchmarks.
arXiv Detail & Related papers (2020-10-23T03:22:01Z)
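As a deliberately simplified surrogate, not the paper's actual mutual-information estimator, BIRD's idea can be gestured at by adding a term that aligns imagined rollouts with real ones for the same action sequence, so that policy improvement computed in imagination remains valid on real trajectories.

```python
import torch.nn.functional as F

def bird_style_loss(real_traj, imagined_traj, model_loss, policy_loss, beta=1.0):
    """Simplified surrogate: keep imagination close to reality.

    real_traj, imagined_traj: (T, dim) tensors of state features produced by
    the same action sequence in the real environment and the learned model.
    """
    # Aligning imagined rollouts with real ones is a crude stand-in for the
    # paper's mutual-information term between the two trajectory distributions.
    alignment = F.mse_loss(imagined_traj, real_traj)
    return model_loss + policy_loss + beta * alignment
```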