OptionZero: Planning with Learned Options
- URL: http://arxiv.org/abs/2502.16634v3
- Date: Fri, 21 Mar 2025 13:30:42 GMT
- Title: OptionZero: Planning with Learned Options
- Authors: Po-Wei Huang, Pei-Chiun Peng, Hung Guei, Ti-Rong Wu
- Abstract summary: Planning with options has been shown effective in reinforcement learning within complex environments. Inspired by MuZero, we propose a novel approach, named OptionZero. OptionZero incorporates an option network into MuZero, providing autonomous discovery of options through self-play games.
- Score: 6.929921943833662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning with options -- sequences of primitive actions -- has been shown effective in reinforcement learning within complex environments. Previous studies have focused on planning with predefined options or options learned from expert demonstration data. Inspired by MuZero, which learns superhuman heuristics without any human knowledge, we propose a novel approach, named OptionZero. OptionZero incorporates an option network into MuZero, providing autonomous discovery of options through self-play games. Furthermore, we modify the dynamics network to provide environment transitions when using options, allowing deeper search under the same simulation constraints. Empirical experiments conducted in 26 Atari games demonstrate that OptionZero outperforms MuZero, achieving a 131.58% improvement in mean human-normalized score. Our behavior analysis shows that OptionZero not only learns options but also acquires strategic skills tailored to different game characteristics. Our findings show promising directions for discovering and using options in planning. Our code is available at https://rlg.iis.sinica.edu.tw/papers/optionzero.
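To make the mechanism concrete, here is a minimal sketch of the core idea from the abstract: a dynamics model that consumes an entire option (a sequence of primitive actions) in one call, so each search simulation advances multiple primitive steps and the tree reaches deeper states under the same simulation budget. All names (`OptionDynamics`, `apply_option`) and the GRU-based transition are illustrative assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OptionDynamics(nn.Module):
    """Toy option-aware dynamics model (illustrative, not the paper's networks)."""

    def __init__(self, latent_dim: int, num_actions: int):
        super().__init__()
        # A recurrent transition lets a single call unroll an entire option
        # (a sequence of primitive actions) in latent space.
        self.transition = nn.GRU(num_actions, latent_dim, batch_first=True)
        self.reward_head = nn.Linear(latent_dim, 1)
        self.num_actions = num_actions

    def apply_option(self, latent: torch.Tensor, option: torch.Tensor):
        """latent: (B, latent_dim) current latent state.
        option: (B, L) integer primitive-action indices, L = option length.
        Returns the latent state after the option and the summed predicted reward."""
        acts = F.one_hot(option, self.num_actions).float()  # (B, L, A)
        outs, h = self.transition(acts, latent.unsqueeze(0))
        next_latent = h.squeeze(0)                  # latent after all L steps
        reward = self.reward_head(outs).sum(dim=1)  # cumulative option reward
        return next_latent, reward

# One search "simulation" now advances L primitive steps instead of one,
# which is how a fixed simulation budget can reach deeper positions.
model = OptionDynamics(latent_dim=32, num_actions=4)
latent = torch.zeros(1, 32)
option = torch.tensor([[2, 2, 1]])  # a hypothetical 3-step option
next_latent, reward = model.apply_option(latent, option)
print(next_latent.shape, reward.shape)  # torch.Size([1, 32]) torch.Size([1, 1])
```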
Related papers
- Interpreting the Learned Model in MuZero Planning [12.47846647115319]
MuZero has achieved superhuman performance in various games by using a dynamics network to predict environment dynamics for planning.
This paper aims to demystify MuZero's model by interpreting the learned latent states.
arXiv Detail & Related papers (2024-11-07T10:06:23Z)
- UniZero: Generalized and Efficient Planning with Scalable Latent World Models [29.648382211926364]
UniZero is a novel approach that employs a modular transformer-based world model to effectively learn a shared latent space.
We show that UniZero significantly outperforms existing baselines in benchmarks that require long-term memory.
In standard single-task RL settings, such as Atari and DMControl, UniZero matches or even surpasses the performance of current state-of-the-art methods.
arXiv Detail & Related papers (2024-06-15T15:24:15Z)
- MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games [9.339645051415115]
MiniZero is a zero-knowledge learning framework that supports four state-of-the-art algorithms.
We evaluate the performance of each algorithm in two board games, 9x9 Go and 8x8 Othello, as well as 57 Atari games.
arXiv Detail & Related papers (2023-10-17T14:29:25Z)
- Equivariant MuZero [14.027651496499882]
We propose improving the data efficiency and generalisation capabilities of MuZero by explicitly incorporating the symmetries of the environment in its world-model architecture.
We prove that, so long as the neural networks used by MuZero are equivariant to a particular symmetry group acting on the environment, the entirety of MuZero's action-selection algorithm will also be equivariant to that group.
arXiv Detail & Related papers (2023-02-09T17:46:29Z)
- Efficient Offline Policy Optimization with a Learned Model [83.64779942889916]
MuZero Unplugged presents a promising approach for offline policy learning from logged data.
It conducts Monte-Carlo Tree Search (MCTS) with a learned model and leverages the Reanalyze algorithm to learn purely from offline data.
This paper investigates several hypotheses about why MuZero Unplugged may not work well in offline settings.
arXiv Detail & Related papers (2022-10-12T07:41:04Z)
- Reward-Respecting Subtasks for Model-Based Reinforcement Learning [13.906158484935098]
Reinforcement learning must include planning with a model of the world that is abstract in state and time.
One reason such temporal abstraction is rarely used in planning is that the space of possible options is immense, and the methods previously proposed for option discovery do not take into account how the option models will be used in planning.
We show that option models obtained from reward-respecting subtasks are much more likely to be useful in planning than eigenoptions, shortest path options based on bottleneck states, or reward-respecting options generated by the option-critic.
arXiv Detail & Related papers (2022-02-07T19:09:27Z)
- Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z)
- Flexible Option Learning [69.78645585943592]
We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
arXiv Detail & Related papers (2021-12-06T15:07:48Z)
- Temporal Abstraction in Reinforcement Learning with the Successor Representation [65.69658154078007]
We argue that the successor representation (SR) can be seen as a natural substrate for the discovery and use of temporal abstractions.
We show how the SR can be used to discover options that facilitate either temporally-extended exploration or planning (a toy SR-based sketch follows this list).
arXiv Detail & Related papers (2021-10-12T05:07:43Z)
- Combining Off and On-Policy Training in Model-Based Reinforcement Learning [77.34726150561087]
We propose a way to obtain off-policy targets using data from simulated games in MuZero.
Our results show that these targets speed up the training process and lead to faster convergence and higher rewards.
arXiv Detail & Related papers (2021-02-24T10:47:26Z)
- Diversity-Enriched Option-Critic [47.82697599507171]
We show that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks.
Our approach generates robust, reusable, reliable, and interpretable options, in contrast to option-critic.
arXiv Detail & Related papers (2020-11-04T22:12:54Z)
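Several entries above (Reward-Respecting Subtasks, Temporal Abstraction with the Successor Representation) revolve around discovering options from the successor representation (SR). As a toy illustration of that recipe, the sketch below computes the closed-form SR of a fixed policy, Psi = (I - gamma * P)^(-1), and takes its leading eigenvectors as intrinsic reward functions whose greedy policies would form "eigenoptions". The function names and the 5-state chain are hypothetical, and this is only a rough sketch of the SR-to-eigenoption connection, not any paper's exact method:

```python
import numpy as np

def successor_representation(P: np.ndarray, gamma: float = 0.95) -> np.ndarray:
    """Closed-form SR of a fixed policy with state-transition matrix P:
    Psi = (I - gamma * P)^(-1), the expected discounted state visitations."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

def eigenoption_rewards(P: np.ndarray, gamma: float = 0.95, k: int = 2) -> np.ndarray:
    """Use the top-k eigenvectors of the SR as intrinsic reward functions;
    the greedy policy for each one defines a candidate 'eigenoption'."""
    psi = successor_representation(P, gamma)
    vals, vecs = np.linalg.eig(psi)
    order = np.argsort(-vals.real)  # sort eigenvalues in descending order
    return vecs[:, order[:k]].real  # (n_states, k) intrinsic rewards

# Hypothetical example: random walk on a 5-state chain (rows sum to 1).
P = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.5],
])
intrinsic = eigenoption_rewards(P)
print(intrinsic.shape)  # (5, 2): one intrinsic reward vector per candidate option
```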