Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning
- URL: http://arxiv.org/abs/2406.19561v1
- Date: Thu, 27 Jun 2024 22:24:46 GMT
- Title: Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning
- Authors: Bradley Burega, John D. Martin, Luke Kapeluck, Michael Bowling
- Abstract summary: This paper introduces an online, meta-gradient algorithm that tunes a probability with which states are queried during Dyna-style planning.
Results indicate that the method improves the efficiency of the planning process.
- Score: 8.552540426753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study how a Reinforcement Learning (RL) system can remain sample-efficient when learning from an imperfect model of the environment. This is particularly challenging when the learning system is resource-constrained and operating in continual settings, where the environment dynamics change. To address these challenges, our paper introduces an online, meta-gradient algorithm that tunes a probability with which states are queried during Dyna-style planning. Our study compares the aggregate, empirical performance of this meta-gradient method to baselines that employ conventional sampling strategies. Results indicate that our method improves the efficiency of the planning process and, as a consequence, the sample-efficiency of the overall learning process. On the whole, we observe that our meta-learned solutions avoid several pathologies of conventional planning approaches, such as sampling inaccurate transitions and transitions that stall credit assignment. We believe these findings could prove useful, in future work, for designing model-based RL systems at scale.
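The sketch below illustrates the general idea described in the abstract: Dyna-style planning in which the distribution used to query states from the learned model is itself tuned online by a meta-gradient. This is a minimal tabular illustration, not the authors' algorithm; the toy chain environment, the REINFORCE-style meta-gradient, and the use of TD-error reduction on the latest real transition as a meta-objective proxy are all assumptions made for the example.

```python
# Hedged sketch: meta-gradient tuning of the state-sampling distribution used
# in Dyna-style planning. Assumed details: tabular Q-learning, a toy chain MDP,
# and a REINFORCE-style meta-gradient on a TD-error-reduction proxy objective.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 10, 2          # toy chain MDP (assumed domain)
GAMMA, ALPHA, META_LR = 0.95, 0.1, 0.05
PLAN_STEPS = 5

Q = np.zeros((N_STATES, N_ACTIONS))
model = {}                           # learned model: (s, a) -> (r, s')
logits = np.zeros(N_STATES)          # meta-parameters of the sampling distribution


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def env_step(s, a):
    """Toy chain: action 1 moves right, action 0 moves left; reward at the right end."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == N_STATES - 1 else 0.0), s2


def td_error(s, a, r, s2):
    return r + GAMMA * Q[s2].max() - Q[s, a]


s = 0
for step in range(5000):
    # Act (epsilon-greedy) and learn directly from the real transition.
    a = int(rng.integers(N_ACTIONS)) if rng.random() < 0.1 else int(Q[s].argmax())
    r, s2 = env_step(s, a)
    Q[s, a] += ALPHA * td_error(s, a, r, s2)
    model[(s, a)] = (r, s2)

    # Dyna-style planning: query states according to the meta-learned distribution.
    known = sorted({k[0] for k in model})
    probs = softmax(logits[known])
    before = abs(td_error(s, a, r, s2))          # meta-objective proxy (assumed)
    for _ in range(PLAN_STEPS):
        idx = rng.choice(len(known), p=probs)
        ps = known[idx]
        pa = rng.choice([k[1] for k in model if k[0] == ps])
        pr, ps2 = model[(ps, pa)]
        Q[ps, pa] += ALPHA * td_error(ps, pa, pr, ps2)

        # REINFORCE-style meta-gradient: reinforce queried states whose planning
        # update reduced the TD error on the most recent real transition.
        after = abs(td_error(s, a, r, s2))
        advantage = before - after
        grad_log = -probs.copy()
        grad_log[idx] += 1.0                     # d log pi(ps) / d logits
        logits[known] += META_LR * advantage * grad_log
        probs = softmax(logits[known])
        before = after

    s = 0 if s2 == N_STATES - 1 else s2
```

Under these assumptions, states whose simulated updates shrink the TD error on recent real experience are queried more often during planning, while states whose model transitions are stale or uninformative are queried less, which is the qualitative behavior the abstract attributes to the meta-learned sampling distribution.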