SPO: Sequential Monte Carlo Policy Optimisation
- URL: http://arxiv.org/abs/2402.07963v3
- Date: Thu, 31 Oct 2024 17:05:49 GMT
- Title: SPO: Sequential Monte Carlo Policy Optimisation
- Authors: Matthew V Macfarlane, Edan Toledo, Donal Byrne, Paul Duckworth, Alexandre Laterre
- Abstract summary: We introduce SPO: Sequential Monte Carlo Policy Optimisation.
We show that SPO provides robust policy improvement and efficient scaling properties.
We demonstrate statistically significant improvements in performance relative to model-free and model-based baselines.
- Score: 41.52684912140086
- Abstract: Leveraging planning during learning and decision-making is central to the long-term development of intelligent agents. Recent works have successfully combined tree-based search methods and self-play learning mechanisms to this end. However, these methods typically face scaling challenges due to the sequential nature of their search. While practical engineering solutions can partly overcome this, they often result in a negative impact on performance. In this paper, we introduce SPO: Sequential Monte Carlo Policy Optimisation, a model-based reinforcement learning algorithm grounded within the Expectation Maximisation (EM) framework. We show that SPO provides robust policy improvement and efficient scaling properties. The sample-based search makes it directly applicable to both discrete and continuous action spaces without modifications. We demonstrate statistically significant improvements in performance relative to model-free and model-based baselines across both continuous and discrete environments. Furthermore, the parallel nature of SPO's search enables effective utilisation of hardware accelerators, yielding favourable scaling laws.
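As a minimal, illustrative sketch of the general idea behind sample-based SMC planning for policy improvement (not the paper's exact algorithm): particles are rolled out in parallel under the current policy, reweighted by exponentiated reward (an EM-style E-step), and resampled; the weighted root actions then serve as the improved policy target (M-step). The function names, the temperature parameter, and the toy dynamics below are assumptions for illustration.

```python
# Illustrative sketch of sample-based SMC planning for policy improvement.
# NOT the paper's exact algorithm; names and toy dynamics are assumptions.
import numpy as np

def smc_policy_improvement(policy_sample, model_step, state,
                           n_particles=256, horizon=10, temperature=1.0,
                           rng=None):
    """Roll particles forward under the current policy, reweight by
    exponentiated reward, and return root actions with normalised
    weights approximating an improved policy at `state`."""
    rng = np.random.default_rng(0) if rng is None else rng
    states = np.repeat(state[None, :], n_particles, axis=0)
    log_w = np.zeros(n_particles)                      # particle log-weights
    root_actions = None
    for t in range(horizon):
        actions = policy_sample(states, rng)           # proposal: current policy
        if t == 0:
            root_actions = actions.copy()              # remember root decisions
        states, rewards = model_step(states, actions)  # batched model rollout
        log_w += rewards / temperature                 # soft, EM-style weighting
        w = np.exp(log_w - log_w.max()); w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:     # ESS-triggered resampling
            idx = rng.choice(n_particles, size=n_particles, p=w)
            states, root_actions = states[idx], root_actions[idx]
            log_w = np.zeros(n_particles)
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    return root_actions, w                             # weighted policy target

# Toy usage: 1-D point mass with reward for staying near the origin.
policy = lambda s, rng: rng.normal(0.0, 0.5, size=s.shape)
model = lambda s, a: (s + a, -np.abs(s + a).sum(axis=1))
acts, w = smc_policy_improvement(policy, model, np.array([1.0]))
```

Because every particle advances independently at each step, the inner loop is trivially batchable, which is the property the abstract credits for effective use of hardware accelerators.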
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used during learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy; a minimal sketch of this learn-stochastic, deploy-deterministic pattern follows this entry.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
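As a hedged illustration of the pattern the entry above refers to (the Gaussian parameterisation, names, and fixed `sigma` are assumptions, not the paper's method):

```python
# Illustrative sketch of "learn stochastic, deploy deterministic":
# training explores with a Gaussian policy whose std sets the exploration
# level; deployment uses only the mean action. Names are assumptions.
import numpy as np

class GaussianPolicy:
    def __init__(self, n_obs, n_act, sigma=0.2, seed=0):
        self.W = np.zeros((n_act, n_obs))   # mean parameters (learned elsewhere)
        self.sigma = sigma                  # exploration level to tune
        self.rng = np.random.default_rng(seed)

    def act_train(self, obs):
        # Stochastic action used during learning: drives exploration.
        return self.W @ obs + self.sigma * self.rng.standard_normal(self.W.shape[0])

    def act_deploy(self, obs):
        # Deterministic version actually deployed.
        return self.W @ obs
```

Lowering `sigma` shrinks the gap between the trained and deployed policies but can slow exploration; the cited paper studies how to tune this trade-off.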
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
The bounds we derive reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning [5.09191791549438]
Recent sequence-modeling approaches have achieved state-of-the-art results on several largely deterministic offline Atari and D4RL benchmarks.
In stochastic environments, however, jointly modeling the policy and world dynamics leads to overly optimistic behavior; we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models.
We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation.
arXiv Detail & Related papers (2022-07-21T04:12:48Z)
- Critic Sequential Monte Carlo [15.596665321375298]
CriticSMC is a new algorithm for planning as inference built from a novel composition of sequential Monte Carlo with soft-Q function factors.
Our experiments on self-driving car collision avoidance in simulation demonstrate improvements over baselines in infraction minimization relative to computational effort; a toy sketch of the soft-Q weighting idea follows this entry.
arXiv Detail & Related papers (2022-05-30T23:14:24Z)
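A minimal sketch of SMC reweighting with soft-Q factors, assuming a learned critic `q_fn` and a cheap action proposal `propose` (both hypothetical names); this illustrates the composition named in the entry above, not CriticSMC's full algorithm. Scoring candidates with the critic before stepping an expensive world model is what lets compute concentrate on promising actions.

```python
# Minimal sketch of one SMC step with soft-Q factors: candidate actions
# are proposed per particle, then particle-action pairs are resampled in
# proportion to exp(Q). `propose` and `q_fn` are hypothetical callables.
import numpy as np

def soft_q_resample(states, propose, q_fn, n_candidates=64, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(states)
    actions = propose(states, n_candidates, rng)     # (n, n_candidates, act_dim)
    q = q_fn(states, actions)                        # (n, n_candidates) soft-Q values
    logits = q.reshape(-1)                           # flatten particle-action pairs
    w = np.exp(logits - logits.max()); w /= w.sum()
    idx = rng.choice(n * n_candidates, size=n, p=w)  # resample pairs by soft-Q weight
    pi, ai = np.divmod(idx, n_candidates)
    return states[pi], actions[pi, ai]               # surviving particles + actions
```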
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes a novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action pairs; a toy sketch of this style of regularization follows this entry.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
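A toy sketch of the style of value regularization described in the entry above, assuming a critic `q_fn` and batches of dataset and model-rollout state-actions (hypothetical names); COMBO's actual objective differs in detail.

```python
# Toy conservative regularizer in the spirit of COMBO, added to a standard
# TD loss: push Q down on model-generated (potentially out-of-support)
# state-actions and up on real dataset ones. NOT COMBO's exact objective.
import numpy as np

def conservative_penalty(q_fn, data_sa, model_sa, beta=1.0):
    """Regularizer term: beta * (E_model[Q] - E_data[Q])."""
    q_model = q_fn(*model_sa).mean()  # Q on model rollouts: pushed down
    q_data = q_fn(*data_sa).mean()    # Q on logged data: pushed up
    return beta * (q_model - q_data)
```

The sign structure keeps the learned value function pessimistic where only the model, not the data, supports the state-action pair.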
- Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models [40.08137765886609]
We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics.
Our approach obtains high returns while allowing fast execution at deployment by avoiding test-time policy-gradient optimization.
arXiv Detail & Related papers (2021-02-16T17:21:55Z)
- Meta Learning MPC using Finite-Dimensional Gaussian Process Approximations [0.9539495585692008]
Two key factors that hinder the practical applicability of learning methods in control are their high computational complexity and limited generalization capabilities to unseen conditions.
This paper applies a meta-learning approach to adaptive model predictive control, learning a system model that leverages data from previous related tasks.
arXiv Detail & Related papers (2020-08-13T15:59:38Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)