Model-Based Policy Search Using Monte Carlo Gradient Estimation with
Real Systems Application
- URL: http://arxiv.org/abs/2101.12115v4
- Date: Tue, 6 Sep 2022 10:24:48 GMT
- Title: Model-Based Policy Search Using Monte Carlo Gradient Estimation with
Real Systems Application
- Authors: Fabio Amadio, Alberto Dalla Libera, Riccardo Antonello, Daniel
Nikovski, Ruggero Carli, Diego Romeres
- Abstract summary: We present a Model-Based Reinforcement Learning (MBRL) algorithm named Monte Carlo Probabilistic Inference for Learning COntrol (MC-PILCO).
The algorithm relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to estimate the policy gradient.
Numerical comparisons in a simulated cart-pole environment show that MC-PILCO exhibits better data efficiency and control performance.
- Score: 12.854118767247453
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a Model-Based Reinforcement Learning (MBRL)
algorithm named \emph{Monte Carlo Probabilistic Inference for Learning COntrol}
(MC-PILCO). The algorithm relies on Gaussian Processes (GPs) to model the
system dynamics and on a Monte Carlo approach to estimate the policy gradient.
This defines a framework in which we ablate the choice of the following
components: (i) the selection of the cost function, (ii) the optimization of
policies using dropout, (iii) an improved data efficiency through the use of
structured kernels in the GP models. The combination of the aforementioned
aspects dramatically affects the performance of MC-PILCO. Numerical comparisons
in a simulated cart-pole environment show that MC-PILCO exhibits better data
efficiency and control performance w.r.t. state-of-the-art GP-based MBRL
algorithms. Finally, we apply MC-PILCO to real systems, considering in
particular systems with partially measurable states. We discuss the importance
of modeling both the measurement system and the state estimators during policy
optimization. The effectiveness of the proposed solutions has been tested in
simulation and on two real systems, a Furuta pendulum and a ball-and-plate rig.
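The core loop the abstract describes, i.e. fitting a probabilistic one-step model and estimating the policy gradient from sampled particle rollouts, can be sketched as follows. This is a toy illustration, not the authors' implementation: a linear-Gaussian transition stands in for sampling from the GP posterior, and central finite differences with common random numbers (the reparameterization trick) stand in for backpropagating through the particles.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(theta, x0, noises):
    """Average cost of a particle rollout under the policy u = theta * x.

    A linear-Gaussian transition x' = 0.9*x + u + eps stands in for
    sampling from the GP posterior over the system dynamics.
    """
    x = x0.copy()
    cost = 0.0
    for eps in noises:           # noise fixed per evaluation (reparameterization)
        u = theta * x            # linear state-feedback policy
        x = 0.9 * x + u + eps    # one "sampled" dynamics transition per particle
        cost += np.mean(x ** 2)  # quadratic state cost, averaged over particles
    return cost

def mc_policy_gradient(theta, n_particles=100, horizon=20, h=1e-4):
    """Monte Carlo estimate of d(expected cost)/d(theta).

    The same noise draws are reused on both sides of the central
    difference, a stand-in for differentiating through the particles.
    """
    x0 = rng.standard_normal(n_particles)
    noises = 0.1 * rng.standard_normal((horizon, n_particles))
    return (rollout_cost(theta + h, x0, noises)
            - rollout_cost(theta - h, x0, noises)) / (2.0 * h)

# Gradient descent on the policy parameter; theta should approach -0.9,
# the feedback gain that cancels the open-loop dynamics.
theta = 0.0
for _ in range(100):
    theta -= 0.02 * mc_policy_gradient(theta)
print(f"learned feedback gain: {theta:.2f}")
```

In MC-PILCO the finite difference is replaced by automatic differentiation through the sampled particles, and the toy transition by draws from the learned GP posterior, but the structure of the estimator is the same.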
Related papers
- Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System [0.7499722271664147]
This study conducts a comparative analysis of Model Predictive Control (MPC) and Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) algorithm, applied to a Quanser Aero 2 system.
PPO excels in rise time and adaptability, making it a promising approach for applications requiring rapid response.
arXiv Detail & Related papers (2024-08-28T08:35:34Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on the fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Learning Control from Raw Position Measurements [13.79048931313603]
We propose a Model-Based Reinforcement Learning (MBRL) algorithm named VF-MC-PILCO.
It is specifically designed for application to mechanical systems where velocities cannot be directly measured.
arXiv Detail & Related papers (2023-01-30T18:50:37Z)
- Critic Sequential Monte Carlo [15.596665321375298]
CriticSMC is a new algorithm for planning as inference built from a novel composition of sequential Monte Carlo with soft-Q function factors.
Our experiments on self-driving car collision avoidance in simulation demonstrate improvements over baselines in infraction minimization relative to computational effort.
arXiv Detail & Related papers (2022-05-30T23:14:24Z)
- Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
- Machine Learning Simulates Agent-Based Model Towards Policy [0.0]
We use a random forest machine learning algorithm to emulate an agent-based model (ABM) and evaluate competing policies across 46 Metropolitan Regions (MRs) in Brazil.
As a result, we obtain the optimal (and non-optimal) performance of each region over the policies.
Results suggest that MRs already have embedded structures that favor optimal or non-optimal results, but they also illustrate which policy is more beneficial to each place.
arXiv Detail & Related papers (2022-03-04T21:19:11Z)
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
- ParticleAugment: Sampling-Based Data Augmentation [80.44268663372233]
We propose a particle filtering formulation to find optimal augmentation policies and their schedules during model training.
We show that our formulation for automated augmentation reaches promising results on CIFAR-10, CIFAR-100, and ImageNet datasets.
arXiv Detail & Related papers (2021-06-16T10:56:02Z)
- Model-based Policy Search for Partially Measurable Systems [9.335154302282751]
We propose a Model-Based Reinforcement Learning (MBRL) algorithm for Partially Measurable Systems (PMS).
The proposed algorithm, named Monte Carlo Probabilistic Inference for Learning COntrol for Partially Measurable Systems (MC-PILCO4PMS), relies on Gaussian Processes (GPs) to model the system dynamics.
The effectiveness of the proposed algorithm has been tested both in simulation and in two real systems.
arXiv Detail & Related papers (2021-01-21T17:39:22Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.