Model-based Policy Search for Partially Measurable Systems
- URL: http://arxiv.org/abs/2101.08740v1
- Date: Thu, 21 Jan 2021 17:39:22 GMT
- Title: Model-based Policy Search for Partially Measurable Systems
- Authors: Fabio Amadio, Alberto Dalla Libera, Ruggero Carli, Daniel Nikovski,
Diego Romeres
- Abstract summary: We propose a Model-Based Reinforcement Learning (MBRL) algorithm for Partially Measurable Systems (PMS).
The proposed algorithm, named Monte Carlo Probabilistic Inference for Learning COntrol for Partially Measurable Systems (MC-PILCO4PMS), relies on Gaussian Processes (GPs) to model the system dynamics.
The effectiveness of the proposed algorithm has been tested both in simulation and in two real systems.
- Score: 9.335154302282751
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a Model-Based Reinforcement Learning (MBRL)
algorithm for Partially Measurable Systems (PMS), i.e., systems where the state
cannot be measured directly but must be estimated through suitable state
observers. The proposed algorithm, named Monte Carlo Probabilistic Inference
for Learning COntrol for Partially Measurable Systems (MC-PILCO4PMS), relies on
Gaussian Processes (GPs) to model the system dynamics, and on a Monte Carlo
approach to update the policy parameters. With respect to previous GP-based MBRL
algorithms, MC-PILCO4PMS explicitly models the presence of state observers
during policy optimization, allowing it to deal with PMSs. The effectiveness of the
proposed algorithm has been tested both in simulation and on two real systems.
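As a rough, hedged sketch of the mechanism just described (not the authors' code): particles are rolled out through a probabilistic dynamics model standing in for the GP posterior, the policy is fed state estimates from a simulated observer rather than true states, and the policy gradient is obtained by backpropagating through the sampled rollouts. The toy dynamics, filter gain, noise levels, and dimensions below are all invented for illustration.

```python
# Minimal sketch of the MC-PILCO4PMS idea: Monte Carlo policy gradients
# through a probabilistic model, with the policy reading the output of a
# *simulated* state observer (here a low-pass filter on noisy measurements),
# mirroring what happens on the real system. A known toy model plus
# Gaussian noise stands in for the GP posterior to stay self-contained.
import torch

torch.manual_seed(0)
dt, horizon, n_particles = 0.05, 50, 100

policy = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                             torch.nn.Linear(16, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def dynamics_sample(x, u):
    # Stand-in for sampling the GP posterior: double-integrator mean + noise.
    pos, vel = x[:, 0:1], x[:, 1:2]
    mean = torch.cat([pos + dt * vel, vel + dt * u], dim=1)
    return mean + 0.01 * torch.randn_like(mean)  # reparameterization trick

for it in range(100):
    x = torch.zeros(n_particles, 2) + torch.tensor([1.0, 0.0])  # true state
    x_hat = x.clone()                                           # observer state
    cost = 0.0
    for t in range(horizon):
        u = policy(x_hat)                    # policy sees the *estimate*
        x = dynamics_sample(x, u)
        y = x + 0.05 * torch.randn_like(x)   # noisy measurement
        x_hat = 0.8 * x_hat + 0.2 * y        # simulated low-pass observer
        cost = cost + (x[:, 0] ** 2).mean()  # drive position to zero
    opt.zero_grad()
    cost.backward()                          # MC gradient through particles
    opt.step()
```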
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- Probabilistic Model Checking of Stochastic Reinforcement Learning Policies [5.923818043882103]
We introduce a method to verify reinforcement learning (RL) policies.
This approach is compatible with any RL algorithm as long as the algorithm and its corresponding environment collectively adhere to the Markov property.
Our results show that our method is well suited to verifying RL policies.
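A minimal sketch of the general recipe (not the paper's tool chain), with invented states and transitions: a stochastic policy applied to a Markovian environment induces a discrete-time Markov chain, whose reachability probabilities can be model-checked exactly by solving a linear system.

```python
# Probabilistic model checking of a policy-induced Markov chain: compute
# Pr[eventually reach goal] for each transient state. The trap state is
# treated as absorbing with value 0, the goal as absorbing with value 1.
import numpy as np

n_states, n_actions, goal, trap = 4, 2, 3, 2
# P[a, s, s']: environment transition probabilities (random, illustrative)
P = np.random.default_rng(0).dirichlet(np.ones(n_states), (n_actions, n_states))
pi = np.full((n_states, n_actions), 1.0 / n_actions)  # stochastic policy

# Induced DTMC: M[s, s'] = sum_a pi(a|s) P(s'|s, a)
M = np.einsum("sa,ast->st", pi, P)

transient = [s for s in range(n_states) if s not in (goal, trap)]
A = np.eye(len(transient)) - M[np.ix_(transient, transient)]
b = M[np.ix_(transient, [goal])].sum(axis=1)
reach = np.linalg.solve(A, b)  # Pr[eventually reach goal] per transient state
print(dict(zip(transient, reach.round(3))))
```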
arXiv Detail & Related papers (2024-03-27T16:15:21Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on the fly.
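As a hedged illustration, the bootstrap particle filter below produces the unbiased likelihood estimate that VSMC-style objectives are built from; the online gradient adaptation of model and proposal parameters is omitted, and the linear-Gaussian model with its parameters is a toy stand-in.

```python
# Bootstrap particle filter for x_t = a x_{t-1} + N(0, q^2), y_t = x_t + N(0, r^2).
# VSMC wraps the log of the likelihood estimate below in a variational
# objective and adapts parameters by stochastic gradient steps (not shown).
import numpy as np

rng = np.random.default_rng(1)
T, N, a, q, r = 50, 200, 0.9, 0.1, 0.2

x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = a * x_true[t - 1] + q * rng.normal()
y = x_true + r * rng.normal(size=T)                     # observations

particles = rng.normal(size=N)
log_like = 0.0
for t in range(T):
    particles = a * particles + q * rng.normal(size=N)  # propagate
    logw = -0.5 * ((y[t] - particles) / r) ** 2         # weight (up to const)
    log_like += np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()
    w = np.exp(logw - logw.max()); w /= w.sum()
    particles = particles[rng.choice(N, N, p=w)]        # multinomial resample
print("log-likelihood estimate (up to const):", log_like)
```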
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Learning-Based Optimal Control with Performance Guarantees for Unknown Systems with Latent States [4.4820711784498]
This paper proposes a novel method for the computation of an optimal input trajectory for unknown nonlinear systems with latent states.
The effectiveness of the proposed method is demonstrated in a numerical simulation.
arXiv Detail & Related papers (2023-03-31T11:06:09Z)
- Learning Control from Raw Position Measurements [13.79048931313603]
We propose a Model-Based Reinforcement Learning (MBRL) algorithm named VF-MC-PILCO.
It is specifically designed for application to mechanical systems where velocities cannot be directly measured.
arXiv Detail & Related papers (2023-01-30T18:50:37Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant problem parameters.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
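A toy sketch of the shielding idea follows (the paper's rule-identification machinery is not reproduced): a hand-written expert rule vetoes "unexpected" door openings in a Tiger-like task unless the belief is sufficiently confident, substituting a safe fallback action. The action names and threshold are assumptions.

```python
# Illustrative shield wrapper around a POMCP-style planner's proposal.
def shielded_action(planner_action: str,
                    belief_left: float,
                    rule_threshold: float = 0.9) -> str:
    """Veto door-opening unless the belief is confident enough (assumed rule)."""
    if planner_action == "listen":
        return planner_action
    confident = belief_left >= rule_threshold or belief_left <= 1 - rule_threshold
    return planner_action if confident else "listen"  # safe fallback

# Example: the planner wants to open a door under an uncertain belief.
print(shielded_action("open-left", belief_left=0.6))   # -> 'listen'
print(shielded_action("open-left", belief_left=0.95))  # -> 'open-left'
```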
arXiv Detail & Related papers (2021-04-28T14:23:38Z)
- Model-Based Policy Search Using Monte Carlo Gradient Estimation with Real Systems Application [12.854118767247453]
We present a Model-Based Reinforcement Learning (MBRL) algorithm named Monte Carlo Probabilistic Inference for Learning COntrol (MC-PILCO).
The algorithm relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to estimate the policy gradient.
Numerical comparisons in a simulated cart-pole environment show that MC-PILCO exhibits better data efficiency and control performance.
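As a hedged sketch of the "GP dynamics model" ingredient shared by MC-PILCO and MC-PILCO4PMS, the snippet below fits a GP to synthetic one-step transitions, with scikit-learn in place of the authors' GP code; the posterior mean and standard deviation are what the Monte Carlo rollouts would sample from. All data and dimensions are invented.

```python
# One GP per state dimension is fit to transitions (x_t, u_t) -> delta x;
# shown here for a single output on synthetic data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 3))               # inputs: [pos, vel, u]
dX = 0.05 * X[:, 1] + 0.01 * rng.normal(size=200)   # target: delta pos

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, dX)
mean, std = gp.predict(X[:5], return_std=True)      # posterior for rollouts
print(np.c_[mean, std])
```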
arXiv Detail & Related papers (2021-01-28T17:01:15Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
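A simplified, hedged sketch of the SOCP structure using cvxpy: a min-norm input subject to a CLF decrease condition robustified by a GP predictive standard deviation, entering as a second-order cone constraint. The Lie-derivative values and uncertainty terms are made-up stand-ins for quantities the paper derives from its compound-kernel GP.

```python
# Min-norm controller with an uncertainty-robust CLF constraint (SOCP).
import cvxpy as cp
import numpy as np

u = cp.Variable(2)                     # control input
LfV, LgV = 0.5, np.array([1.0, -0.3])  # nominal Lie derivatives of V (assumed)
gamma_V = 1.0                          # required decrease rate gamma * V(x)
sigma = np.diag([0.2, 0.1])            # assumed GP predictive std of LgV terms
kappa = 1.96                           # confidence multiplier

# SOC constraint: the CLF decrease must hold even kappa-sigma pessimistically.
constraints = [LfV + LgV @ u + kappa * cp.norm(sigma @ u, 2) <= -gamma_V]
prob = cp.Problem(cp.Minimize(cp.sum_squares(u)), constraints)
prob.solve()
print("min-norm input:", u.value)
```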
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
- TS-MPC for Autonomous Vehicle using a Learning Approach [0.0]
We use a data-driven approach to learn a Takagi-Sugeno (TS) representation of the vehicle dynamics.
To address the TS modeling, we use the Adaptive Neuro-Fuzzy Inference System (ANFIS) approach.
The proposed control approach is fed racing-based references from an external planner and state estimates from a Moving Horizon Estimator (MHE).
arXiv Detail & Related papers (2020-04-29T17:42:33Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
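For illustration, the snippet below implements the standard information theoretic MPC (MPPI-style) update that this connection builds on: sampled control perturbations are reweighted by a softmax of trajectory costs, which is precisely the entropy-regularized soft-greedy form. The dynamics and cost are toy stand-ins, not the paper's experimental setup.

```python
# MPPI-style update: softmax-weighted average of sampled control sequences.
import numpy as np

rng = np.random.default_rng(2)
H, K, lam = 20, 256, 1.0                     # horizon, samples, temperature

def rollout_cost(du, u_nom, x0=np.array([1.0, 0.0]), dt=0.05):
    x, cost = x0.copy(), 0.0
    for t in range(H):
        u = u_nom[t] + du[t]
        x = x + dt * np.array([x[1], u])     # double-integrator step
        cost += x[0] ** 2 + 0.01 * u ** 2
    return cost

u_nom = np.zeros(H)
du = rng.normal(scale=0.5, size=(K, H))      # sampled perturbations
costs = np.array([rollout_cost(d, u_nom) for d in du])
w = np.exp(-(costs - costs.min()) / lam)
w /= w.sum()                                 # softmax (entropy-regularized) weights
u_nom = u_nom + w @ du                       # reweighted control update
print("first planned input:", u_nom[0])
```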
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.