Representation of Reinforcement Learning Policies in Reproducing Kernel
Hilbert Spaces
- URL: http://arxiv.org/abs/2002.02863v2
- Date: Thu, 15 Oct 2020 16:00:19 GMT
- Title: Representation of Reinforcement Learning Policies in Reproducing Kernel
Hilbert Spaces
- Authors: Bogdan Mazoure, Thang Doan, Tianyu Li, Vladimir Makarenkov, Joelle
Pineau, Doina Precup, Guillaume Rabusseau
- Abstract summary: This framework involves finding a low-dimensional embedding of the policy in a reproducing kernel Hilbert space (RKHS).
We derive strong theoretical guarantees on the expected return of the reconstructed policy.
The results confirm that the policies can be robustly embedded in a low-dimensional space while the embedded policy incurs almost no decrease in return.
- Score: 72.5149277196468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a general framework for policy representation for reinforcement
learning tasks. This framework involves finding a low-dimensional embedding of
the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS
based methods allows us to derive strong theoretical guarantees on the expected
return of the reconstructed policy. Such guarantees are typically lacking in
black-box models, but are very desirable in tasks requiring stability. We
conduct several experiments on classic RL domains. The results confirm that the
policies can be robustly embedded in a low-dimensional space while the embedded
policy incurs almost no decrease in return.
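
This listing does not include code, so the following is only a minimal sketch of the general idea, not the authors' implementation: represent each policy by the action probabilities it assigns on a fixed set of probe states, embed those feature vectors with a kernel method (scikit-learn's KernelPCA is used here as a stand-in for the paper's RKHS construction), and reconstruct an approximate policy from the low-dimensional code. The probe states, kernel choice, and all names below are illustrative assumptions.

    # Illustrative sketch only: embed policies with kernel PCA, treating each
    # policy as the action probabilities it assigns on a set of probe states.
    import numpy as np
    from sklearn.decomposition import KernelPCA

    rng = np.random.default_rng(0)
    n_policies, n_states, n_actions = 50, 30, 4

    # Stand-in for a collection of trained policies (random softmax tables).
    logits = rng.normal(size=(n_policies, n_states, n_actions))
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    features = probs.reshape(n_policies, -1)                # one row per policy

    # Low-dimensional embedding with an RBF kernel.
    embedder = KernelPCA(n_components=5, kernel="rbf", gamma=0.1,
                         fit_inverse_transform=True, alpha=1e-3)
    codes = embedder.fit_transform(features)                # shape (50, 5)

    # Reconstruct policies from the codes and renormalize into distributions;
    # the gap below is the kind of quantity the paper's return bounds control.
    recon = embedder.inverse_transform(codes).reshape(n_policies, n_states, n_actions)
    recon = np.clip(recon, 1e-8, None)
    recon /= recon.sum(axis=-1, keepdims=True)
    print("max per-state total-variation gap:",
          0.5 * np.abs(recon - probs).sum(axis=-1).max())

In the paper the embedding is built directly in an RKHS with explicit guarantees on the reconstructed policy's return; kernel PCA is only a convenient proxy for experimenting with the embed-then-reconstruct pipeline.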
Related papers
- SPoRt -- Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL [54.022106606140774]
We present theoretical results that provide a bound on the probability of violating a safety property for a new task-specific policy in a model-free, episodic setup.
We also present SPoRt, which enables the user to trade off safety guarantees in exchange for task-specific performance.
arXiv Detail & Related papers (2025-04-08T19:09:07Z)
- Convergence of Policy Mirror Descent Beyond Compatible Function Approximation [66.4260157478436]
We develop theoretical guarantees for PMD with general policy classes, assuming only a strictly weaker variational dominance condition, and obtain convergence to the best-in-class policy.
Our main result leverages a novel notion based on the local norm induced by the occupancy measure.
arXiv Detail & Related papers (2025-02-16T08:05:46Z)
- Embedding Safety into RL: A New Take on Trust Region Methods [1.5733417396701983]
Reinforcement Learning (RL) agents are able to solve a wide variety of tasks but are prone to unsafe behaviors.
We propose Constrained Trust Region Policy Optimization (C-TRPO), a novel approach that modifies the geometry of the policy space based on the safety constraints.
arXiv Detail & Related papers (2024-11-05T09:55:50Z)
- Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z)
- Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space [9.823296458696882]
In traditional partially observable Markov decision processes, ensuring safety typically involves estimating the belief in latent states.
We propose a model-based approach that guarantees RL safety almost surely in the face of unknown system dynamics.
arXiv Detail & Related papers (2023-12-01T17:01:37Z)
- Supported Trust Region Optimization for Offline Reinforcement Learning [59.43508325943592]
We propose Supported Trust Region optimization (STR) which performs trust region policy optimization with the policy constrained within the support of the behavior policy.
We show that, when assuming no approximation and sampling error, STR guarantees strict policy improvement until convergence to the optimal support-constrained policy in the dataset.
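
To make the support constraint concrete, here is a small discrete-action sketch (our own construction, not the STR algorithm; the trust-region update itself is omitted), assuming the behavior policy's action probabilities have already been estimated, e.g. by behavior cloning:

    # Sketch of a support constraint for offline RL: only actions the behavior
    # policy takes with non-negligible probability are eligible for improvement.
    import numpy as np

    def support_constrained_greedy(q_values, behavior_probs, eps=0.05):
        """Best action among those inside the estimated behavior support.

        q_values:       (n_actions,) Q estimates for one state
        behavior_probs: (n_actions,) estimated behavior-policy probabilities
        eps:            support threshold; actions below it are excluded
        """
        masked_q = np.where(behavior_probs >= eps, q_values, -np.inf)
        return int(np.argmax(masked_q))

    q = np.array([1.0, 2.5, 0.3, 4.0])
    beta = np.array([0.40, 0.30, 0.29, 0.01])      # last action is out of support
    print(support_constrained_greedy(q, beta))     # picks action 1, not the unsupported 3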
arXiv Detail & Related papers (2023-11-15T13:16:16Z)
- Feasible Policy Iteration for Safe Reinforcement Learning [29.662547846929847]
Safety is the priority concern when applying reinforcement learning (RL) algorithms to real-world control problems.
We propose feasible policy iteration (FPI), the first foundational dynamic programming algorithm for safe RL.
Experimental results demonstrate that FPI achieves strictly zero constraint violation on low-dimensional tasks.
arXiv Detail & Related papers (2023-04-18T09:18:37Z)
- Policy Gradient for Reinforcement Learning with General Utilities [50.65940899590487]
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative rewards.
Many supervised and unsupervised RL problems are not covered in the Linear RL framework.
We derive the policy gradient theorem for RL with general utilities.
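
For background (our notation, following the general-utilities literature, not a statement quoted from the paper), the classical policy gradient theorem and its general-utility analogue can be written as follows, where \lambda^{\pi_\theta} is the discounted state-action occupancy measure and F a general utility:

    % Classical policy gradient theorem (cumulative reward, J(theta) = <lambda^{pi_theta}, r>):
    \nabla_\theta J(\theta)
      = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta(\cdot \mid s)}
        \left[ Q^{\pi_\theta}(s, a)\, \nabla_\theta \log \pi_\theta(a \mid s) \right]

    % General utility J(theta) = F(lambda^{pi_theta}): by the chain rule,
    \nabla_\theta F(\lambda^{\pi_\theta})
      = \nabla_\theta \left\langle \lambda^{\pi_\theta},\, r_F \right\rangle,
      \qquad r_F := \nabla_\lambda F(\lambda)\big|_{\lambda = \lambda^{\pi_\theta}} \ \text{(treated as a fixed reward)}

Up to the normalization convention for \lambda, the second expression says the general-utility gradient equals the standard policy gradient computed under the "shadow reward" r_F evaluated at the current occupancy measure.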
arXiv Detail & Related papers (2022-10-03T14:57:46Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
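
As a tabular point of reference (not the GPMD algorithm itself, which handles general convex regularizers), one policy mirror descent step with a KL mirror map has the closed form pi'(a|s) proportional to pi(a|s) * exp(eta * Q(s, a)); a minimal sketch with illustrative inputs:

    # One tabular policy mirror descent step with a KL mirror map
    # (unregularized special case; inputs are illustrative only).
    import numpy as np

    def pmd_kl_step(policy, q_values, eta):
        """policy, q_values: arrays of shape (n_states, n_actions); eta: step size."""
        logits = np.log(policy) + eta * q_values
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        new_policy = np.exp(logits)
        return new_policy / new_policy.sum(axis=1, keepdims=True)

    pi = np.full((3, 2), 0.5)                           # uniform start: 3 states, 2 actions
    q = np.array([[1.0, 0.0], [0.2, 0.8], [0.0, 0.0]])
    print(pmd_kl_step(pi, q, eta=1.0))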
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation [21.703965401500913]
We propose an Expert-Supervised RL (ESRL) framework which uses uncertainty quantification for offline policy learning.
In particular, we have three contributions: 1) the method can learn safe and optimal policies through hypothesis testing, 2) ESRL allows for different levels of risk-averse implementation tailored to the application context, and 3) we propose a way to interpret ESRL's policy at every state through posterior distributions.
arXiv Detail & Related papers (2020-06-23T17:43:44Z)
- Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a policy that behaves smoothly with respect to states.
We develop a new framework -- Smooth Regularized Reinforcement Learning (SR^2L), where the policy is trained with smoothness-inducing regularization.
Such regularization effectively constrains the search space, and enforces smoothness in the learned policy.
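
As an illustration of the smoothness-inducing regularization described above, here is a small sketch of such a penalty (our own construction, not the paper's code); the toy linear softmax policy and the use of random rather than adversarial state perturbations are simplifying assumptions:

    # Rough sketch of a smoothness-inducing penalty: penalize divergence between
    # the policy's action distributions at a state and at nearby perturbed states.
    import numpy as np

    def softmax_policy(theta, states):
        logits = states @ theta                          # (batch, n_actions)
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def smoothness_penalty(theta, states, eps=0.1, n_samples=8, rng=None):
        """Average KL(pi(.|s) || pi(.|s+delta)) over random ||delta||_inf <= eps."""
        rng = np.random.default_rng() if rng is None else rng
        p = softmax_policy(theta, states)
        total = 0.0
        for _ in range(n_samples):
            delta = rng.uniform(-eps, eps, size=states.shape)
            q = softmax_policy(theta, states + delta)
            total += np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1))
        return total / n_samples

    theta = np.random.default_rng(0).normal(size=(4, 3))    # toy linear policy: 4-dim states, 3 actions
    states = np.random.default_rng(1).normal(size=(16, 4))
    print("smoothness penalty:", smoothness_penalty(theta, states))
    # In training, this term would be added to the RL objective with a weight coefficient.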
arXiv Detail & Related papers (2020-03-21T00:10:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.