Representation of Reinforcement Learning Policies in Reproducing Kernel
Hilbert Spaces
- URL: http://arxiv.org/abs/2002.02863v2
- Date: Thu, 15 Oct 2020 16:00:19 GMT
- Title: Representation of Reinforcement Learning Policies in Reproducing Kernel
Hilbert Spaces
- Authors: Bogdan Mazoure, Thang Doan, Tianyu Li, Vladimir Makarenkov, Joelle
Pineau, Doina Precup, Guillaume Rabusseau
- Abstract summary: This framework involves finding a low-dimensional embedding of the policy in a reproducing kernel Hilbert space (RKHS).
We derive strong theoretical guarantees on the expected return of the reconstructed policy.
The results confirm that the policies can be robustly embedded in a low-dimensional space while the embedded policy incurs almost no decrease in return.
- Score: 72.5149277196468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a general framework for policy representation for reinforcement
learning tasks. This framework involves finding a low-dimensional embedding of
the policy on a reproducing kernel Hilbert space (RKHS). The use of RKHS-based
methods allows us to derive strong theoretical guarantees on the expected
return of the reconstructed policy. Such guarantees are typically lacking in
black-box models, but are very desirable in tasks requiring stability. We
conduct several experiments on classic RL domains. The results confirm that the
policies can be robustly embedded in a low-dimensional space while the embedded
policy incurs almost no decrease in return.
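To make the embedding-and-reconstruction idea concrete, the sketch below is a minimal illustration, not the authors' algorithm: each policy is represented by its actions on a fixed set of probe states, these behavioural vectors are embedded with kernel PCA under an RBF kernel (a standard RKHS method), and an approximate action map is reconstructed from the low-dimensional embedding. The toy linear policies, dimensions, and hyperparameters are all hypothetical.

```python
# Minimal sketch (illustrative only): embed a collection of policies in a
# low-dimensional space via kernel PCA on their actions at fixed probe states,
# then reconstruct each policy's action map from its embedding.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)

n_policies, n_probe_states, state_dim = 20, 200, 4
probe_states = rng.normal(size=(n_probe_states, state_dim))

# Hypothetical policies: each maps a state to a scalar action via a random linear map.
weights = rng.normal(size=(n_policies, state_dim))
actions = (probe_states @ weights.T).T        # (n_policies, n_probe_states)

# Each row (one policy's behaviour on the probe states) is embedded; the RBF
# kernel defines the RKHS in which the principal components are computed.
kpca = KernelPCA(n_components=5, kernel="rbf", gamma=1e-2,
                 fit_inverse_transform=True, alpha=1e-3)
embeddings = kpca.fit_transform(actions)      # low-dimensional policy embeddings
reconstructed = kpca.inverse_transform(embeddings)

print("embedding shape:", embeddings.shape)
print("mean reconstruction error:", float(np.mean(np.abs(actions - reconstructed))))
```

In the paper's setting, the quantity of interest is the expected return of the policy reconstructed from such a low-dimensional embedding, which the theoretical guarantees bound relative to the original policy.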
Related papers
- Convergence of Policy Mirror Descent Beyond Compatible Function Approximation [66.4260157478436]
We develop theoretical guarantees for PMD with general policy classes, where we assume a strictly weaker variational dominance and obtain convergence to the best-in-class policy.
Our main result leverages a novel notion induced by the local norm of the occupancy measure.
arXiv Detail & Related papers (2025-02-16T08:05:46Z)
- Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning [27.868175900131313]
Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state.
This paper postulates multi-linear mappings to efficiently estimate the parameters of the RL policy.
We leverage the PARAFAC decomposition to design tensor low-rank policies.
arXiv Detail & Related papers (2025-01-08T23:22:08Z)
- Embedding Safety into RL: A New Take on Trust Region Methods [1.5733417396701983]
We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space to ensure that trust regions contain only safe policies.
Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
arXiv Detail & Related papers (2024-11-05T09:55:50Z)
- Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z)
- Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space [9.823296458696882]
In traditional partially observable Markov decision processes, ensuring safety typically involves estimating the belief in latent states.
We propose a model-based approach that guarantees RL safety almost surely in the face of unknown system dynamics.
arXiv Detail & Related papers (2023-12-01T17:01:37Z)
- Supported Trust Region Optimization for Offline Reinforcement Learning [59.43508325943592]
We propose Supported Trust Region optimization (STR), which performs trust region policy optimization with the policy constrained within the support of the behavior policy.
We show that, assuming no approximation or sampling error, STR guarantees strict policy improvement until convergence to the optimal support-constrained policy in the dataset.
arXiv Detail & Related papers (2023-11-15T13:16:16Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a smooth policy that behaves smoothly with respect to states.
We develop a new framework, Smooth Regularized Reinforcement Learning (SR^2L), where the policy is trained with smoothness-inducing regularization.
Such regularization effectively constrains the search space and enforces smoothness in the learned policy (a rough sketch follows after this list).
arXiv Detail & Related papers (2020-03-21T00:10:29Z)
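As a rough illustration of the smoothness-inducing regularization mentioned in the last entry, the sketch below penalizes how much a policy's action distribution changes under a small perturbation of the state. It is a simplification: SR^2L uses an adversarially chosen perturbation, whereas a random one is used here, and the network sizes and coefficients are hypothetical.

```python
# Hedged sketch of smoothness-inducing regularization for a policy.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy policy: maps a 4-dimensional state to logits over 2 discrete actions.
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))

def smoothness_penalty(states, epsilon=0.05):
    # KL(pi(.|s) || pi(.|s + noise)), averaged over the batch; a random
    # perturbation stands in for the adversarial one used by SR^2L.
    noise = epsilon * torch.randn_like(states)
    logp = F.log_softmax(policy(states), dim=-1)
    logp_pert = F.log_softmax(policy(states + noise), dim=-1)
    return (logp.exp() * (logp - logp_pert)).sum(dim=-1).mean()

states = torch.randn(32, 4)
rl_objective = torch.tensor(0.0)               # placeholder for the usual policy loss
loss = rl_objective + 0.1 * smoothness_penalty(states)
loss.backward()                                # gradients flow through the penalty
```

In practice the penalty is added to the ordinary policy-gradient objective, so the learned policy changes little between nearby states.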