Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space
- URL: http://arxiv.org/abs/2312.00727v1
- Date: Fri, 1 Dec 2023 17:01:37 GMT
- Title: Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space
- Authors: Xiaoyuan Cheng, Boli Chen, Liz Varga, Yukun Hu
- Abstract summary: In traditional partially observable Markov decision processes, ensuring safety typically involves estimating the belief in latent states.
We propose a model-based approach that guarantees RL safety almost surely in the face of unknown system dynamics.
- Score: 9.823296458696882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper delves into the problem of safe reinforcement learning (RL) in a
partially observable environment with the aim of achieving safe-reachability
objectives. In traditional partially observable Markov decision processes
(POMDP), ensuring safety typically involves estimating the belief in latent
states. However, accurately estimating an optimal Bayesian filter in POMDP to
infer latent states from observations in a continuous state space poses a
significant challenge, largely due to the intractable likelihood. To tackle
this issue, we propose a stochastic model-based approach that guarantees RL
safety almost surely in the face of unknown system dynamics and partial
observation environments. We leverage the Predictive State Representation
(PSR) and Reproducing Kernel Hilbert Space (RKHS) to represent future
multi-step observations analytically, with provable results in this setting.
Furthermore, we derive essential operators from the kernel Bayes' rule,
enabling recursive estimation of future observations. Under the assumption of
\textit{undercompleteness}, a polynomial sample complexity is established for
the RL algorithm even when the observation and action spaces are infinite,
ensuring an $\epsilon$-suboptimal safe policy guarantee.
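As a concrete, hedged illustration of the operators derived from the kernel Bayes' rule, the sketch below implements the standard kernel conditional mean embedding on which such operators are built: the posterior embedding of a latent or future variable given a new observation is a weighted sum of training features, with weights obtained from a regularized Gram-matrix solve. This is a minimal sketch of the general technique, not the paper's algorithm; the RBF kernel, bandwidth, and regularization constant are assumptions.

```python
import numpy as np

def rbf_gram(A, B, bandwidth=1.0):
    """RBF Gram matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * bandwidth**2))

def embedding_weights(Y_train, y_new, reg=1e-3):
    """Weights alpha so that the embedding of P(X | Y = y_new) is
    approximated by sum_i alpha_i k(., x_i) (conditional mean embedding)."""
    n = Y_train.shape[0]
    G = rbf_gram(Y_train, Y_train)               # Gram matrix on observed variables
    k_y = rbf_gram(Y_train, y_new[None, :])      # kernel vector for the new observation
    alpha = np.linalg.solve(G + n * reg * np.eye(n), k_y)  # (G + n*lambda*I)^{-1} k(y)
    return alpha.ravel()

# Toy usage: X plays the role of future observations, Y of current observations.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
Y_train = X_train + 0.1 * rng.normal(size=(200, 2))  # noisy observation channel
alpha = embedding_weights(Y_train, np.array([0.5, -0.2]))
print(alpha @ X_train)  # predictive mean of X under the embedding weights
```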
Related papers
- Tilted Quantile Gradient Updates for Quantile-Constrained Reinforcement Learning [12.721239079824622]
We propose a safe reinforcement learning (RL) paradigm that enables a higher level of safety without any expectation-form approximations.
A tilted update strategy for quantile gradients is implemented to compensate for the asymmetric distributional density.
Experiments demonstrate that the proposed model fully meets safety requirements (quantile constraints) while outperforming the state-of-the-art benchmarks with higher return.
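For context on what a quantile gradient is in this setting, the sketch below shows the standard pinball-loss update that quantile-constrained methods build on; the paper's tilted compensation for the asymmetric distributional density is its own contribution and is not reproduced here.

```python
import numpy as np

def quantile_grad(theta, targets, tau):
    """Gradient of the pinball loss E[rho_tau(targets - theta)] w.r.t. theta:
    dL/dtheta = -(tau - 1{target < theta}), averaged over the batch."""
    return -np.mean(tau - (targets < theta).astype(float))

# Toy usage: track the 0.9-quantile of a noisy return distribution by SGD.
rng = np.random.default_rng(1)
returns = rng.normal(loc=1.0, scale=0.5, size=5000)
theta, lr, tau = 0.0, 0.05, 0.9
for _ in range(2000):
    batch = rng.choice(returns, size=64)
    theta -= lr * quantile_grad(theta, batch, tau)
print(theta)  # approaches the empirical 0.9-quantile of `returns` (about 1.64)
```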
arXiv Detail & Related papers (2024-12-17T18:58:00Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
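In symbols, conservative off-policy evaluation asks for an estimate that lower-bounds the true return (a generic statement of the objective, not the paper's exact notation):
\[
\hat{J}(\pi) \;\le\; J(\pi) \quad \text{with probability at least } 1-\delta,
\]
where $J(\pi)$ is the true expected return of the evaluated policy; HAMBO obtains such a bound from an uncertainty-aware learned dynamics model.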
arXiv Detail & Related papers (2023-03-02T08:57:35Z) - Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation [3.5884936187733394]
We address the problem of safe reinforcement learning from pixel observations.
We formalize the problem in a constrained, partially observable Markov decision process framework.
We employ a novel safety critic using the stochastic latent actor-critic (SLAC) approach.
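Generically, the constrained, partially observable formulation referenced here takes the form (a sketch, not necessarily the paper's notation):
\[
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t} r_t\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t} c_t\Big] \le d,
\]
where $r_t$ is the reward, $c_t$ a safety cost, and $d$ a cost budget; here the policy and safety critic condition on a learned stochastic latent state rather than raw pixels.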
arXiv Detail & Related papers (2022-10-02T19:55:42Z) - Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings [97.12538243736705]
We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs).
Our algorithm provably scales to large-scale POMDPs.
arXiv Detail & Related papers (2022-06-24T05:13:35Z) - Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL).
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z) - Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z) - Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations [7.776010676090131]
State observations that an agent observes may contain measurement errors or adversarial noises, misleading the agent to take suboptimal actions or even collapse while training.
In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return.
arXiv Detail & Related papers (2021-09-17T22:37:39Z) - Lyapunov-based uncertainty-aware safe reinforcement learning [0.0]
Reinforcement learning (RL) has shown promising performance in learning optimal policies for a variety of sequential decision-making tasks.
In many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety.
We propose a Lyapunov-based uncertainty-aware safe RL model to address these limitations.
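As a rough sketch of the Lyapunov idea (generic form, not the paper's exact condition), the policy is required to keep a Lyapunov function non-increasing in expectation:
\[
\mathbb{E}_{s' \sim P(\cdot \mid s, \pi(s))}\big[L(s')\big] - L(s) \;\le\; \epsilon(s),
\]
so that trajectories remain inside the safe level set of $L$; uncertainty-awareness then corresponds to enforcing this decrease under a confidence bound on the unknown dynamics $P$.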
arXiv Detail & Related papers (2021-07-29T13:08:15Z) - Representation of Reinforcement Learning Policies in Reproducing Kernel Hilbert Spaces [72.5149277196468]
This framework involves finding a low-dimensional embedding of the policy in a reproducing kernel Hilbert space (RKHS).
We derive strong theoretical guarantees on the expected return of the reconstructed policy.
The results confirm that the policies can be robustly embedded in a low-dimensional space while the embedded policy incurs almost no decrease in return.
arXiv Detail & Related papers (2020-02-07T15:57:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.