Neurosymbolic Reinforcement Learning with Formally Verified Exploration
- URL: http://arxiv.org/abs/2009.12612v2
- Date: Mon, 26 Oct 2020 14:02:51 GMT
- Title: Neurosymbolic Reinforcement Learning with Formally Verified Exploration
- Authors: Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri
- Abstract summary: We present Revel, a framework for provably safe exploration in continuous state and action spaces.
A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible.
We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification.
- Score: 21.23874800091344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Revel, a partially neural reinforcement learning (RL) framework
for provably safe exploration in continuous state and action spaces. A key
challenge for provably safe deep RL is that repeatedly verifying neural
networks within a learning loop is computationally infeasible. We address this
challenge using two policy classes: a general, neurosymbolic class with
approximate gradients and a more restricted class of symbolic policies that
allows efficient verification. Our learning algorithm is a mirror descent over
policies: in each iteration, it safely lifts a symbolic policy into the
neurosymbolic space, performs safe gradient updates to the resulting policy,
and projects the updated policy into the safe symbolic subset, all without
requiring explicit verification of neural networks. Our empirical results show
that Revel enforces safe exploration in many scenarios in which Constrained
Policy Optimization does not, and that it can discover policies that outperform
those learned through prior approaches to verified exploration.
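The following is a minimal sketch, in Python, of the mirror-descent loop described in the abstract; it is not the authors' implementation. The callables `lift`, `collect`, `update`, and `project` are hypothetical placeholders standing in for Revel's actual policy lifting, safe rollout collection, approximate safe gradient updates, and projection onto the verifiable symbolic class.

```python
# A minimal sketch of the mirror-descent-over-policies loop described in the
# abstract; not the authors' implementation. All callables passed in are
# hypothetical placeholders.

def revel_mirror_descent(init_symbolic_policy, lift, collect, update, project,
                         n_iters=100, n_inner_steps=10):
    """Alternate between a verified symbolic policy and a neurosymbolic
    policy, never verifying a neural network directly.

    lift(symbolic)           -> neurosymbolic policy built around `symbolic`
    collect(policy)          -> rollouts gathered while acting safely
    update(policy, rollouts) -> policy after an approximate, safe gradient step
    project(policy)          -> nearest policy in the verifiable symbolic class
    """
    symbolic = init_symbolic_policy  # assumed verified safe by construction
    for _ in range(n_iters):
        # 1. Safely lift the symbolic policy into the neurosymbolic space.
        neurosym = lift(symbolic)
        # 2. Perform safe (approximate) gradient updates on that policy.
        for _ in range(n_inner_steps):
            neurosym = update(neurosym, collect(neurosym))
        # 3. Project the updated policy back into the safe symbolic subset.
        symbolic = project(neurosym)
    return symbolic
```

The intended takeaway of this sketch is the division of labor the abstract describes: verification effort is confined to the restricted symbolic class (the projection step), while the more expressive neurosymbolic policy is only ever updated and executed in ways the symbolic component already certifies as safe.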
Related papers
- Compositional Policy Learning in Stochastic Control Systems with Formal
Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z) - Deep Explainable Relational Reinforcement Learning: A Neuro-Symbolic
Approach [18.38878415765146]
We propose Deep Explainable Relational Reinforcement Learning (DERRL), a framework that exploits the best of both -- neural and symbolic worlds.
DERRL combines relational representations and constraints from symbolic planning with deep learning to extract interpretable policies.
These policies are in the form of logical rules that explain how each decision (or action) is arrived at.
arXiv Detail & Related papers (2023-04-17T15:11:40Z) - SAFER: Data-Efficient and Safe Reinforcement Learning via Skill
Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z) - Neuro-Symbolic Reinforcement Learning with First-Order Logic [63.003353499732434]
We propose a novel RL method for text-based games with a recent neuro-symbolic framework called Logical Neural Network.
Our experimental results show RL training with the proposed method converges significantly faster than other state-of-the-art neuro-symbolic methods in a TextWorld benchmark.
arXiv Detail & Related papers (2021-10-21T08:21:49Z) - Learning Barrier Certificates: Towards Safe Reinforcement Learning with
Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z) - Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive with contemporary algorithms, with more stable performance and shorter running time.
arXiv Detail & Related papers (2021-06-10T06:29:59Z) - Learning Intrinsic Symbolic Rewards in Reinforcement Learning [7.101885582663675]
We present a method that discovers dense rewards in the form of low-dimensional symbolic trees.
We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks.
arXiv Detail & Related papers (2020-10-08T00:02:46Z) - Continuous Action Reinforcement Learning from a Mixture of Interpretable
Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The paper's main technical contribution is to address the challenges introduced by the non-differentiable state-selection procedure.
arXiv Detail & Related papers (2020-06-10T16:02:08Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.