Compositional Policy Learning in Stochastic Control Systems with Formal
Guarantees
- URL: http://arxiv.org/abs/2312.01456v1
- Date: Sun, 3 Dec 2023 17:04:18 GMT
- Title: Compositional Policy Learning in Stochastic Control Systems with Formal
Guarantees
- Authors: Đorđe Žikelić (1), Mathias Lechner (2), Abhinav Verma (3),
Krishnendu Chatterjee (1), Thomas A. Henzinger (1) ((1) Institute of Science
and Technology Austria, (2) Massachusetts Institute of Technology, (3) The
Pennsylvania State University)
- Abstract summary: Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning has shown promising results in learning neural network
policies for complicated control tasks. However, the lack of formal guarantees
about the behavior of such policies remains an impediment to their deployment.
We propose a novel method for learning a composition of neural network policies
in stochastic environments, along with a formal certificate which guarantees
that a specification over the policy's behavior is satisfied with the desired
probability. Unlike prior work on verifiable RL, our approach leverages the
compositional nature of logical specifications provided in SpectRL, to learn
over graphs of probabilistic reach-avoid specifications. The formal guarantees
are provided by learning neural network policies together with reach-avoid
supermartingales (RASM) for the graph's sub-tasks and then composing them into
a global policy. We also derive a tighter lower bound compared to previous work
on the probability of reach-avoidance implied by a RASM, which is required to
find a compositional policy with an acceptable probabilistic threshold for
complex tasks with multiple edge policies. We implement a prototype of our
approach and evaluate it on a Stochastic Nine Rooms environment.
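To make the compositional idea concrete, here is a minimal Python sketch under the simplifying assumption that each edge policy of the specification graph comes with an independent certified lower bound on its reach-avoid probability; the paper's actual RASM-based bound is tighter, and all names below are illustrative rather than taken from the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): composing per-edge reach-avoid
# guarantees along a path of a SpectRL-style specification graph. Each edge
# carries a policy and a certified lower bound p_e on the probability that its
# reach-avoid sub-task succeeds; assuming the per-edge certificates compose
# independently, the product of edge bounds lower-bounds the probability that
# the whole path specification is satisfied.

from dataclasses import dataclass
from math import prod
from typing import Callable, List, Optional, Tuple

@dataclass
class EdgePolicy:
    name: str
    act: Callable            # state -> action (the edge's neural policy)
    reach_avoid_lb: float    # certified lower bound from the edge's RASM

def path_guarantee(path: List[EdgePolicy]) -> float:
    """Lower bound on satisfying the whole path specification."""
    return prod(e.reach_avoid_lb for e in path)

def best_path(paths: List[List[EdgePolicy]],
              threshold: float) -> Optional[Tuple[float, List[EdgePolicy]]]:
    """Pick a path whose composed guarantee meets the requested threshold."""
    scored = [(path_guarantee(p), p) for p in paths]
    feasible = [sp for sp in scored if sp[0] >= threshold]
    return max(feasible, default=None, key=lambda sp: sp[0])

if __name__ == "__main__":
    # Toy usage with made-up edge bounds (illustrative only):
    e1 = EdgePolicy("room1->room2", act=lambda s: 0, reach_avoid_lb=0.97)
    e2 = EdgePolicy("room2->goal",  act=lambda s: 0, reach_avoid_lb=0.95)
    print(path_guarantee([e1, e2]))   # 0.9215, e.g. above a 0.9 threshold
```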
Related papers
- Statistical Analysis of Policy Space Compression Problem [54.1754937830779]
Policy search methods are crucial in reinforcement learning, offering a framework to address continuous state-action and partially observable problems.
Reducing the policy space through policy compression emerges as a powerful, reward-free approach to accelerate the learning process.
This technique condenses the policy space into a smaller, representative set while maintaining most of the original effectiveness.
arXiv Detail & Related papers (2024-11-15T02:46:55Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
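As a rough illustration of this training-versus-deployment split (assumed structure and layer sizes, not the paper's code), a Gaussian policy can be trained with a tunable exploration scale and deployed through its mean:

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, sigma: float = 0.2):
        super().__init__()
        self.mean = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                  nn.Linear(64, act_dim))
        # Fixed exploration level; the paper studies how this knob trades off
        # sample complexity against deployed deterministic performance.
        self.log_sigma = nn.Parameter(torch.log(torch.full((act_dim,), sigma)),
                                      requires_grad=False)

    def sample(self, obs):                   # stochastic policy, used for learning
        dist = torch.distributions.Normal(self.mean(obs), self.log_sigma.exp())
        action = dist.sample()
        return action, dist.log_prob(action).sum(-1)

    def deploy(self, obs):                   # deterministic version at test time
        return self.mean(obs)
```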
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Policy Bifurcation in Safe Reinforcement Learning [35.75059015441807]
In some scenarios, the feasible policy should be discontinuous or multi-valued, and interpolating between discontinuous local optima can inevitably lead to constraint violations.
We are the first to identify the generating mechanism of such a phenomenon, and employ topological analysis to rigorously prove the existence of bifurcation in safe RL.
We propose a safe RL algorithm called multimodal policy optimization (MUPO), which utilizes a Gaussian mixture distribution as the policy output.
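A minimal sketch of what a Gaussian-mixture policy head could look like (class names and layer sizes are my assumptions, not the MUPO implementation):

```python
import torch
import torch.nn as nn
import torch.distributions as D

class MixturePolicy(nn.Module):
    """Policy whose output is a mixture of Gaussians, so probability mass can
    sit on disjoint action regions instead of interpolating across a gap."""
    def __init__(self, obs_dim: int, act_dim: int, n_components: int = 2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.logits = nn.Linear(64, n_components)
        self.means = nn.Linear(64, n_components * act_dim)
        self.log_std = nn.Linear(64, n_components * act_dim)
        self.k, self.d = n_components, act_dim

    def dist(self, obs):
        h = self.body(obs)
        mix = D.Categorical(logits=self.logits(h))
        comp = D.Independent(
            D.Normal(self.means(h).view(-1, self.k, self.d),
                     self.log_std(h).view(-1, self.k, self.d).exp()), 1)
        return D.MixtureSameFamily(mix, comp)

    def act(self, obs):
        return self.dist(obs).sample()
```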
arXiv Detail & Related papers (2024-03-19T15:54:38Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interaction with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- Verified Probabilistic Policies for Deep Reinforcement Learning [6.85316573653194]
We tackle the problem of verifying probabilistic policies for deep reinforcement learning.
We propose an abstraction approach, based on interval Markov decision processes, that yields guarantees on a policy's execution.
We present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking.
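The interval-MDP ingredient can be illustrated with a toy pessimistic value iteration; the routine below is a simplification under my own assumptions, not the paper's abstraction or model-checking pipeline:

```python
import numpy as np

def pessimistic_reach_prob(lo, hi, goal, n_iter=200):
    """lo, hi: (S, A, S) arrays bounding each transition probability.
    goal: boolean mask of goal states. Returns a per-state lower bound on the
    probability of eventually reaching the goal, resolving the intervals
    adversarially while the controller picks the best action."""
    S, A, _ = lo.shape
    v = goal.astype(float)
    for _ in range(n_iter):
        q = np.zeros((S, A))
        for s in range(S):
            for a in range(A):
                # Adversary pushes probability mass toward low-value successors,
                # respecting the interval bounds and summing to one.
                order = np.argsort(v)              # worst successors first
                p = lo[s, a].copy()
                budget = 1.0 - p.sum()
                for t in order:
                    if budget <= 0:
                        break
                    add = min(hi[s, a, t] - lo[s, a, t], budget)
                    p[t] += add
                    budget -= add
                q[s, a] = p @ v
        v = np.where(goal, 1.0, q.max(axis=1))     # controller maximizes
    return v
```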
arXiv Detail & Related papers (2022-01-10T23:55:04Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
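The core of generalized policy improvement over such a basis is simple to state; the following sketch (with illustrative names and shapes, not the paper's code) acts greedily with respect to the pointwise maximum of the base policies' Q-values:

```python
import numpy as np

def gpi_action(q_values: np.ndarray) -> int:
    """q_values: (n_policies, n_actions) array of Q(s, a) estimates for the
    current state under each base policy. Returns the GPI action: the one
    with the highest value achieved by any policy in the basis."""
    return int(q_values.max(axis=0).argmax())

# Example: two base policies, three actions.
q = np.array([[0.2, 0.9, 0.1],
              [0.8, 0.3, 0.4]])
assert gpi_action(q) == 1   # action 1 attains the highest value in the basis
```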
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
The implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
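One way to read the "deep generator network" ingredient (my interpretation, not the authors' code) is a critic that consumes noise and emits samples of the return rather than a single scalar Q-value:

```python
import torch
import torch.nn as nn

class ReturnGenerator(nn.Module):
    """Critic as a generator: repeated noise draws approximate the full
    distribution of returns for a given state-action pair."""
    def __init__(self, obs_dim: int, act_dim: int, noise_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim + noise_dim, 64),
                                 nn.ReLU(), nn.Linear(64, 1))
        self.noise_dim = noise_dim

    def forward(self, obs, act, n_samples: int = 32):
        eps = torch.randn(n_samples, self.noise_dim)
        x = torch.cat([obs.expand(n_samples, -1),
                       act.expand(n_samples, -1), eps], dim=-1)
        return self.net(x)    # n_samples draws from the return distribution
```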
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
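A toy sketch of the setting (assumed interface, not the paper's method): a few human-readable experts, with a hard argmax selection among them that is not differentiable, which is exactly the difficulty the paper targets:

```python
import numpy as np

def pid_expert(gains):
    """A small, human-readable control law parameterized by (kp, kd)."""
    kp, kd = gains
    return lambda state: -kp * state[0] - kd * state[1]

experts = [pid_expert((1.0, 0.1)), pid_expert((4.0, 0.5))]

def select_expert(scores: np.ndarray) -> int:
    """scores: learned per-expert scores for the current state.
    The argmax makes the selection non-differentiable."""
    return int(np.argmax(scores))

def act(state: np.ndarray, scores: np.ndarray) -> float:
    return experts[select_expert(scores)](state)

print(act(np.array([0.3, -0.1]), scores=np.array([0.2, 0.7])))
```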
arXiv Detail & Related papers (2020-06-10T16:02:08Z)