Formal Policy Synthesis for Continuous-Space Systems via Reinforcement
Learning
- URL: http://arxiv.org/abs/2005.01319v2
- Date: Sun, 27 Sep 2020 11:55:30 GMT
- Title: Formal Policy Synthesis for Continuous-Space Systems via Reinforcement
Learning
- Authors: Milad Kazemi and Sadegh Soudjani
- Abstract summary: We show how reinforcement learning can be applied for computing policies that are finite-memory and deterministic.
We develop the required assumptions and theories for the convergence of the learned policy to the optimal policy.
We demonstrate the approach on a 4-dim cart-pole system and 6-dim boat driving problem.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies satisfaction of temporal properties on unknown stochastic
processes that have continuous state spaces. We show how reinforcement learning
(RL) can be applied for computing policies that are finite-memory and
deterministic using only the paths of the stochastic process. We address
properties expressed in linear temporal logic (LTL) and use their automaton
representation to give a path-dependent reward function maximised via the RL
algorithm. We develop the required assumptions and theories for the convergence
of the learned policy to the optimal policy in the continuous state space. To
improve the performance of the learning on the constructed sparse reward
function, we propose a sequential learning procedure based on a sequence of
labelling functions obtained from the positive normal form of the LTL
specification. We use this procedure to guide the RL algorithm towards a policy
that converges to an optimal policy under suitable assumptions on the process.
We demonstrate the approach on a 4-dim cart-pole system and 6-dim boat driving
problem.
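The construction described in the abstract can be pictured as follows: the LTL property is translated offline into an automaton, RL is then run on the product of the unknown continuous-state process and that automaton, and a reward is issued whenever the automaton reaches its accepting condition. The snippet below is only a minimal sketch of that pattern, not the paper's exact reward or convergence construction; the labelling function, toy automaton, dynamics and discretisation are hypothetical placeholders.

```python
import numpy as np

# Hypothetical labelling function: maps a continuous state to atomic propositions.
def label(x):
    props = set()
    if x[0] > 0.8:
        props.add("goal")
    if abs(x[0]) > 1.0:
        props.add("unsafe")
    return frozenset(props)

# Toy automaton for "eventually goal while never unsafe":
# state 0 = trying, 1 = accepting, 2 = rejecting trap.
def automaton_step(q, props):
    if q != 0:
        return q
    if "unsafe" in props:
        return 2
    if "goal" in props:
        return 1
    return 0

def toy_dynamics(x, a, rng):
    # Placeholder 1-d stochastic dynamics standing in for the unknown process.
    return x + 0.1 * a + 0.05 * rng.standard_normal(1)

def discretise(x, bins=np.linspace(-1.2, 1.2, 25)):
    return int(np.digitize(x[0], bins))

rng = np.random.default_rng(0)
actions = [-1.0, 0.0, 1.0]
Q = np.zeros((26, 3, len(actions)))        # (discrete state, automaton state, action)
alpha, gamma, eps = 0.1, 0.99, 0.2

for episode in range(2000):
    x, q = np.array([-0.5]), 0
    for t in range(100):
        s = discretise(x)
        a_idx = rng.integers(len(actions)) if rng.random() < eps else int(np.argmax(Q[s, q]))
        x_next = toy_dynamics(x, actions[a_idx], rng)
        q_next = automaton_step(q, label(x_next))
        r = 1.0 if (q_next == 1 and q != 1) else 0.0   # path-dependent reward via automaton memory
        s_next = discretise(x_next)
        Q[s, q, a_idx] += alpha * (r + gamma * np.max(Q[s_next, q_next]) - Q[s, q, a_idx])
        x, q = x_next, q_next
        if q == 2:
            break
```

The automaton state plays the role of the finite memory in the learned policy: the greedy policy extracted from Q depends on both the continuous state and the automaton state.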
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches for continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned, but only their deterministic version is deployed.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
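The entry above concerns the common practice of training a stochastic policy and deploying only its deterministic version. A minimal sketch of that pattern, assuming a toy one-dimensional objective and a Gaussian policy updated with a score-function gradient (all values illustrative, not from the paper), is:

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(a):
    return -(a - 2.0) ** 2          # toy objective, maximised at a = 2

mu, sigma = 0.0, 0.5                # sigma sets the exploration level during learning
lr = 0.02
for step in range(2000):
    a = mu + sigma * rng.standard_normal()          # sample from the stochastic policy
    grad_mu = reward(a) * (a - mu) / sigma ** 2     # score-function gradient w.r.t. the mean
    mu += lr * grad_mu
print("deployed deterministic action:", mu)         # deploy the mean, not a sample
```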
- How does Your RL Agent Explore? An Optimal Transport Analysis of Occupancy Measure Trajectories [8.429001045596687]
We represent the learning process of an RL algorithm as a sequence of policies generated during training.
We then study the policy trajectory induced in the manifold of state-action occupancy measures.
arXiv Detail & Related papers (2024-02-14T11:55:50Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
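The iteratively refined behaviour-regularization idea above can be illustrated in miniature: perform a KL-regularised improvement step against a reference policy, then let the improved policy become the next reference. The single-state, discrete-action example below is a hedged sketch with made-up action values, not the paper's algorithm.

```python
import numpy as np

q_hat = np.array([0.1, 0.8, 0.3, -0.2])        # learned action values (fixed here)
behaviour = np.array([0.4, 0.1, 0.3, 0.2])     # data-collection policy
beta = 1.0                                     # strength of the behaviour regulariser

reference = behaviour.copy()
for k in range(5):
    # KL-regularised improvement against the current reference:
    # pi(a) proportional to reference(a) * exp(q_hat(a) / beta)
    logits = np.log(reference) + q_hat / beta
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    print(f"iteration {k}: expected value {pi @ q_hat:.3f}")
    # Iterative refinement: the improved policy becomes the next reference,
    # so the constraint gradually drifts away from the raw behaviour policy.
    reference = pi
```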
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
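A rough sketch of the two-policy structure in JSRL, under the assumption that a fixed guide policy rolls in for the first h steps of each episode and a learning exploration policy takes over afterwards, with h annealed as performance improves. The environment, policies and improvement threshold below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
horizon = 50

def env_reset():
    return np.zeros(2)

def env_step(x, a):
    x_next = x + 0.1 * np.array([np.cos(a), np.sin(a)]) + 0.01 * rng.standard_normal(2)
    return x_next, -np.linalg.norm(x_next - np.array([1.0, 1.0]))

def guide_policy(x):          # e.g. from demonstrations or a prior controller
    return np.arctan2(1.0 - x[1], 1.0 - x[0])

def exploration_policy(x):    # the policy being trained; random here as a stand-in
    return rng.uniform(-np.pi, np.pi)

h = horizon                   # guide roll-in length, annealed over training
for iteration in range(10):
    returns = []
    for episode in range(20):
        x, total = env_reset(), 0.0
        for t in range(horizon):
            a = guide_policy(x) if t < h else exploration_policy(x)
            x, r = env_step(x, a)
            total += r
        returns.append(total)
        # (the exploration policy would be updated here with any off-policy RL method)
    if np.mean(returns) > -20.0:   # hypothetical improvement threshold
        h = max(0, h - 5)          # hand control to the exploration policy earlier
```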
- Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms [1.776746672434207]
We study policy gradient (PG) for reinforcement learning in continuous time and space.
We propose two types of actor-critic algorithms for RL, where we learn and update value functions and policies simultaneously and alternatingly.
arXiv Detail & Related papers (2021-11-22T14:27:04Z)
- Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher performing agents on the environments we tested.
arXiv Detail & Related papers (2021-09-12T20:12:46Z)
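The direct random search entry above amounts to hill-climbing on policy parameters with deterministic rollouts: perturb the parameters, evaluate each perturbation without exploration noise, and keep the best candidate. A minimal sketch under a hypothetical linear policy and toy dynamics:

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, horizon=100):
    # Deterministic rollout: fixed initial state, no exploration noise.
    x, total = np.array([1.0, 0.0]), 0.0
    for _ in range(horizon):
        a = float(np.clip(theta @ x, -1.0, 1.0))
        x = np.array([x[0] + 0.05 * x[1], x[1] + 0.05 * a])
        total += -(x[0] ** 2 + 0.1 * a ** 2)
    return total

theta = rng.standard_normal(2) * 0.1   # e.g. parameters of a pre-trained policy
best = rollout(theta)
for step in range(200):
    candidate = theta + 0.05 * rng.standard_normal(2)
    score = rollout(candidate)
    if score > best:                   # keep only improving perturbations
        theta, best = candidate, score
```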
- Near Optimal Policy Optimization via REPS [33.992374484681704]
Relative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains.
There exist no guarantees on REPS's performance when using gradient-based solvers.
We introduce a technique that uses generative access to the underlying decision process to compute parameter updates that maintain favorable convergence to the optimal regularized policy.
arXiv Detail & Related papers (2021-03-17T16:22:59Z)
- Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
We compute unbiased navigation gradients of the value function which we use as ascent directions to update the policy.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
arXiv Detail & Related papers (2020-10-16T15:15:42Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)