Formal Controller Synthesis for Continuous-Space MDPs via Model-Free
Reinforcement Learning
- URL: http://arxiv.org/abs/2003.00712v1
- Date: Mon, 2 Mar 2020 08:29:36 GMT
- Title: Formal Controller Synthesis for Continuous-Space MDPs via Model-Free
Reinforcement Learning
- Authors: Abolfazl Lavaei, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, and
Majid Zamani
- Abstract summary: A novel reinforcement learning scheme to synthesize policies for continuous-space Markov decision processes (MDPs) is proposed.
A key contribution of the paper is to leverage the classical convergence results for reinforcement learning on finite MDPs.
We present a novel potential-based reward shaping technique to produce dense rewards to speed up learning.
- Score: 1.0928470926399565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A novel reinforcement learning scheme to synthesize policies for
continuous-space Markov decision processes (MDPs) is proposed. This scheme
enables one to apply model-free, off-the-shelf reinforcement learning
algorithms for finite MDPs to compute optimal strategies for the corresponding
continuous-space MDPs without explicitly constructing the finite-state
abstraction. The proposed approach is based on abstracting the system with a
finite MDP (without constructing it explicitly) with unknown transition
probabilities, synthesizing strategies over the abstract MDP, and then mapping
the results back over the concrete continuous-space MDP with approximate
optimality guarantees. The properties of interest for the system belong to a
fragment of linear temporal logic, known as syntactically co-safe linear
temporal logic (scLTL), and the synthesis requirement is to maximize the
probability of satisfaction within a given bounded time horizon. A key
contribution of the paper is to leverage the classical convergence results for
reinforcement learning on finite MDPs and provide control strategies maximizing
the probability of satisfaction over unknown, continuous-space MDPs while
providing probabilistic closeness guarantees. Automata-based reward functions
are often sparse; we present a novel potential-based reward shaping technique
to produce dense rewards to speed up learning. The effectiveness of the
proposed approach is demonstrated by applying it to three physical benchmarks
concerning the regulation of a room's temperature, the control of a road
traffic cell, and the control of a 7-dimensional nonlinear model of a BMW 320i car.
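To make the ingredients concrete, the sketch below is a minimal, hypothetical Q-learning loop in the spirit of the abstract: the learner interacts directly with a continuous-space system (a toy room-temperature model), each visited state is mapped on the fly to a cell of a finite abstraction that is never constructed explicitly, the scLTL objective is tracked by a small automaton run in product with the system, and the sparse automaton reward is densified with potential-based shaping. The dynamics, grid parameters, automaton, and potential function are illustrative assumptions, not the paper's actual benchmarks or construction.

```python
import numpy as np
from collections import defaultdict

# Hypothetical room-temperature dynamics: a one-dimensional stochastic system.
def step_concrete(x, u, rng):
    return 0.95 * x + 0.05 * u + rng.normal(0.0, 0.1)

# Implicit finite abstraction: map a continuous state to a grid-cell index
# without ever enumerating the abstract transition probabilities.
def abstract(x, lo=15.0, hi=25.0, n_cells=50):
    return int(np.clip((x - lo) / (hi - lo) * n_cells, 0, n_cells - 1))

# Toy automaton for the scLTL property "eventually reach the comfort zone":
# state 0 = not yet satisfied, state 1 = accepting (absorbing).
def dfa_step(q, x):
    return 1 if (q == 1 or 20.0 <= x <= 21.0) else q

# Potential-based shaping: Phi is larger for automaton states closer to
# acceptance, so the shaped reward r + gamma*Phi(q') - Phi(q) densifies the
# signal without changing which policies are optimal.
PHI = {0: 0.0, 1: 1.0}

def q_learning(episodes=2000, horizon=50, gamma=0.99, alpha=0.1, eps=0.1):
    rng = np.random.default_rng(0)
    actions = [-1.0, 0.0, 1.0]                 # heater off / idle / on
    Q = defaultdict(lambda: np.zeros(len(actions)))
    for _ in range(episodes):
        x, q = rng.uniform(15.0, 25.0), 0
        for _ in range(horizon):
            s = (abstract(x), q)
            a = rng.integers(len(actions)) if rng.random() < eps else int(np.argmax(Q[s]))
            x2 = step_concrete(x, actions[a], rng)
            q2 = dfa_step(q, x2)
            base = 1.0 if (q == 0 and q2 == 1) else 0.0   # sparse automaton reward
            r = base + gamma * PHI[q2] - PHI[q]           # dense shaped reward
            s2 = (abstract(x2), q2)
            Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s][a])
            x, q = x2, q2
    return Q
```

Because the shaping term is a telescoping difference of potentials over the automaton states, it speeds up learning of the sparse reach objective while leaving the set of optimal policies unchanged, which is the standard justification for potential-based reward shaping.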
Related papers
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Formal Controller Synthesis for Markov Jump Linear Systems with Uncertain Dynamics [64.72260320446158]
We propose a method for synthesising controllers for Markov jump linear systems.
Our method is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS.
We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem.
arXiv Detail & Related papers (2022-12-01T17:36:30Z)
- Opportunistic Qualitative Planning in Stochastic Systems with Incomplete Preferences over Reachability Objectives [24.11353445650682]
Preferences play a key role in determining what goals/constraints to satisfy when not all constraints can be satisfied simultaneously.
We present an algorithm to synthesize the SPI and SASI strategies that induce multiple sequential improvements.
arXiv Detail & Related papers (2022-10-04T19:53:08Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utility for the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Model-Free Reinforcement Learning for Optimal Control of Markov Decision Processes Under Signal Temporal Logic Specifications [7.842869080999489]
We present a model-free reinforcement learning algorithm to find an optimal policy for a finite-horizon Markov decision process.
We illustrate the effectiveness of our approach in the context of robotic motion planning for complex missions under uncertainty and performance objectives.
arXiv Detail & Related papers (2021-09-27T22:44:55Z)
- Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)