Learning Optimal Strategies for Temporal Tasks in Stochastic Games
- URL: http://arxiv.org/abs/2102.04307v3
- Date: Thu, 31 Aug 2023 00:20:06 GMT
- Title: Learning Optimal Strategies for Temporal Tasks in Stochastic Games
- Authors: Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, Miroslav Pajic
- Abstract summary: We introduce a model-free reinforcement learning (RL) approach to derive controllers from given specifications.
We learn optimal control strategies that maximize the probability of satisfying the specifications against the worst-case environment behavior.
- Score: 23.012106429532633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthesis from linear temporal logic (LTL) specifications provides assured
controllers for systems operating in stochastic and potentially adversarial
environments. Automatic synthesis tools, however, require a model of the
environment to construct controllers. In this work, we introduce a model-free
reinforcement learning (RL) approach to derive controllers from given LTL
specifications even when the environment is completely unknown. We model the
problem as a stochastic game (SG) between the controller and the adversarial
environment; we then learn optimal control strategies that maximize the
probability of satisfying the LTL specifications against the worst-case
environment behavior. We first construct a product game using the deterministic
parity automaton (DPA) translated from the given LTL specification. By deriving
distinct rewards and discount factors from the acceptance condition of the DPA,
we reduce the maximization of the worst-case probability of satisfying the LTL
specification into the maximization of a discounted reward objective in the
product game; this enables the use of model-free RL algorithms to learn an
optimal controller strategy. To deal with the common scalability problems when
the number of sets defining the acceptance condition of the DPA (usually
referred to as colors) is large, we propose a lazy color generation method where
distinct rewards and discount factors are utilized only when needed, and an
approximate method where the controller eventually focuses on only one color.
In several case studies, we show that our approach is scalable to a wide range
of LTL formulas, significantly outperforming existing methods for learning
controllers from LTL specifications in SGs.
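The reduction described in the abstract, from a worst-case satisfaction probability to a discounted reward objective with state-dependent rewards and discount factors, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy game `GAME`, the set `ACCEPTING`, the constants `GAMMA` and `GAMMA_B`, and the single-color (Büchi-style) reward scheme are hypothetical and are not the paper's parity-based construction. To keep the example deterministic, it runs value iteration on a known model; a model-free learner such as minimax Q-learning would replace the expected backups with sampled ones.

```python
# Hypothetical turn-based stochastic game (not taken from the paper).
# Each state maps to (owner, {action: [(probability, next_state), ...]}).
GAME = {
    0: ("controller", {"risky": [(1.0, 1)], "coin": [(1.0, 2)]}),
    1: ("environment", {"help": [(1.0, 3)], "hurt": [(1.0, 4)]}),
    2: ("environment", {"wait": [(0.5, 3), (0.5, 4)]}),
    3: ("controller", {"stay": [(1.0, 3)]}),  # accepting sink
    4: ("controller", {"stay": [(1.0, 4)]}),  # rejecting sink
}
ACCEPTING = {3}

# Assumed constants: a discount close to 1 in non-accepting states and a
# smaller one in accepting states, so that staying in an accepting state
# forever yields a discounted return of exactly 1.
GAMMA = 0.9999
GAMMA_B = 0.99


def reward(state):
    """Reward 1 - GAMMA_B in accepting states, 0 elsewhere (assumed scheme)."""
    return 1.0 - GAMMA_B if state in ACCEPTING else 0.0


def discount(state):
    """State-dependent discount factor (assumed scheme)."""
    return GAMMA_B if state in ACCEPTING else GAMMA


def solve(game, sweeps=5000):
    """Minimax value iteration: the controller maximizes, the environment minimizes."""
    values = {s: 0.0 for s in game}
    for _ in range(sweeps):
        for s, (owner, actions) in game.items():
            backups = [
                reward(s) + discount(s) * sum(p * values[t] for p, t in outcomes)
                for outcomes in actions.values()
            ]
            values[s] = max(backups) if owner == "controller" else min(backups)
    return values


def greedy_action(game, values, state):
    """Best action for the owner of `state` under the given value estimates."""
    owner, actions = game[state]

    def score(action):
        return sum(p * values[t] for p, t in actions[action])

    pick = max if owner == "controller" else min
    return pick(actions, key=score)


values = solve(GAME)
best = greedy_action(GAME, values, 0)
```

In this toy game the adversary can always defeat the "risky" action, so the controller's best choice at state 0 is "coin", and the value at state 0 comes out close to 0.5, the worst-case probability of reaching the accepting sink.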
Related papers
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - Stochastic Optimal Control Matching [53.156277491861985]
Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for optimal control.
The control is learned via a least squares problem by trying to fit a matching vector field.
Experimentally, our algorithm achieves lower error than all the existing IDO techniques for optimal control.
arXiv Detail & Related papers (2023-12-04T16:49:43Z) - Signal Temporal Logic Neural Predictive Control [15.540490027770621]
We propose a method to learn a neural network controller that satisfies requirements specified in Signal Temporal Logic (STL).
Our controller learns to roll out trajectories to maximize the STL robustness score in training.
A backup policy is designed to ensure safety when our controller fails.
arXiv Detail & Related papers (2023-09-10T20:31:25Z) - Learning Minimally-Violating Continuous Control for Infeasible Linear
Temporal Logic Specifications [2.496282558123411]
This paper explores continuous-time control for target-driven navigation to satisfy complex high-level tasks expressed as linear temporal logic (LTL).
We propose a model-free synthesis framework using deep reinforcement learning (DRL) where the underlying dynamic system is unknown (an opaque box).
arXiv Detail & Related papers (2022-10-03T18:32:20Z) - Actor-Critic based Improper Reinforcement Learning [61.430513757337486]
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process.
We propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic scheme and a Natural Actor-Critic scheme.
arXiv Detail & Related papers (2022-07-19T05:55:02Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Reinforcement Learning Based Temporal Logic Control with Maximum
Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z) - Stochastic Finite State Control of POMDPs with LTL Specifications [14.163899014007647]
Partially observable Markov decision processes (POMDPs) provide a modeling framework for autonomous decision making under uncertainty.
This paper considers the quantitative problem of synthesizing sub-optimal stochastic finite state controllers (sFSCs) for POMDPs.
We propose a bounded policy algorithm, leading to a controlled growth in sFSC size, and an anytime algorithm, where the performance of the controller improves with successive iterations.
arXiv Detail & Related papers (2020-01-21T18:10:47Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)