Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes
- URL: http://arxiv.org/abs/2305.14736v3
- Date: Wed, 19 Jun 2024 21:11:31 GMT
- Title: Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes
- Authors: Krishna C. Kalagarla, Dhruva Kartik, Dongming Shen, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo
- Abstract summary: We first introduce an optimal control theory for partially observable Markov decision processes (POMDPs) with finite linear temporal logic constraints.
We provide a structured methodology for synthesizing policies that maximize a cumulative reward.
We then build on this approach to design an optimal control framework for logically constrained multi-agent settings.
- Score: 5.471640959988549
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous systems often have logical constraints arising, for example, from safety, operational, or regulatory requirements. Such constraints can be expressed using temporal logic specifications. The system state is often partially observable. Moreover, it could encompass a team of multiple agents with a common objective but disparate information structures and constraints. In this paper, we first introduce an optimal control theory for partially observable Markov decision processes (POMDPs) with finite linear temporal logic constraints. We provide a structured methodology for synthesizing policies that maximize a cumulative reward while ensuring that the probability of satisfying a temporal logic constraint is sufficiently high. Our approach comes with guarantees on approximate reward optimality and constraint satisfaction. We then build on this approach to design an optimal control framework for logically constrained multi-agent settings with information asymmetry. We illustrate the effectiveness of our approach by implementing it on several case studies.
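As a rough, fully observed illustration of the synthesis problem the abstract describes, the sketch below solves a finite-horizon constrained MDP by linear programming over occupancy measures: maximize expected reward subject to a lower bound on the probability that the final state of a toy product model (an MDP composed with a DFA for a finite-LTL constraint) is accepting. The model, horizon, and threshold are illustrative assumptions, not the paper's construction, which additionally handles partial observability and multi-agent information asymmetry.

```python
import numpy as np
from scipy.optimize import linprog

# Toy product model: 3 states (state 2 = DFA-accepting, absorbing), 2 actions.
S, A, T = 3, 2, 6
P = np.zeros((S, A, S))
P[:, 0, 2] = 1.0                     # action 0: jump to the accepting state
P[0, 1] = [0.7, 0.3, 0.0]            # action 1: keep collecting reward
P[1, 1] = [0.0, 1.0, 0.0]
P[2, 1] = [0.0, 0.0, 1.0]            # accepting state is absorbing
r = np.zeros((S, A)); r[0, 1], r[1, 1] = 1.0, 2.0
init = np.array([1.0, 0.0, 0.0])
accepting, delta = [2], 0.1          # require P(accept) >= 1 - delta

n = T * S * A                        # decision variables x_t(s, a) >= 0
idx = lambda t, s, a: (t * S + s) * A + a

A_eq, b_eq = [], []
for s in range(S):                   # occupancies at t = 0 match init
    row = np.zeros(n); row[[idx(0, s, a) for a in range(A)]] = 1.0
    A_eq.append(row); b_eq.append(init[s])
for t in range(T - 1):               # flow conservation over time
    for s2 in range(S):
        row = np.zeros(n)
        row[[idx(t + 1, s2, a) for a in range(A)]] = 1.0
        for s in range(S):
            for a in range(A):
                row[idx(t, s, a)] -= P[s, a, s2]
        A_eq.append(row); b_eq.append(0.0)

row = np.zeros(n)                    # acceptance-probability constraint
for s in range(S):
    for a in range(A):
        row[idx(T - 1, s, a)] = -P[s, a, accepting].sum()
A_ub, b_ub = [row], [-(1.0 - delta)]

c = np.array([-r[s, a] for t in range(T) for s in range(S) for a in range(A)])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(T, S, A)
pi = x / np.maximum(x.sum(axis=2, keepdims=True), 1e-12)  # pi_t(a | s)
print("expected reward:", -res.fun)
```

The optimal occupancy measure induces a randomized, time-varying policy pi_t(a | s), read off by normalizing x_t(s, a) over actions; the optimum trades reward collection against switching to the action that guarantees acceptance before the deadline.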
Related papers
- Reinforcement Learning Under Probabilistic Spatio-Temporal Constraints with Time Windows [0.0]
We propose an automata-theoretic approach for reinforcement learning (RL) under complex spatio-temporal constraints with time windows.
We provide theoretical guarantees on the resulting probability of constraint satisfaction.
We also provide numerical results in a scenario where a robot explores the environment to discover high-reward regions.
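A hypothetical sketch of the automata-theoretic idea, for readers unfamiliar with it: run the MDP in lockstep with a small monitor that tracks whether a region is visited inside its time window, so that off-the-shelf RL can operate on the product state. The grid region, window bounds, and environment interface below are assumptions for illustration, not this paper's construction.

```python
WINDOW = (5, 10)                  # time window [lo, hi] for visiting region B
REGION_B = {(3, 3), (3, 4)}       # grid cells forming region B (assumed)

def monitor_update(q, t, cell):
    """Monitor states: 0 = waiting, 1 = satisfied, 2 = violated."""
    lo, hi = WINDOW
    if q != 0:
        return q                  # satisfaction/violation is absorbing
    if cell in REGION_B and lo <= t <= hi:
        return 1                  # visited B inside the window
    if t > hi:
        return 2                  # window elapsed without a visit
    return 0

def product_step(env_step, state, q, t, action):
    """Lift one environment step to the product (state, monitor, clock)."""
    next_state, reward = env_step(state, action)
    return next_state, monitor_update(q, t + 1, next_state), t + 1, reward
```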
arXiv Detail & Related papers (2023-07-29T06:47:14Z)
- Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal Abstractions [59.605246463200736]
We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states.
We use state-of-the-art verification techniques to provide guarantees on the interval Markov decision process and compute a controller for which these guarantees carry over to the original control system.
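A minimal sketch of the scenario-style idea suggested by this summary: sample the unknown noise, count landings of the successor state in each abstract region, and widen the empirical frequencies into confidence intervals that an interval MDP can use. The dynamics, regions, and confidence level are assumptions for illustration, not the paper's exact bounds.

```python
import numpy as np

def interval_bounds(next_states, regions, conf=0.99):
    """Hoeffding-style [lower, upper] probability bounds per region."""
    n = len(next_states)
    eps = np.sqrt(np.log(2 / (1 - conf)) / (2 * n))   # Hoeffding radius
    bounds = {}
    for name, contains in regions.items():
        p_hat = sum(contains(x) for x in next_states) / n
        bounds[name] = (max(0.0, p_hat - eps), min(1.0, p_hat + eps))
    return bounds

# Example: x' = 0.9 x + u + w with unknown noise w, sampled empirically.
rng = np.random.default_rng(1)
samples = [0.9 * 1.0 + 0.5 + w for w in rng.laplace(0, 0.2, size=1000)]
regions = {"low": lambda x: x < 1.0, "mid": lambda x: 1.0 <= x < 2.0,
           "high": lambda x: x >= 2.0}
print(interval_bounds(samples, regions))
```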
arXiv Detail & Related papers (2023-01-04T10:40:30Z)
- On the Complexity of Rational Verification [5.230352342979224]
Rational verification refers to the problem of checking which temporal logic properties hold of a concurrent multiagent system.
We show that the complexity of rational verification can be greatly reduced by restricting specifications to the GR(1) fragment of linear temporal logic.
We provide improved results for rational verification when considering players' goals given by mean-payoff utility functions.
arXiv Detail & Related papers (2022-07-06T12:56:22Z)
- Logical blocks for fault-tolerant topological quantum computation [55.41644538483948]
We present a framework for universal fault-tolerant logic motivated by the need for platform-independent logical gate definitions.
We explore novel schemes for universal logic that improve resource overheads.
Motivated by the favorable logical error rates for boundaryless computation, we introduce a novel computational scheme.
arXiv Detail & Related papers (2021-12-22T19:00:03Z)
- Sparsity in Partially Controllable Linear Systems [56.142264865866636]
We study partially controllable linear dynamical systems specified by an underlying sparsity pattern.
Our results characterize those state variables which are irrelevant for optimal control.
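In the spirit of this summary (and only as a hypothetical illustration, not the paper's actual characterization), a simple structural test flags state variables that cannot influence the cost-relevant coordinates through the sparsity pattern of the system matrix:

```python
import numpy as np

def relevant_states(A_pattern, cost_states):
    """Backward reachability on the adjacency graph: j is relevant iff
    some cost state is reachable from j via nonzero entries of A."""
    n = A_pattern.shape[0]
    relevant, frontier = set(cost_states), list(cost_states)
    while frontier:
        i = frontier.pop()
        for j in range(n):             # j influences i iff A[i, j] != 0
            if A_pattern[i, j] and j not in relevant:
                relevant.add(j); frontier.append(j)
    return relevant

A_pat = np.array([[1, 1, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 0, 1]]) != 0
print(relevant_states(A_pat, {0}))     # -> {0, 1}: states 2, 3 are irrelevant
```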
arXiv Detail & Related papers (2021-10-12T16:41:47Z)
- Policy Analysis using Synthetic Controls in Continuous-Time [101.35070661471124]
Counterfactual estimation using synthetic controls is one of the most successful recent methodological developments in causal inference.
We propose a continuous-time alternative that models the latent counterfactual path explicitly using the formalism of controlled differential equations.
arXiv Detail & Related papers (2021-02-02T16:07:39Z)
- Multi-Agent Reinforcement Learning with Temporal Logic Specifications [65.79056365594654]
We study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment.
We develop the first multi-agent reinforcement learning technique for temporal logic specifications.
We provide correctness and convergence guarantees for our main algorithm.
arXiv Detail & Related papers (2021-02-01T01:13:03Z)
- Verifiable Planning in Expected Reward Multichain MDPs [20.456052208569115]
We explore the steady-state planning problem of deriving a decision-making policy for an agent.
We prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
arXiv Detail & Related papers (2020-12-03T18:54:24Z)
- Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs [13.922754427601491]
We characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy.
Our main finding is that, compared to the best known bounds for the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints.
arXiv Detail & Related papers (2020-08-01T18:17:08Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
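A hedged sketch of the general logic-guided RL pattern this summary points at: Q-learning on the product of an environment with an automaton monitoring the objective, rewarding transitions into accepting automaton states. The `env_reset`, `env_step`, and `aut_step` callables, the reward shaping, and all hyperparameters are assumptions for illustration, not this paper's algorithm.

```python
import random
from collections import defaultdict

def q_learning(env_reset, env_step, aut_step, accepting, episodes=2000,
               alpha=0.1, gamma=0.99, eps=0.1, actions=(0, 1)):
    """Tabular Q-learning on product states (s, q); q is the automaton state."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, q = env_reset(), 0                  # start; automaton in state 0
        for _ in range(100):                   # bounded episode length
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda a_: Q[(s, q, a_)]))
            s2 = env_step(s, a)                # environment transition
            q2 = aut_step(q, s2)               # automaton reads the new state
            rew = 1.0 if (q2 in accepting and q not in accepting) else 0.0
            best = max(Q[(s2, q2, a_)] for a_ in actions)
            Q[(s, q, a)] += alpha * (rew + gamma * best - Q[(s, q, a)])
            s, q = s2, q2
    return Q
```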
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.