Multi-Agent Reinforcement Learning with Temporal Logic Specifications
- URL: http://arxiv.org/abs/2102.00582v1
- Date: Mon, 1 Feb 2021 01:13:03 GMT
- Title: Multi-Agent Reinforcement Learning with Temporal Logic Specifications
- Authors: Lewis Hammond and Alessandro Abate and Julian Gutierrez and Michael Wooldridge
- Abstract summary: We study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment.
We develop the first multi-agent reinforcement learning technique for temporal logic specifications.
We provide correctness and convergence guarantees for our main algorithm.
- Score: 65.79056365594654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of learning to satisfy temporal logic
specifications with a group of agents in an unknown environment, which may
exhibit probabilistic behaviour. From a learning perspective these
specifications provide a rich formal language with which to capture tasks or
objectives, while from a logic and automated verification perspective the
introduction of learning capabilities allows for practical applications in
large, stochastic, unknown environments. The existing work in this area is,
however, limited. Of the frameworks that consider full linear temporal logic or
have correctness guarantees, all methods thus far consider only the case of a
single temporal logic specification and a single agent. In order to overcome
this limitation, we develop the first multi-agent reinforcement learning
technique for temporal logic specifications, which is also novel in its ability
to handle multiple specifications. We provide correctness and convergence
guarantees for our main algorithm - ALMANAC (Automaton/Logic Multi-Agent
Natural Actor-Critic) - even when using function approximation. Alongside our
theoretical results, we further demonstrate the applicability of our technique
via a set of preliminary experiments.
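To illustrate the automaton-product idea that approaches like ALMANAC build on, the sketch below pairs an environment transition with a small hand-coded specification automaton. The DFA, its labels (`goal`, `hazard`), and the `env_step`/`labeller` interfaces are hypothetical stand-ins for illustration only, not the paper's construction (which derives automata from full LTL formulae and handles multiple agents and specifications):

```python
# Minimal sketch: tracking a temporal-logic specification alongside an
# environment by taking the product of an MDP state with an automaton state.
# The automaton is a hypothetical hand-coded DFA for the specification
# "eventually reach 'goal' while never visiting 'hazard'".

# DFA states: 0 = spec pending, 1 = spec satisfied, 2 = spec violated (sink).
DFA_INIT, DFA_ACCEPT, DFA_REJECT = 0, 1, 2

def dfa_step(q, label):
    """Advance the specification automaton on the label of the new MDP state."""
    if q in (DFA_ACCEPT, DFA_REJECT):   # both are absorbing
        return q
    if label == "hazard":
        return DFA_REJECT
    if label == "goal":
        return DFA_ACCEPT
    return DFA_INIT

def product_step(env_step, labeller, state, q, action):
    """One transition of the product MDP: (environment state, automaton state).
    `env_step` and `labeller` are assumed interfaces supplied by the caller."""
    next_state = env_step(state, action)
    next_q = dfa_step(q, labeller(next_state))
    # Sparse reward: +1 exactly when the automaton first accepts.
    reward = 1.0 if (q != DFA_ACCEPT and next_q == DFA_ACCEPT) else 0.0
    return (next_state, next_q), reward
```

Learning then proceeds on the product state, so ordinary RL machinery (here, any actor-critic or Q-learning update) can optimise satisfaction of the specification without a hand-designed reward.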
Related papers
- Active Fine-Tuning of Generalist Policies [54.65568433408307]
We propose AMF (Active Multi-task Fine-tuning) to maximize multi-task policy performance under a limited demonstration budget.
We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in complex and high-dimensional environments.
arXiv Detail & Related papers (2024-10-07T13:26:36Z)
- DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL).
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
arXiv Detail & Related papers (2024-10-06T21:30:38Z)
- Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z)
- Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes [5.471640959988549]
We first introduce an optimal control theory for partially observable Markov decision processes.
We provide a structured methodology for synthesizing policies that maximize a cumulative reward.
We then build on this approach to design an optimal control framework for logically constrained multi-agent settings.
arXiv Detail & Related papers (2023-05-24T05:15:36Z)
- Interpretable Anomaly Detection via Discrete Optimization [1.7150329136228712]
We propose a framework for learning inherently interpretable anomaly detectors from sequential data.
We show that this problem is computationally hard and develop two learning algorithms based on constraint optimization.
Using a prototype implementation, we demonstrate that our approach shows promising results in terms of accuracy and F1 score.
arXiv Detail & Related papers (2023-03-24T16:19:15Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Skill Machines: Temporal Logic Skill Composition in Reinforcement Learning [13.049516752695613]
We propose a framework where an agent learns a sufficient set of skill primitives to achieve all high-level goals in its environment.
The agent can then flexibly compose them both logically and temporally to provably achieve temporal logic specifications in any regular language.
This provides the agent with the ability to map from complex temporal logic task specifications to near-optimal behaviours zero-shot.
arXiv Detail & Related papers (2022-05-25T07:05:24Z)
- Inverse Reinforcement Learning of Autonomous Behaviors Encoded as Weighted Finite Automata [18.972270182221262]
This paper presents a method for learning logical task specifications and cost functions from demonstrations.
We employ a spectral learning approach to extract a weighted finite automaton (WFA), approximating the unknown logic structure of the task.
We define a product between the WFA for high-level task guidance and a labeled Markov decision process (L-MDP) for low-level control, and optimize a cost function that matches the demonstrator's behavior.
arXiv Detail & Related papers (2021-03-10T06:42:10Z)
- A General Machine Learning Framework for Survival Analysis [0.8029049649310213]
Many machine learning methods for survival analysis only consider the standard setting with right-censored data and proportional hazards assumption.
We present a very general machine learning framework for time-to-event analysis that uses a data augmentation strategy to reduce complex survival tasks to standard Poisson regression tasks.
arXiv Detail & Related papers (2020-06-27T20:57:18Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
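Several of the entries above share a common pattern: reduce logic-constrained control to RL on a product state space, with rewards emitted by the specification automaton. The following is a minimal, hedged sketch of that pattern using tabular Q-learning and a two-state automaton for an "eventually goal" specification. The line-world environment, optimistic initialisation, and all hyperparameters are illustrative assumptions, not taken from any of the listed papers:

```python
import random

def q_learning_on_product(transitions, labels, n_states, n_actions,
                          episodes=1000, alpha=0.5, gamma=0.95,
                          eps=0.1, horizon=20, seed=0):
    """Tabular Q-learning on the product of a deterministic MDP and a
    two-state automaton for the specification "eventually goal"."""
    rng = random.Random(seed)
    # Product state = (MDP state, automaton state: 0 = pending, 1 = accepted).
    # Optimistic initialisation (Q = 1.0) encourages systematic exploration.
    Q = [[[1.0] * n_actions for _ in range(2)] for _ in range(n_states)]
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(horizon):
            a = (rng.randrange(n_actions) if rng.random() < eps
                 else max(range(n_actions), key=lambda a: Q[s][q][a]))
            s2 = transitions[s][a]
            q2 = 1 if labels[s2] == "goal" else 0   # automaton step
            r = 1.0 if q2 == 1 else 0.0             # reward on acceptance
            # Acceptance is terminal, so do not bootstrap past it.
            target = r if q2 == 1 else r + gamma * max(Q[s2][q2])
            Q[s][q][a] += alpha * (target - Q[s][q][a])
            s, q = s2, q2
            if q == 1:
                break
    # Greedy policy for the "specification pending" automaton state.
    return [max(range(n_actions), key=lambda a: Q[s][0][a])
            for s in range(n_states)]
```

On a five-state line world with actions {left, right} and the goal at the right end, the learned policy moves right from every non-goal state, i.e. it maximises the probability of eventually satisfying the specification; the papers above extend this basic recipe to function approximation, partial observability, and multiple agents.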
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.