Multi-Agent Reinforcement Learning with Temporal Logic Specifications
- URL: http://arxiv.org/abs/2102.00582v1
- Date: Mon, 1 Feb 2021 01:13:03 GMT
- Title: Multi-Agent Reinforcement Learning with Temporal Logic Specifications
- Authors: Lewis Hammond and Alessandro Abate and Julian Gutierrez and Michael Wooldridge
- Abstract summary: We study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment.
We develop the first multi-agent reinforcement learning technique for temporal logic specifications.
We provide correctness and convergence guarantees for our main algorithm.
- Score: 65.79056365594654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of learning to satisfy temporal logic
specifications with a group of agents in an unknown environment, which may
exhibit probabilistic behaviour. From a learning perspective these
specifications provide a rich formal language with which to capture tasks or
objectives, while from a logic and automated verification perspective the
introduction of learning capabilities allows for practical applications in
large, stochastic, unknown environments. The existing work in this area is,
however, limited. Of the frameworks that consider full linear temporal logic or
have correctness guarantees, all methods thus far consider only the case of a
single temporal logic specification and a single agent. In order to overcome
this limitation, we develop the first multi-agent reinforcement learning
technique for temporal logic specifications, which is also novel in its ability
to handle multiple specifications. We provide correctness and convergence
guarantees for our main algorithm - ALMANAC (Automaton/Logic Multi-Agent
Natural Actor-Critic) - even when using function approximation. Alongside our
theoretical results, we further demonstrate the applicability of our technique
via a set of preliminary experiments.
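To illustrate the automaton-product idea that approaches like ALMANAC build on, the sketch below pairs an environment transition with a small hand-coded specification automaton. The DFA, its labels (`goal`, `hazard`), and the `env_step`/`labeller` interfaces are hypothetical stand-ins for illustration only, not the paper's construction (which derives automata from full LTL formulae and handles multiple agents and specifications):

```python
# Minimal sketch: tracking a temporal-logic specification alongside an
# environment by taking the product of an MDP state with an automaton state.
# The automaton is a hypothetical hand-coded DFA for the specification
# "eventually reach 'goal' while never visiting 'hazard'".

# DFA states: 0 = spec pending, 1 = spec satisfied, 2 = spec violated (sink).
DFA_INIT, DFA_ACCEPT, DFA_REJECT = 0, 1, 2

def dfa_step(q, label):
    """Advance the specification automaton on the label of the new MDP state."""
    if q in (DFA_ACCEPT, DFA_REJECT):   # both are absorbing
        return q
    if label == "hazard":
        return DFA_REJECT
    if label == "goal":
        return DFA_ACCEPT
    return DFA_INIT

def product_step(env_step, labeller, state, q, action):
    """One transition of the product MDP: (environment state, automaton state).
    `env_step` and `labeller` are assumed interfaces supplied by the caller."""
    next_state = env_step(state, action)
    next_q = dfa_step(q, labeller(next_state))
    # Sparse reward: +1 exactly when the automaton first accepts.
    reward = 1.0 if (q != DFA_ACCEPT and next_q == DFA_ACCEPT) else 0.0
    return (next_state, next_q), reward
```

Learning then proceeds on the product state, so ordinary RL machinery (here, any actor-critic or Q-learning update) can optimise satisfaction of the specification without a hand-designed reward.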
Related papers
- Active Fine-Tuning of Generalist Policies [54.65568433408307]
We propose AMF (Active Multi-task Fine-tuning) to maximize multi-task policy performance under a limited demonstration budget.
We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in complex and high-dimensional environments.
arXiv Detail & Related papers (2024-10-07T13:26:36Z)
- DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL).
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
arXiv Detail & Related papers (2024-10-06T21:30:38Z)
- Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z)
- Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes [5.471640959988549]
We first introduce an optimal control theory for partially observable Markov decision processes.
We provide a structured methodology for synthesizing policies that maximize a cumulative reward.
We then build on this approach to design an optimal control framework for logically constrained multi-agent settings.
arXiv Detail & Related papers (2023-05-24T05:15:36Z)
- Interpretable Anomaly Detection via Discrete Optimization [1.7150329136228712]
We propose a framework for learning inherently interpretable anomaly detectors from sequential data.
We show that this problem is computationally hard and develop two learning algorithms based on constraint optimization.
Using a prototype implementation, we demonstrate that our approach shows promising results in terms of accuracy and F1 score.
arXiv Detail & Related papers (2023-03-24T16:19:15Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Skill Machines: Temporal Logic Skill Composition in Reinforcement Learning [13.049516752695613]
We propose a framework where an agent learns a sufficient set of skill primitives to achieve all high-level goals in its environment.
The agent can then flexibly compose them both logically and temporally to provably achieve temporal logic specifications in any regular language.
This provides the agent with the ability to map from complex temporal logic task specifications to near-optimal behaviours zero-shot.
arXiv Detail & Related papers (2022-05-25T07:05:24Z)
- Inverse Reinforcement Learning of Autonomous Behaviors Encoded as Weighted Finite Automata [18.972270182221262]
This paper presents a method for learning logical task specifications and cost functions from demonstrations.
We employ a spectral learning approach to extract a weighted finite automaton (WFA), approximating the unknown logic structure of the task.
We define a product between the WFA for high-level task guidance and a labeled Markov decision process (L-MDP) for low-level control, and optimize a cost function that matches the demonstrator's behavior.
arXiv Detail & Related papers (2021-03-10T06:42:10Z)
- A General Machine Learning Framework for Survival Analysis [0.8029049649310213]
Many machine learning methods for survival analysis only consider the standard setting with right-censored data and proportional hazards assumption.
We present a very general machine learning framework for time-to-event analysis that uses a data augmentation strategy to reduce complex survival tasks to standard Poisson regression tasks.
arXiv Detail & Related papers (2020-06-27T20:57:18Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
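Several of the entries above share a common pattern: reduce logic-constrained control to RL on a product state space, with rewards emitted by the specification automaton. The following is a minimal, hedged sketch of that pattern using tabular Q-learning and a two-state automaton for an "eventually goal" specification. The line-world environment, optimistic initialisation, and all hyperparameters are illustrative assumptions, not taken from any of the listed papers:

```python
import random

def q_learning_on_product(transitions, labels, n_states, n_actions,
                          episodes=1000, alpha=0.5, gamma=0.95,
                          eps=0.1, horizon=20, seed=0):
    """Tabular Q-learning on the product of a deterministic MDP and a
    two-state automaton for the specification "eventually goal"."""
    rng = random.Random(seed)
    # Product state = (MDP state, automaton state: 0 = pending, 1 = accepted).
    # Optimistic initialisation (Q = 1.0) encourages systematic exploration.
    Q = [[[1.0] * n_actions for _ in range(2)] for _ in range(n_states)]
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(horizon):
            a = (rng.randrange(n_actions) if rng.random() < eps
                 else max(range(n_actions), key=lambda a: Q[s][q][a]))
            s2 = transitions[s][a]
            q2 = 1 if labels[s2] == "goal" else 0   # automaton step
            r = 1.0 if q2 == 1 else 0.0             # reward on acceptance
            # Acceptance is terminal, so do not bootstrap past it.
            target = r if q2 == 1 else r + gamma * max(Q[s2][q2])
            Q[s][q][a] += alpha * (target - Q[s][q][a])
            s, q = s2, q2
            if q == 1:
                break
    # Greedy policy for the "specification pending" automaton state.
    return [max(range(n_actions), key=lambda a: Q[s][0][a])
            for s in range(n_states)]
```

On a five-state line world with actions {left, right} and the goal at the right end, the learned policy moves right from every non-goal state, i.e. it maximises the probability of eventually satisfying the specification; the papers above extend this basic recipe to function approximation, partial observability, and multiple agents.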
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.