On Alternating-Time Temporal Logic, Hyperproperties, and Strategy
Sharing
- URL: http://arxiv.org/abs/2312.12403v2
- Date: Tue, 12 Mar 2024 10:09:50 GMT
- Title: On Alternating-Time Temporal Logic, Hyperproperties, and Strategy
Sharing
- Authors: Raven Beutner, Bernd Finkbeiner
- Abstract summary: We show that HyperATL$^*_S$ is a rich specification language that captures important AI-related properties.
We implement our model-checking algorithm in a tool we call HyMASMC and evaluate it on a range of benchmarks.
- Score: 5.584060970507506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Alternating-time temporal logic (ATL$^*$) is a well-established framework for
formal reasoning about multi-agent systems. However, while ATL$^*$ can reason
about the strategic ability of agents (e.g., some coalition $A$ can ensure that
a goal is reached eventually), we cannot compare multiple strategic
interactions, nor can we require multiple agents to follow the same strategy.
For example, we cannot state that coalition $A$ can reach a goal sooner (or
more often) than some other coalition $A'$. In this paper, we propose
HyperATL$^*_S$, an extension of ATL$^*$ in which we can (1) compare the
outcome of multiple strategic interactions w.r.t. a hyperproperty, i.e., a
property that refers to multiple paths at the same time, and (2) enforce that
some agents share the same strategy. We show that HyperATL$^*_S$ is a rich
specification language that captures important AI-related properties that were
out of reach of existing logics. We prove that model checking of HyperATL$^*_S$
on concurrent game structures is decidable. We implement our model-checking
algorithm in a tool we call HyMASMC and evaluate it on a range of benchmarks.
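As an illustration of the kind of comparison described above, a HyperATL$^*_S$-style formula (notation simplified here; the atomic proposition $\mathit{goal}$ and the coalitions $A$, $A'$ are assumed for the example) could state that $A$ has a strategy ensuring it reaches the goal at least as soon as $A'$ does, no matter how $A'$ plays:
$\langle\!\langle A \rangle\!\rangle \pi.\ [\![ A' ]\!] \pi'.\ (\neg \mathit{goal}_{\pi'}\ \mathcal{W}\ \mathit{goal}_{\pi})$
Here the two quantifiers bind the path variables $\pi$ and $\pi'$, and the weak-until body is a hyperproperty: it relates the two paths by requiring that $\mathit{goal}$ does not hold on $\pi'$ strictly before it holds on $\pi$. Point (2) of the logic, strategy sharing, would additionally constrain selected agents to use one and the same strategy; that annotation is omitted in this sketch.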
Related papers
- Hyper Strategy Logic [4.726777092009553]
Strategy logic (SL) is a powerful temporal logic that enables strategic reasoning in multi-agent systems.
We present Hyper Strategy Logic (HyperSL), a strategy logic where the outcome of multiple strategy profiles can be compared w.r.t. a hyperproperty.
We show that HyperSL can capture important properties that cannot be expressed in SL, including non-interference, quantitative Nash equilibria, optimal adversarial planning, and reasoning under imperfect information.
arXiv Detail & Related papers (2024-03-20T16:47:53Z) - The Alternating-Time \mu-Calculus With Disjunctive Explicit Strategies [1.7725414095035827]
We study the strategic abilities of coalitions of agents in concurrent game structures.
Key ingredients of the logic are path quantifiers specifying that some coalition of agents has a joint strategy to enforce a given goal.
We extend ATLES with fixpoint operators and strategy disjunction, arriving at the alternating-time $\mu$-calculus with explicit strategies.
arXiv Detail & Related papers (2023-05-30T07:16:59Z) - Can We Find Nash Equilibria at a Linear Rate in Markov Games? [95.10091348976779]
We study decentralized learning in two-player zero-sum discounted Markov games.
The goal is to design a policy optimization algorithm for either agent satisfying two properties.
arXiv Detail & Related papers (2023-03-03T02:40:26Z) - Hierarchical Strategies for Cooperative Multi-Agent Reinforcement
Learning [0.0]
We propose a two-level hierarchical architecture that combines a novel information-theoretic objective with a trajectory prediction model to learn a strategy.
We show that our method establishes a new state of the art and is, to the best of our knowledge, the first MARL algorithm to solve all super-hard StarCraft II scenarios.
Videos and brief overview of the methods are available at: https://sites.google.com/view/hier-strats-marl/home.
arXiv Detail & Related papers (2022-12-14T18:27:58Z) - Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
We study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches.
We design a computationally efficient algorithm that achieves a near-optimal regret of $\tilde{O}(\sqrt{SAH^3K\ln(1/\delta)})$ in $K$ episodes, where $\tilde{O}(\cdot)$ hides logarithmic terms in $(S,A,H,K)$.
Our technical contributions are two-fold: 1) a near-optimal design scheme to explore
arXiv Detail & Related papers (2022-10-15T09:22:22Z) - Reward-Mixing MDPs with a Few Latent Contexts are Learnable [75.17357040707347]
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs).
Our goal is to learn a near-optimal policy that nearly maximizes the $H$ time-step cumulative rewards in such a model.
arXiv Detail & Related papers (2022-10-05T22:52:00Z) - Provably Efficient Offline Multi-agent Reinforcement Learning via
Strategy-wise Bonus [48.34563955829649]
We propose a strategy-wise concentration principle which builds a confidence interval for the joint strategy.
For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose an efficient algorithm.
All of our algorithms can naturally take a pre-specified strategy class $\Pi$ as input and output a strategy close to the best strategy in $\Pi$.
arXiv Detail & Related papers (2022-06-01T00:18:15Z) - Coordinated Attacks against Contextual Bandits: Fundamental Limits and
Defense Mechanisms [75.17357040707347]
Motivated by online recommendation systems, we propose the problem of finding the optimal policy in contextual bandits.
The goal is to robustly learn the policy that maximizes rewards for good users with as few user interactions as possible.
We show that we can achieve an $\tilde{O}(\min(S,A)\cdot \alpha/\epsilon^2)$ upper bound by employing efficient robust mean estimators.
arXiv Detail & Related papers (2022-01-30T01:45:13Z) - Decentralized Cooperative Reinforcement Learning with Hierarchical
Information Structure [14.919120396838208]
We consider two-agent multi-armed bandits (MABs) and Markov decision processes (MDPs) with a hierarchical information structure arising in applications.
In each step, the "leader" chooses her action first, and then the "follower" decides his action after observing the leader's action.
For the MDP setting, we obtain $\widetilde{\mathcal{O}}(\sqrt{H^7S^2ABT})$ regret, where $H$ is the number of steps per episode, $S$ is the number of states
arXiv Detail & Related papers (2021-11-01T09:18:07Z) - Communication Efficient Parallel Reinforcement Learning [34.77250498401055]
We consider the problem where $M$ agents interact with $M$ identical and independent environments with $S$ states and $A$ actions.
We aim to find an algorithm that allows the agents to minimize the regret with infrequent communication rounds.
arXiv Detail & Related papers (2021-02-22T02:46:36Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.