Synthesis of Reward Machines for Multi-Agent Equilibrium Design (Full Version)
- URL: http://arxiv.org/abs/2408.10074v1
- Date: Mon, 19 Aug 2024 15:17:58 GMT
- Title: Synthesis of Reward Machines for Multi-Agent Equilibrium Design (Full Version)
- Authors: Muhammad Najib, Giuseppe Perelli,
- Abstract summary: We study the problem of equilibrium design using dynamic incentive structures, known as reward machines.
We show how reward machines can be used to represent dynamic incentives that allocate rewards in a manner that optimises the designer's goal.
We present two variants of the problem: strong and weak. We demonstrate that both can be solved in time using a Turing machine equipped with an NP oracle.
- Score: 2.2099217573031678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mechanism design is a well-established game-theoretic paradigm for designing games to achieve desired outcomes. This paper addresses a closely related but distinct concept, equilibrium design. Unlike mechanism design, the designer's authority in equilibrium design is more constrained; she can only modify the incentive structures in a given game to achieve certain outcomes without the ability to create the game from scratch. We study the problem of equilibrium design using dynamic incentive structures, known as reward machines. We use weighted concurrent game structures for the game model, with goals (for the players and the designer) defined as mean-payoff objectives. We show how reward machines can be used to represent dynamic incentives that allocate rewards in a manner that optimises the designer's goal. We also introduce the main decision problem within our framework, the payoff improvement problem. This problem essentially asks whether there exists a dynamic incentive (represented by some reward machine) that can improve the designer's payoff by more than a given threshold value. We present two variants of the problem: strong and weak. We demonstrate that both can be solved in polynomial time using a Turing machine equipped with an NP oracle. Furthermore, we also establish that these variants are either NP-hard or coNP-hard. Finally, we show how to synthesise the corresponding reward machine if it exists.
Related papers
- Imperfect-Recall Games: Equilibrium Concepts and Their Complexity [74.01381499760288]
We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before.
In the framework of extensive-form games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer settings.
arXiv Detail & Related papers (2024-06-23T00:27:28Z) - Optimistic Policy Gradient in Multi-Player Markov Games with a Single
Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z) - The Computational Complexity of Single-Player Imperfect-Recall Games [37.550554344575]
We study extensive-form games with imperfect recall, such as the Sleeping Beauty problem or the Absent Driver game.
For such games, two natural equilibrium concepts have been proposed as alternative solution concepts to ex-ante optimality.
arXiv Detail & Related papers (2023-05-28T19:41:25Z) - Differentiable Arbitrating in Zero-sum Markov Games [59.62061049680365]
We study how to perturb the reward in a zero-sum Markov game with two players to induce a desirable Nash equilibrium, namely arbitrating.
The lower level requires solving the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize in an end-to-end way.
We propose a backpropagation scheme that differentiates through the Nash equilibrium, which provides the gradient feedback for the upper level.
arXiv Detail & Related papers (2023-02-20T16:05:04Z) - How Bad is Selfish Driving? Bounding the Inefficiency of Equilibria in
Urban Driving Games [64.71476526716668]
We study the (in)efficiency of any equilibrium players might agree to play.
We obtain guarantees that refine existing bounds on the Price of Anarchy.
Although the obtained guarantees concern open-loop trajectories, we observe efficient equilibria even when agents employ closed-loop policies.
arXiv Detail & Related papers (2022-10-24T09:32:40Z) - Assisted Robust Reward Design [33.55440481096258]
In practice, reward design is an iterative process: the designer chooses a reward, eventually encounters an "edge-case" environment where the reward incentivizes the wrong behavior, revises the reward, and repeats.
We propose that the robot not take the specified reward for granted, but rather have uncertainty about it, and account for the future design iterations as future evidence.
We test this method in a simplified autonomous driving task and find that it more quickly improves the car's behavior in held-out environments by proposing environments that are "edge cases" for the current reward.
arXiv Detail & Related papers (2021-11-18T18:59:33Z) - Inducing Equilibria via Incentives: Simultaneous Design-and-Play Finds
Global Optima [114.31577038081026]
We propose an efficient method that tackles the designer's and agents' problems simultaneously in a single loop.
Although the designer does not solve the equilibrium problem repeatedly, it can anticipate the overall influence of the incentives on the agents.
We prove that the algorithm converges to the global optima at a sublinear rate for a broad class of games.
arXiv Detail & Related papers (2021-10-04T06:53:59Z) - Equilibrium Design for Concurrent Games [5.9873770241999]
In game theory, mechanism design is concerned with the design of incentives so that a desired outcome of the game can be achieved.
We study the design of incentives so that a desirable equilibrium is obtained, for instance, an equilibrium satisfying a given temporal logic property.
As an application, equilibrium design can be used as an alternative solution to the rational synthesis and verification problems for concurrent games.
arXiv Detail & Related papers (2021-06-18T15:45:45Z) - Deep Policy Networks for NPC Behaviors that Adapt to Changing Design
Parameters in Roguelike Games [137.86426963572214]
Turn-based strategy games like Roguelikes, for example, present unique challenges to Deep Reinforcement Learning (DRL)
We propose two network architectures to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions.
arXiv Detail & Related papers (2020-12-07T08:47:25Z) - Metagame Autobalancing for Competitive Multiplayer Games [0.10499611180329801]
We present a tool for balancing multi-player games during game design.
Our approach requires a designer to construct an intuitive graphical representation of their meta-game target.
We show the capabilities of this tool on examples inheriting from Rock-Paper-Scissors, and on a more complex asymmetric fighting game.
arXiv Detail & Related papers (2020-06-08T08:55:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.