Paths to Equilibrium in Normal-Form Games
- URL: http://arxiv.org/abs/2403.18079v1
- Date: Tue, 26 Mar 2024 19:58:39 GMT
- Title: Paths to Equilibrium in Normal-Form Games
- Authors: Bora Yongacoglu, Gürdal Arslan, Lacra Pavel, Serdar Yüksel,
- Abstract summary: In multi-agent reinforcement learning (MARL), agents repeatedly interact across time and revise their strategies as new data arrives.
This paper studies sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning.
- Score: 6.812247730094933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In multi-agent reinforcement learning (MARL), agents repeatedly interact across time and revise their strategies as new data arrives, producing a sequence of strategy profiles. This paper studies sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning, where an agent who is best responding in period $t$ does not switch its strategy in the next period $t+1$. This constraint merely requires that optimizing agents do not switch strategies, but does not constrain the other non-optimizing agents in any way, and thus allows for exploration. Sequences with this property are called satisficing paths, and arise naturally in many MARL algorithms. A fundamental question about strategic dynamics is such: for a given game and initial strategy profile, is it always possible to construct a satisficing path that terminates at an equilibrium strategy? The resolution of this question has implications about the capabilities or limitations of a class of MARL algorithms. We answer this question in the affirmative for mixed extensions of finite normal-form games.%
Related papers
- Strategic Apple Tasting [35.25249063553063]
Algorithmic decision-making in high-stakes domains often involves assigning decisions to agents with incentives to strategically modify their input to the algorithm.
We formalize this setting as an online learning problem with apple-tasting feedback.
Our goal is to achieve sublinear strategic regret, which compares the performance of the principal to that of the best fixed policy in hindsight.
arXiv Detail & Related papers (2023-06-09T20:46:31Z) - The Alternating-Time \mu-Calculus With Disjunctive Explicit Strategies [1.7725414095035827]
We study the strategic abilities of coalitions of agents in concurrent game structures.
Key ingredient of the logic are path quantifiers specifying that some coalition of agents has a joint strategy to enforce a given goal.
We extend ATLES with fixpoint operators and strategy disjunction, arriving at the alternating-time $mu$-calculus with explicit strategies.
arXiv Detail & Related papers (2023-05-30T07:16:59Z) - Hierarchical Strategies for Cooperative Multi-Agent Reinforcement
Learning [0.0]
We propose a two-level hierarchical architecture that combines a novel information-theoretic objective with a trajectory prediction model to learn a strategy.
We show that our method establishes a new state of the art being, to the best of our knowledge, the first MARL algorithm to solve all super hard SC II scenarios.
Videos and brief overview of the methods are available at: https://sites.google.com/view/hier-strats-marl/home.
arXiv Detail & Related papers (2022-12-14T18:27:58Z) - Finding mixed-strategy equilibria of continuous-action games without
gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that regret can be obtained for general convex and compact strategy sets.
Our dynamics are on an instantiation of optimistic follow-the-regularized-bounds over an appropriately emphlifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z) - Who Leads and Who Follows in Strategic Classification? [82.44386576129295]
We argue that the order of play in strategic classification is fundamentally determined by the relative frequencies at which the decision-maker and the agents adapt to each other's actions.
We show that a decision-maker with the freedom to choose their update frequency can induce learning dynamics that converge to Stackelberg equilibria with either order of play.
arXiv Detail & Related papers (2021-06-23T16:48:46Z) - Bandit Linear Optimization for Sequential Decision Making and
Extensive-Form Games [102.23975166536326]
Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment.
It captures the online decision-making problems that each player faces in an extensive-form game, as well as Markov decision processes and partially-observable Markov decision processes where the agent conditions on observed history.
In this paper, we give the first algorithm for the bandit linear optimization problem for dilatedDM that offers both (i) linear-time losses and (ii) $O(sqrtT)$ cumulative regret in
arXiv Detail & Related papers (2021-03-08T05:00:13Z) - Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit
Reasoning Strategies [78.68534915690404]
StrategyQA is a benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy.
We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts.
Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
arXiv Detail & Related papers (2021-01-06T19:14:23Z) - On the Impossibility of Convergence of Mixed Strategies with No Regret
Learning [10.515544361834241]
We study convergence properties of the mixed strategies that result from a general class of optimal no regret learning strategies.
We consider the class of strategies whose information set at each step is the empirical average of the opponent's realized play.
arXiv Detail & Related papers (2020-12-03T18:02:40Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.