Learning in two-player games between transparent opponents
- URL: http://arxiv.org/abs/2012.02671v1
- Date: Fri, 4 Dec 2020 15:41:07 GMT
- Title: Learning in two-player games between transparent opponents
- Authors: Adrian Hutter
- Abstract summary: We consider a scenario in which two reinforcement learning agents repeatedly play a matrix game against each other.
The agents' decision-making is transparent to each other, which allows each agent to predict how their opponent will play against them.
We find that the combination of mutually transparent decision-making and opponent-aware learning robustly leads to mutual cooperation in a single-shot prisoner's dilemma.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a scenario in which two reinforcement learning agents repeatedly
play a matrix game against each other and update their parameters after each
round. The agents' decision-making is transparent to each other, which allows
each agent to predict how their opponent will play against them. To prevent an
infinite regress of both agents recursively predicting each other indefinitely,
each agent is required to give an opponent-independent response with some
probability at least epsilon. Transparency also allows each agent to anticipate
and shape the other agent's gradient step, i.e. to move to regions of parameter
space in which the opponent's gradient points in a direction favourable to
them. We study the resulting dynamics experimentally, using two algorithms from
previous literature (LOLA and SOS) for opponent-aware learning. We find that
the combination of mutually transparent decision-making and opponent-aware
learning robustly leads to mutual cooperation in a single-shot prisoner's
dilemma. In a game of chicken, in which both agents try to manoeuvre their
opponent towards their preferred equilibrium, converging to a mutually
beneficial outcome turns out to be much harder, and opponent-aware learning can
even lead to worst-case outcomes for both agents. This highlights the need to
develop opponent-aware learning algorithms that achieve acceptable outcomes in
social dilemmas involving an equilibrium selection problem.
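For context, here is a minimal sketch of the kind of opponent-aware (LOLA-style) update the abstract refers to, applied to a one-shot prisoner's dilemma in which each agent is parameterised by a single cooperation logit. The payoff values, step sizes, and function names are illustrative assumptions, and the sketch deliberately omits the paper's transparency mechanism (policies that condition on a prediction of the opponent's action, grounded by an opponent-independent response with probability at least epsilon), which is what enables cooperation in the single-shot game.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative prisoner's dilemma payoffs (T > R > P > S); not necessarily the paper's values.
R, S, T, P = 3.0, 0.0, 5.0, 1.0

def value(theta_self, theta_opp):
    """Expected payoff to 'self' when each agent cooperates with probability sigmoid(theta)."""
    p, q = sigmoid(theta_self), sigmoid(theta_opp)
    return p * q * R + p * (1 - q) * S + (1 - p) * q * T + (1 - p) * (1 - q) * P

def d_dx(f, x, y, h=1e-5):
    """Central finite-difference derivative of f with respect to its first argument."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def lola_step(theta1, theta2, alpha=0.1, eta=1.0):
    """LOLA-style update: each agent ascends its own value *after* the opponent's
    anticipated naive gradient step, differentiating through that step."""
    def v1_shaped(t1, t2):
        t2_next = t2 + eta * d_dx(value, t2, t1)  # agent 2's anticipated naive step
        return value(t1, t2_next)

    def v2_shaped(t2, t1):
        t1_next = t1 + eta * d_dx(value, t1, t2)  # agent 1's anticipated naive step
        return value(t2, t1_next)

    g1 = d_dx(v1_shaped, theta1, theta2)
    g2 = d_dx(v2_shaped, theta2, theta1)
    return theta1 + alpha * g1, theta2 + alpha * g2

theta1, theta2 = 0.0, 0.0
for _ in range(1000):
    theta1, theta2 = lola_step(theta1, theta2)
print("cooperation probabilities:", sigmoid(theta1), sigmoid(theta2))
```

The shaping term inside `lola_step` is what lets an agent move towards regions of parameter space in which the opponent's gradient points in a direction favourable to it, as described in the abstract.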
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
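For intuition, the $O(T^{2/3})$ rate is the one that arises from a generic explore-then-commit trade-off (a back-of-the-envelope illustration, not necessarily the mechanism of that paper): an exploration phase of length $n$ costs roughly $n$ in regret, while committing to an estimate based on $n$ samples costs roughly $T/\sqrt{n}$ over the remaining rounds, so balancing the two gives

$$\min_{n}\left(n + \frac{T}{\sqrt{n}}\right) = \Theta\!\left(T^{2/3}\right) \quad \text{at} \quad n = \Theta\!\left(T^{2/3}\right).$$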
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
- Game-theoretic Objective Space Planning [4.989480853499916]
Understanding intent of other agents is crucial to deploying autonomous systems in adversarial multi-agent environments.
Current approaches either oversimplify the discretization of the action space of agents or fail to recognize the long-term effect of actions and become myopic.
We propose a novel dimension reduction method that encapsulates diverse agent behaviors while conserving the continuity of agent actions.
arXiv Detail & Related papers (2022-09-16T07:35:20Z)
- Regret Minimization and Convergence to Equilibria in General-sum Markov Games [57.568118148036376]
We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents.
Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
arXiv Detail & Related papers (2022-07-28T16:27:59Z)
- Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
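As a point of reference for the fictitious-play component mentioned above, the following is a minimal sketch of classical fictitious play in a zero-sum matrix game, where each player best-responds to the opponent's empirical mixed strategy. The matching-pennies payoff matrix is an assumed example, and the cited paper's combination of UCB-type optimism with fictitious play in Markov games with unknown transitions is not modelled here.

```python
import numpy as np

# Zero-sum matrix game: row player's payoffs (the column player receives the negative).
# Matching pennies is used purely as an illustrative example.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def fictitious_play(A, iters=10000):
    n_rows, n_cols = A.shape
    row_counts = np.zeros(n_rows)  # empirical action counts of the row player
    col_counts = np.zeros(n_cols)  # empirical action counts of the column player
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixed strategy.
        col_mix = col_counts / col_counts.sum()
        row_mix = row_counts / row_counts.sum()
        best_row = np.argmax(A @ col_mix)   # row player maximises expected payoff
        best_col = np.argmin(row_mix @ A)   # column player minimises it (zero-sum)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

row_strategy, col_strategy = fictitious_play(A)
print(row_strategy, col_strategy)  # empirical frequencies approach (0.5, 0.5) each
```

In two-player zero-sum matrix games, the empirical action frequencies of fictitious play are known to converge to a Nash equilibrium.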
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
- Cooperative Artificial Intelligence [0.0]
We argue that there is a need for research at the intersection of game theory and artificial intelligence.
We discuss the problem of how an external agent can promote cooperation between artificial learners.
We show that the resulting cooperative outcome is stable in certain games even if the planning agent is turned off.
arXiv Detail & Related papers (2022-02-20T16:50:37Z)
- End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games.
We propose two approaches, respectively based on explicit and implicit differentiation.
The analytical results are validated using several real-world problems.
arXiv Detail & Related papers (2020-10-26T18:39:32Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Learning to Model Opponent Learning [11.61673411387596]
Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment.
Because the agents learn simultaneously, the environment is non-stationary from each agent's perspective, which poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment.
We develop a novel approach to modelling an opponent's learning dynamics, which we term Learning to Model Opponent Learning (LeMOL).
arXiv Detail & Related papers (2020-06-06T17:19:04Z)