Scaling Opponent Shaping to High Dimensional Games
- URL: http://arxiv.org/abs/2312.12568v3
- Date: Sat, 10 Feb 2024 21:52:17 GMT
- Title: Scaling Opponent Shaping to High Dimensional Games
- Authors: Akbir Khan and Timon Willi and Newton Kwan and Andrea Tacchetti and
Chris Lu and Edward Grefenstette and Tim Rocktäschel and Jakob Foerster
- Abstract summary: We develop an OS-based approach to general-sum games with temporally-extended actions and long-time horizons.
We show that Shaper leads to improved individual and collective outcomes in a range of challenging settings from the literature.
- Score: 17.27358464280679
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In multi-agent settings with mixed incentives, methods developed for zero-sum
games have been shown to lead to detrimental outcomes. To address this issue,
opponent shaping (OS) methods explicitly learn to influence the learning
dynamics of co-players and empirically lead to improved individual and
collective outcomes. However, OS methods have only been evaluated in
low-dimensional environments due to the challenges associated with estimating
higher-order derivatives or scaling model-free meta-learning. Alternative
methods that scale to more complex settings either converge to undesirable
solutions or rely on unrealistic assumptions about the environment or
co-players. In this paper, we successfully scale an OS-based approach to
general-sum games with temporally-extended actions and long-time horizons for
the first time. After analysing the representations of the meta-state and
history used by previous algorithms, we propose a simplified version called
Shaper. We show empirically that Shaper leads to improved individual and
collective outcomes in a range of challenging settings from the literature. We
further formalize a technique previously implicit in the literature, and
analyse its contribution to opponent shaping. We show empirically that this
technique is helpful for the functioning of prior methods in certain
environments. Lastly, we show that previous environments, such as the CoinGame,
are inadequate for analysing temporally-extended general-sum interactions.
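
To make the core mechanism in the abstract concrete: an opponent-shaping agent optimises its return over the co-player's entire learning trajectory, treating the co-player's learning rule as part of the environment. The sketch below illustrates this in a toy iterated prisoner's dilemma with a naive gradient-learning co-player; the memory-1 shaper policy, the stationary-game approximation, the random-search meta-optimiser, and all names are simplifying assumptions for exposition, not the paper's Shaper algorithm or implementation.

```python
# A minimal, self-contained sketch of the opponent-shaping (OS) idea in a toy
# iterated prisoner's dilemma. Everything here (names, the memory-1 shaper,
# the stationary approximation, the random-search meta-optimiser) is an
# illustrative assumption, not the paper's Shaper/M-FOS implementation.
import numpy as np

# Payoffs indexed as R[shaper_action][co_player_action], 0 = cooperate, 1 = defect.
R_SHAPER = np.array([[-1.0, -3.0], [0.0, -2.0]])
R_CO     = np.array([[-1.0,  0.0], [-3.0, -2.0]])

def expected_rewards(pC, pD, q):
    """Per-step expected rewards when the shaper defects with prob pC after the
    co-player cooperated and pD after it defected, and the co-player defects
    with prob q (stationary approximation of the repeated game)."""
    p = (1.0 - q) * pC + q * pD                     # shaper's overall defect prob
    joint = np.array([[(1 - p) * (1 - q), (1 - p) * q],
                      [p * (1 - q),       p * q]])  # joint action distribution
    return (joint * R_SHAPER).sum(), (joint * R_CO).sum()

def meta_episode(pC, pD, inner_episodes=30, lr=0.5):
    """One meta-episode: each iteration stands in for one episode of the inner
    game (one meta-step), after which the naive co-player does gradient ascent
    on its own reward. The shaper's meta-return is summed over the co-player's
    whole learning trajectory."""
    q, meta_return = 0.5, 0.0
    for _ in range(inner_episodes):
        r_shaper, _ = expected_rewards(pC, pD, q)
        meta_return += r_shaper
        grad_q = 1.0 - 2.0 * (pD - pC)              # d r_co / d q for this game
        q = float(np.clip(q + lr * grad_q, 0.0, 1.0))
    return meta_return

def train_shaper(meta_iters=200, sigma=0.1, seed=0):
    """Zeroth-order hill climbing on the shaper's two parameters: the co-player's
    learning dynamics are simply part of the (meta-)environment being optimised."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.5, 0.5])                    # (pC, pD)
    best = meta_episode(*theta)
    for _ in range(meta_iters):
        cand = np.clip(theta + sigma * rng.standard_normal(2), 0.0, 1.0)
        ret = meta_episode(*cand)
        if ret > best:
            theta, best = cand, ret
    return theta, best

if __name__ == "__main__":
    (pC, pD), ret = train_shaper()
    print(f"shaper defect prob after C: {pC:.2f}, after D: {pD:.2f}")
    print(f"meta-return over the co-player's learning trajectory: {ret:.1f}")
```

Under these assumptions, the search tends to find a policy that retaliates after defection while defecting against cooperation as often as it can without driving the naive learner away from cooperating, an extortion-like outcome similar to what the OS literature reports against naive learners. The paper itself replaces the fixed memory-1 policy and random search with learned meta-policies trained by reinforcement learning over long-horizon, high-dimensional inner games.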
Related papers
- Landscape-Aware Growing: The Power of a Little LAG [49.897766925371485]
We study the question of how to select the best growing strategy from a given pool of growing strategies.
We present an alternative perspective based on early training dynamics, which we call "landscape-aware growing (LAG)".
arXiv Detail & Related papers (2024-06-04T16:38:57Z)
- Difficulty Modelling in Mobile Puzzle Games: An Empirical Study on Different Methods to Combine Player Analytics and Simulated Data [0.0]
A common practice consists of creating metrics out of data collected by player interactions with the content.
This allows for estimation only after the content is released and does not consider the characteristics of potential future players.
In this article, we present a number of potential solutions for the estimation of difficulty under such conditions.
arXiv Detail & Related papers (2024-01-30T20:51:42Z)
- Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that OS methods' relative performance decreases as the number of co-players grows, suggesting that they may not perform well in the limit.
arXiv Detail & Related papers (2023-12-19T20:01:42Z)
- United We Stand: Using Epoch-wise Agreement of Ensembles to Combat Overfit [7.627299398469962]
We introduce a novel ensemble classifier for deep networks that effectively overcomes overfitting.
Our method allows for the incorporation of useful knowledge obtained during the overfitting phase without deterioration of the general performance.
Our method is easy to implement and can be integrated with any training scheme and architecture.
arXiv Detail & Related papers (2023-10-17T08:51:44Z)
- Class-Incremental Mixture of Gaussians for Deep Continual Learning [15.49323098362628]
We propose end-to-end incorporation of the mixture of Gaussians model into the continual learning framework.
We show that our model can effectively learn in memory-free scenarios with fixed extractors.
arXiv Detail & Related papers (2023-07-09T04:33:19Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- Model-Free Opponent Shaping [1.433758865948252]
We propose Model-Free Opponent Shaping (M-FOS) for general-sum games.
M-FOS learns in a meta-game in which each meta-step is an episode of the underlying ("inner") game.
It exploits naive learners and other, more sophisticated algorithms from the literature.
arXiv Detail & Related papers (2022-05-03T12:20:14Z)
- Continual Predictive Learning from Videos [100.27176974654559]
We study a new continual learning problem in the context of video prediction.
We propose the continual predictive learning (CPL) approach, which learns a mixture world model via predictive experience replay.
We construct two new benchmarks based on RoboNet and KTH, in which different tasks correspond to different physical robotic environments or human actions.
arXiv Detail & Related papers (2022-04-12T08:32:26Z)
- Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead [88.17413955380262]
We introduce a novel architecture for early exiting based on the vision transformer architecture.
We show that our method works for both classification and regression problems.
We also introduce a novel method for integrating audio and visual modalities within early exits in audiovisual data analysis.
arXiv Detail & Related papers (2021-05-19T13:30:34Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
We propose a Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impact of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.