Model-Free Opponent Shaping
- URL: http://arxiv.org/abs/2205.01447v1
- Date: Tue, 3 May 2022 12:20:14 GMT
- Title: Model-Free Opponent Shaping
- Authors: Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster
- Abstract summary: We propose Model-Free Opponent Shaping (M-FOS) for general-sum games.
M-FOS learns in a meta-game in which each meta-step is an episode of the underlying ("inner") game.
It exploits naive learners and other, more sophisticated algorithms from the literature.
- Score: 1.433758865948252
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In general-sum games, the interaction of self-interested learning agents
commonly leads to collectively worst-case outcomes, such as defect-defect in
the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as
Learning with Opponent-Learning Awareness (LOLA), shape their opponents'
learning process. However, these methods are myopic since only a small number
of steps can be anticipated, are asymmetric since they treat other agents as
naive learners, and require the use of higher-order derivatives, which are
calculated through white-box access to an opponent's differentiable learning
algorithm. To address these issues, we propose Model-Free Opponent Shaping
(M-FOS). M-FOS learns in a meta-game in which each meta-step is an episode of
the underlying ("inner") game. The meta-state consists of the inner policies,
and the meta-policy produces a new inner policy to be used in the next episode.
M-FOS then uses generic model-free optimisation methods to learn meta-policies
that accomplish long-horizon opponent shaping. Empirically, M-FOS
near-optimally exploits naive learners and other, more sophisticated algorithms
from the literature. For example, to the best of our knowledge, it is the first
method to learn the well-known Zero-Determinant (ZD) extortion strategy in the
IPD. In the same settings, M-FOS leads to socially optimal outcomes under
meta-self-play. Finally, we show that M-FOS can be scaled to high-dimensional
settings.
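To make the meta-game structure described above concrete, below is a minimal, illustrative sketch of an M-FOS-style loop in the iterated prisoner's dilemma. It is not the authors' implementation: here the opponent is a naive learner, the meta-policy is a simple linear map from the pair of memory-1 inner policies to the agent's next inner policy, and a basic evolution-strategies update stands in for generic model-free optimisation. All names and hyperparameters are hypothetical.
```python
# Illustrative M-FOS-style meta-game loop (sketch, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
PAYOFF = {(0, 0): (-1, -1), (0, 1): (-3, 0), (1, 0): (0, -3), (1, 1): (-2, -2)}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inner_episode(p1, p2, steps=32):
    """One episode of the inner game (IPD). p1, p2 hold memory-1 cooperation
    probabilities indexed by the previous joint action (CC, CD, DC, DD), plus
    an initial entry. Returns average per-step payoffs for both players."""
    s, r1, r2 = 4, 0.0, 0.0
    for _ in range(steps):
        a = int(rng.random() > p1[s])      # 0 = cooperate, 1 = defect
        b = int(rng.random() > p2[s])
        pa, pb = PAYOFF[(a, b)]
        r1, r2 = r1 + pa, r2 + pb
        s = 2 * a + b
    return r1 / steps, r2 / steps

def naive_update(p_opp, p_agent, lr=1.0, eps=0.05):
    """Naive learner: finite-difference ascent on its own episode return."""
    grad = np.zeros_like(p_opp)
    for i in range(len(p_opp)):
        up, dn = p_opp.copy(), p_opp.copy()
        up[i] = min(0.99, up[i] + eps)
        dn[i] = max(0.01, dn[i] - eps)
        grad[i] = (inner_episode(p_agent, up)[1] - inner_episode(p_agent, dn)[1]) / (2 * eps)
    return np.clip(p_opp + lr * grad, 0.01, 0.99)

def meta_episode(W, b, meta_steps=20):
    """One meta-episode: each meta-step is an inner episode. The meta-state is
    the pair of current inner policies; the meta-policy maps it to the agent's
    next inner policy. Returns the agent's average return (the meta-objective)."""
    p_agent, p_opp = np.full(5, 0.5), np.full(5, 0.5)
    total = 0.0
    for _ in range(meta_steps):
        meta_state = np.concatenate([p_agent, p_opp])
        p_agent = sigmoid(W @ meta_state + b)        # new inner policy for next episode
        r_agent, _ = inner_episode(p_agent, p_opp)
        total += r_agent
        p_opp = naive_update(p_opp, p_agent)         # opponent learns between episodes
    return total / meta_steps

# Model-free meta-optimisation: a simple evolution-strategies update on (W, b).
W, b = np.zeros((5, 10)), np.zeros(5)
sigma, meta_lr, pop = 0.1, 0.05, 8
for gen in range(10):
    noises, scores = [], []
    for _ in range(pop):
        nW, nb = rng.normal(size=W.shape), rng.normal(size=b.shape)
        noises.append((nW, nb))
        scores.append(meta_episode(W + sigma * nW, b + sigma * nb))
    adv = (np.array(scores) - np.mean(scores)) / (np.std(scores) + 1e-8)
    W = W + meta_lr / (pop * sigma) * sum(a * nW for a, (nW, nb) in zip(adv, noises))
    b = b + meta_lr / (pop * sigma) * sum(a * nb for a, (nW, nb) in zip(adv, noises))
    print(f"generation {gen}: mean agent return {np.mean(scores):.3f}")
```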
Related papers
- Fast Adaptation with Kernel and Gradient based Meta Learning [4.763682200721131]
We propose two algorithms to improve both the inner and outer loops of Model-Agnostic Meta-Learning (MAML).
Our first algorithm redefines the optimization problem in the function space to update the model using closed-form solutions.
In the outer loop, the second algorithm adjusts the learning of the meta-learner by assigning weights to the losses from each task of the inner loop.
arXiv Detail & Related papers (2024-11-01T07:05:03Z)
- Analysing the Sample Complexity of Opponent Shaping [15.226375898939205]
Learning in general-sum games often yields collectively sub-optimal results.
Early opponent shaping (OS) methods use higher-order derivatives to shape the learning of co-players.
Model-free Opponent Shaping (M-FOS) addresses these issues by reframing the OS problem as a meta-game.
arXiv Detail & Related papers (2024-02-08T16:17:18Z)
- Scaling Opponent Shaping to High Dimensional Games [17.27358464280679]
We develop an OS-based approach, Shaper, for general-sum games with temporally extended actions and long time horizons.
We show that Shaper leads to improved individual and collective outcomes in a range of challenging settings from the literature.
arXiv Detail & Related papers (2023-12-19T20:05:23Z)
- Context-Aware Meta-Learning [52.09326317432577]
We propose a meta-learning algorithm that emulates Large Language Models by learning new visual concepts during inference without fine-tuning.
Our approach exceeds or matches the state-of-the-art algorithm, P>M>F, on 8 out of 11 meta-learning benchmarks.
arXiv Detail & Related papers (2023-10-17T03:35:27Z)
- Meta-Value Learning: a General Framework for Learning with Learning Awareness [1.4323566945483497]
We propose to judge joint policies by their long-term prospects as measured by the meta-value.
We apply a form of Q-learning to the meta-game of optimization, in a way that avoids the need to explicitly represent the continuous action space of policy updates.
arXiv Detail & Related papers (2023-07-17T21:40:57Z)
- Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune a generalization of online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z)
- Federated Learning and Meta Learning: Approaches, Applications, and Directions [94.68423258028285]
In this tutorial, we present a comprehensive review of FL, meta learning, and federated meta learning (FedMeta).
Unlike other tutorial papers, our objective is to explore how FL, meta learning, and FedMeta methodologies can be designed, optimized, and evolved, and their applications over wireless networks.
arXiv Detail & Related papers (2022-10-24T10:59:29Z)
- One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning [61.662504399411695]
We introduce a novel method that mixes multiple inner steps and enjoys a more accurate and robust meta-gradient signal.
When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving similar or higher performance.
arXiv Detail & Related papers (2021-10-30T08:36:52Z)
- Bootstrapped Meta-Learning [48.017607959109924]
We propose an algorithm that tackles a challenging meta-optimisation problem by letting the meta-learner teach itself.
The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric; a toy sketch of this target-matching idea appears after the related-papers list.
We achieve a new state of the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities.
arXiv Detail & Related papers (2021-09-09T18:29:05Z)
- Meta-Gradient Reinforcement Learning with an Objective Discovered Online [54.15180335046361]
We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network.
Because the objective is discovered online, it can adapt to changes over time.
On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
arXiv Detail & Related papers (2020-07-16T16:17:09Z)
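As flagged in the Bootstrapped Meta-Learning entry above, below is a toy sketch of the bootstrapped-target idea: generate a target by continuing the meta-learner's own unroll for a few extra steps, then update the meta-learner to bring its shorter unroll closer to that target. This is an illustration under strong simplifying assumptions, not the paper's algorithm: the inner learner minimises a quadratic loss, the meta-learner is reduced to a single learned learning rate, the metric is squared distance, and the meta-gradient is taken by finite differences.
```python
# Toy sketch of a bootstrapped meta-learning target (illustration only).
import numpy as np

rng = np.random.default_rng(0)

def inner_grad(theta):
    # Gradient of the toy inner loss 0.5 * ||theta||^2.
    return theta

def unroll(theta, lr, steps):
    """Run `steps` inner gradient updates with meta-parameter `lr`."""
    for _ in range(steps):
        theta = theta - lr * inner_grad(theta)
    return theta

lr, meta_lr = 0.05, 1e-2          # meta-parameter and its own step size
for it in range(300):
    theta0 = rng.normal(size=4)
    # Bootstrap: continue the K-step unroll for L extra steps and treat the
    # result as a fixed target (no gradient flows through it).
    target = unroll(unroll(theta0, lr, steps=3), lr, steps=5)

    def meta_objective(lr_val):
        # Distance between the K-step iterate and the bootstrapped target.
        return float(np.sum((unroll(theta0, lr_val, steps=3) - target) ** 2))

    eps = 1e-4
    g = (meta_objective(lr + eps) - meta_objective(lr - eps)) / (2 * eps)
    lr -= meta_lr * g

print("learned inner learning rate:", round(lr, 4))
```
On this toy problem, matching a longer unroll simply pushes the learned learning rate upward, which shows the mechanism rather than any claim about the paper's results.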