Model-Free Opponent Shaping
- URL: http://arxiv.org/abs/2205.01447v1
- Date: Tue, 3 May 2022 12:20:14 GMT
- Title: Model-Free Opponent Shaping
- Authors: Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster
- Abstract summary: We propose Model-Free Opponent Shaping (M-FOS) for general-sum games.
M-FOS learns in a meta-game in which each meta-step is an episode of the underlying ("inner") game.
It exploits naive learners and other, more sophisticated algorithms from the literature.
- Score: 1.433758865948252
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In general-sum games, the interaction of self-interested learning agents
commonly leads to collectively worst-case outcomes, such as defect-defect in
the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as
Learning with Opponent-Learning Awareness (LOLA), shape their opponents'
learning process. However, these methods are myopic since only a small number
of steps can be anticipated, are asymmetric since they treat other agents as
naive learners, and require the use of higher-order derivatives, which are
calculated through white-box access to an opponent's differentiable learning
algorithm. To address these issues, we propose Model-Free Opponent Shaping
(M-FOS). M-FOS learns in a meta-game in which each meta-step is an episode of
the underlying ("inner") game. The meta-state consists of the inner policies,
and the meta-policy produces a new inner policy to be used in the next episode.
M-FOS then uses generic model-free optimisation methods to learn meta-policies
that accomplish long-horizon opponent shaping. Empirically, M-FOS
near-optimally exploits naive learners and other, more sophisticated algorithms
from the literature. For example, to the best of our knowledge, it is the first
method to learn the well-known Zero-Determinant (ZD) extortion strategy in the
IPD. In the same settings, M-FOS leads to socially optimal outcomes under
meta-self-play. Finally, we show that M-FOS can be scaled to high-dimensional
settings.
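To make the meta-game structure described above concrete, below is a minimal, illustrative sketch of an M-FOS-style loop in the iterated prisoner's dilemma. It is not the authors' implementation: here the opponent is a naive learner, the meta-policy is a simple linear map from the pair of memory-1 inner policies to the agent's next inner policy, and a basic evolution-strategies update stands in for generic model-free optimisation. All names and hyperparameters are hypothetical.
```python
# Illustrative M-FOS-style meta-game loop (sketch, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
PAYOFF = {(0, 0): (-1, -1), (0, 1): (-3, 0), (1, 0): (0, -3), (1, 1): (-2, -2)}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inner_episode(p1, p2, steps=32):
    """One episode of the inner game (IPD). p1, p2 hold memory-1 cooperation
    probabilities indexed by the previous joint action (CC, CD, DC, DD), plus
    an initial entry. Returns average per-step payoffs for both players."""
    s, r1, r2 = 4, 0.0, 0.0
    for _ in range(steps):
        a = int(rng.random() > p1[s])      # 0 = cooperate, 1 = defect
        b = int(rng.random() > p2[s])
        pa, pb = PAYOFF[(a, b)]
        r1, r2 = r1 + pa, r2 + pb
        s = 2 * a + b
    return r1 / steps, r2 / steps

def naive_update(p_opp, p_agent, lr=1.0, eps=0.05):
    """Naive learner: finite-difference ascent on its own episode return."""
    grad = np.zeros_like(p_opp)
    for i in range(len(p_opp)):
        up, dn = p_opp.copy(), p_opp.copy()
        up[i] = min(0.99, up[i] + eps)
        dn[i] = max(0.01, dn[i] - eps)
        grad[i] = (inner_episode(p_agent, up)[1] - inner_episode(p_agent, dn)[1]) / (2 * eps)
    return np.clip(p_opp + lr * grad, 0.01, 0.99)

def meta_episode(W, b, meta_steps=20):
    """One meta-episode: each meta-step is an inner episode. The meta-state is
    the pair of current inner policies; the meta-policy maps it to the agent's
    next inner policy. Returns the agent's average return (the meta-objective)."""
    p_agent, p_opp = np.full(5, 0.5), np.full(5, 0.5)
    total = 0.0
    for _ in range(meta_steps):
        meta_state = np.concatenate([p_agent, p_opp])
        p_agent = sigmoid(W @ meta_state + b)        # new inner policy for next episode
        r_agent, _ = inner_episode(p_agent, p_opp)
        total += r_agent
        p_opp = naive_update(p_opp, p_agent)         # opponent learns between episodes
    return total / meta_steps

# Model-free meta-optimisation: a simple evolution-strategies update on (W, b).
W, b = np.zeros((5, 10)), np.zeros(5)
sigma, meta_lr, pop = 0.1, 0.05, 8
for gen in range(10):
    noises, scores = [], []
    for _ in range(pop):
        nW, nb = rng.normal(size=W.shape), rng.normal(size=b.shape)
        noises.append((nW, nb))
        scores.append(meta_episode(W + sigma * nW, b + sigma * nb))
    adv = (np.array(scores) - np.mean(scores)) / (np.std(scores) + 1e-8)
    W = W + meta_lr / (pop * sigma) * sum(a * nW for a, (nW, nb) in zip(adv, noises))
    b = b + meta_lr / (pop * sigma) * sum(a * nb for a, (nW, nb) in zip(adv, noises))
    print(f"generation {gen}: mean agent return {np.mean(scores):.3f}")
```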
Related papers
- Fast Adaptation with Kernel and Gradient based Meta Learning [4.763682200721131]
We propose two algorithms to improve both the inner and outer loops of Model-Agnostic Meta-Learning (MAML).
Our first algorithm redefines the optimization problem in the function space to update the model using closed-form solutions.
In the outer loop, the second algorithm adjusts the learning of the meta-learner by assigning weights to the losses from each task of the inner loop.
arXiv Detail & Related papers (2024-11-01T07:05:03Z)
- Analysing the Sample Complexity of Opponent Shaping [15.226375898939205]
Learning in general-sum games often yields collectively sub-optimal results.
Early opponent shaping (OS) methods use higher-order derivatives to shape the learning of co-players.
Model-free Opponent Shaping (M-FOS) addresses these issues by reframing the OS problem as a meta-game.
arXiv Detail & Related papers (2024-02-08T16:17:18Z)
- Scaling Opponent Shaping to High Dimensional Games [17.27358464280679]
We develop an OS-based approach, Shaper, for general-sum games with temporally extended actions and long time horizons.
We show that Shaper leads to improved individual and collective outcomes in a range of challenging settings from the literature.
arXiv Detail & Related papers (2023-12-19T20:05:23Z)
- Context-Aware Meta-Learning [52.09326317432577]
We propose a meta-learning algorithm that emulates Large Language Models by learning new visual concepts during inference without fine-tuning.
Our approach exceeds or matches the state-of-the-art algorithm, P>M>F, on 8 out of 11 meta-learning benchmarks.
arXiv Detail & Related papers (2023-10-17T03:35:27Z)
- Meta-Value Learning: a General Framework for Learning with Learning Awareness [1.4323566945483497]
We propose to judge joint policies by their long-term prospects as measured by the meta-value.
We apply a form of Q-learning to the meta-game of optimization, in a way that avoids the need to explicitly represent the continuous action space of policy updates.
arXiv Detail & Related papers (2023-07-17T21:40:57Z)
- Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune a generalization of online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z)
- Federated Learning and Meta Learning: Approaches, Applications, and Directions [94.68423258028285]
In this tutorial, we present a comprehensive review of FL, meta learning, and federated meta learning (FedMeta).
Unlike other tutorial papers, our objective is to explore how FL, meta learning, and FedMeta methodologies can be designed, optimized, and evolved, and their applications over wireless networks.
arXiv Detail & Related papers (2022-10-24T10:59:29Z)
- One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning [61.662504399411695]
We introduce a novel method that mixes multiple inner steps and enjoys a more accurate and robust meta-gradient signal.
When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving similar or higher performance.
arXiv Detail & Related papers (2021-10-30T08:36:52Z)
- Bootstrapped Meta-Learning [48.017607959109924]
We propose an algorithm that tackles a challenging meta-optimisation problem by letting the meta-learner teach itself.
The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric; a toy sketch of this target-matching idea appears after the related-papers list.
We achieve a new state of the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities.
arXiv Detail & Related papers (2021-09-09T18:29:05Z)
- Meta-Gradient Reinforcement Learning with an Objective Discovered Online [54.15180335046361]
We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network.
Because the objective is discovered online, it can adapt to changes over time.
On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
arXiv Detail & Related papers (2020-07-16T16:17:09Z)
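As flagged in the Bootstrapped Meta-Learning entry above, below is a toy sketch of the bootstrapped-target idea: generate a target by continuing the meta-learner's own unroll for a few extra steps, then update the meta-learner to bring its shorter unroll closer to that target. This is an illustration under strong simplifying assumptions, not the paper's algorithm: the inner learner minimises a quadratic loss, the meta-learner is reduced to a single learned learning rate, the metric is squared distance, and the meta-gradient is taken by finite differences.
```python
# Toy sketch of a bootstrapped meta-learning target (illustration only).
import numpy as np

rng = np.random.default_rng(0)

def inner_grad(theta):
    # Gradient of the toy inner loss 0.5 * ||theta||^2.
    return theta

def unroll(theta, lr, steps):
    """Run `steps` inner gradient updates with meta-parameter `lr`."""
    for _ in range(steps):
        theta = theta - lr * inner_grad(theta)
    return theta

lr, meta_lr = 0.05, 1e-2          # meta-parameter and its own step size
for it in range(300):
    theta0 = rng.normal(size=4)
    # Bootstrap: continue the K-step unroll for L extra steps and treat the
    # result as a fixed target (no gradient flows through it).
    target = unroll(unroll(theta0, lr, steps=3), lr, steps=5)

    def meta_objective(lr_val):
        # Distance between the K-step iterate and the bootstrapped target.
        return float(np.sum((unroll(theta0, lr_val, steps=3) - target) ** 2))

    eps = 1e-4
    g = (meta_objective(lr + eps) - meta_objective(lr - eps)) / (2 * eps)
    lr -= meta_lr * g

print("learned inner learning rate:", round(lr, 4))
```
On this toy problem, matching a longer unroll simply pushes the learned learning rate upward, which shows the mechanism rather than any claim about the paper's results.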