Analysing the Sample Complexity of Opponent Shaping
- URL: http://arxiv.org/abs/2402.05782v1
- Date: Thu, 8 Feb 2024 16:17:18 GMT
- Title: Analysing the Sample Complexity of Opponent Shaping
- Authors: Kitty Fung, Qizhen Zhang, Chris Lu, Jia Wan, Timon Willi, Jakob Foerster
- Abstract summary: Learning in general-sum games often yields collectively sub-optimal results.
Early opponent shaping (OS) methods use higher-order derivatives to shape the learning of co-players.
Model-free Opponent Shaping (M-FOS) addresses these limitations by reframing the OS problem as a meta-game.
- Score: 15.226375898939205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning in general-sum games often yields collectively sub-optimal results.
Addressing this, opponent shaping (OS) methods actively guide the learning
processes of other agents, empirically leading to improved individual and group
performances in many settings. Early OS methods use higher-order derivatives to
shape the learning of co-players, making them unsuitable for shaping multiple
learning steps. Follow-up work, Model-free Opponent Shaping (M-FOS), addresses
these limitations by reframing the OS problem as a meta-game. In contrast to
early OS methods, there is little theoretical understanding of the M-FOS
framework. Providing theoretical guarantees for M-FOS is hard because A) there
is little literature on theoretical sample complexity bounds for
meta-reinforcement learning, and B) M-FOS operates in continuous state and
action spaces, so
theoretical analysis is challenging. In this work, we present R-FOS, a tabular
version of M-FOS that is more suitable for theoretical analysis. R-FOS
discretises the continuous meta-game MDP into a tabular MDP. Within this
discretised MDP, we adapt the $R_{max}$ algorithm, most prominently used to
derive PAC-bounds for MDPs, as the meta-learner in the R-FOS algorithm. We
derive a sample complexity bound that is exponential in the cardinality of the
inner state and action space and the number of agents. Our bound guarantees
that, with high probability, the final policy learned by an R-FOS agent is
close to the optimal policy, apart from a constant factor. Finally, we
investigate how R-FOS's sample complexity scales with the size of the state-action
space. Our theoretical results on scaling are supported empirically in the
Matching Pennies environment.
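
As a rough, hedged illustration of where the exponential dependence can come from (not the paper's exact construction): suppose the tabular meta-state records the discretised inner policies of all agents. Its cardinality then multiplies across agents and across inner state-action pairs. The resolution $\delta$ and the parametrisation below are assumptions made for this sketch, not quantities taken from the paper.

```latex
% Back-of-envelope sketch, assuming the meta-state is the joint discretised
% inner policy and \delta is a hypothetical discretisation resolution.
Each agent's inner policy has roughly one probability per inner state-action
pair, i.e.\ about $|S|\,|A|$ coordinates. Discretising every coordinate into
$1/\delta$ bins and collecting the policies of all $n$ agents gives
\[
  |\mathcal{S}_{\mathrm{meta}}| \;\approx\; \left(\frac{1}{\delta}\right)^{n\,|S|\,|A|},
\]
a meta-state space exponential in $|S|$, $|A|$ and $n$, consistent with the
exponential dependence stated in the abstract.
```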
Related papers
- Deep Learning Algorithms for Mean Field Optimal Stopping in Finite Space and Discrete Time [3.350071725971209]
This work studies the mean field optimal stopping (MFOS) problem, obtained as the number of agents approaches infinity.
We propose two deep learning methods: one simulates full trajectories to learn optimal decisions, whereas the other leverages DPP with backward induction.
We demonstrate the effectiveness of these approaches through numerical experiments on 6 different problems in spatial dimension up to 300.
arXiv Detail & Related papers (2024-10-11T14:27:17Z)
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches for dealing with continuous reinforcement learning (RL) problems.
In common practice, convergent (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample-efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z)
- Federated Empirical Risk Minimization via Second-Order Method [18.548661105227488]
We present an interior point method (IPM) to solve a general empirical risk minimization problem under the federated learning setting.
We show that the communication complexity of each iteration of our IPM is $\tilde{O}(d^{3/2})$, where $d$ is the dimension (i.e., number of features) of the dataset.
arXiv Detail & Related papers (2023-05-27T14:23:14Z)
- A multilevel reinforcement learning framework for PDE based control [0.2538209532048867]
Reinforcement learning (RL) is a promising method to solve control problems.
Model-free RL algorithms are sample inefficient and require thousands, if not millions, of samples to learn optimal control policies.
We propose a multilevel RL framework to ease this cost by exploiting sublevel models that correspond to coarser scale discretizations.
arXiv Detail & Related papers (2022-10-15T23:52:48Z)
- Model-Free Opponent Shaping [1.433758865948252]
We propose Model-Free Opponent Shaping (M-FOS) for general-sum games.
M-FOS learns in a meta-game in which each meta-step is an episode of the underlying ("inner") game; a minimal sketch of this meta-game loop appears after this list.
It exploits naive learners and other, more sophisticated algorithms from the literature.
arXiv Detail & Related papers (2022-05-03T12:20:14Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea by sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
- ConCrete MAP: Learning a Probabilistic Relaxation of Discrete Variables for Soft Estimation with Low Complexity [9.62543698736491]
ConCrete MAP Detection (CMD) is an iterative detection algorithm for large inverse linear problems.
We show CMD to feature a promising performance-complexity trade-off compared to SotA.
Notably, we demonstrate CMD's soft outputs to be reliable for decoders.
arXiv Detail & Related papers (2021-02-25T09:54:25Z)
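
To make the meta-game framing concrete, below is a minimal, self-contained sketch (not the authors' implementation) of an R-FOS-flavoured loop on iterated Matching Pennies: each meta-step runs one inner episode against a naive gradient learner, the opponent's policy is discretised into a tabular meta-state, and the meta-learner keeps R_max-style optimistic estimates for insufficiently visited meta-state/action pairs. The discretisation resolution, the meta-action set, the "known" threshold and the episode length are all hypothetical choices; the full R_max algorithm additionally builds a transition model and plans with value iteration, which this bandit-style simplification omits.

```python
# Minimal sketch, assuming a two-player iterated Matching Pennies inner game,
# a naive-gradient opponent, and an R_max-style optimistic tabular meta-learner.
# All constants below are illustrative assumptions, not values from the paper.
import numpy as np

N_BINS = 10                               # discretisation resolution of the meta-state
META_ACTIONS = np.linspace(0.0, 1.0, 5)   # coarse inner policies: P(agent plays heads)
M_KNOWN = 20                              # visits before a (state, action) counts as "known"
R_MAX = 1.0                               # optimistic value for unknown (state, action) pairs
EPISODE_LEN = 16                          # inner-game steps per meta-step
LR_OPPONENT = 0.1                         # naive opponent's learning rate
rng = np.random.default_rng(0)

def inner_episode(p_agent, p_opp):
    """One inner episode; the agent wins on a match, the opponent on a mismatch.
    Afterwards the naive opponent takes one gradient step on its expected payoff."""
    total = 0.0
    for _ in range(EPISODE_LEN):
        match = (rng.random() < p_agent) == (rng.random() < p_opp)
        total += 1.0 if match else -1.0
    # d/dp_opp of the opponent's expected payoff -(2 p_agent - 1)(2 p_opp - 1)
    grad_opp = -2.0 * (2.0 * p_agent - 1.0)
    p_opp = float(np.clip(p_opp + LR_OPPONENT * grad_opp, 0.0, 1.0))
    return total / EPISODE_LEN, p_opp

def discretise(p_opp):
    """Map the continuous meta-state (the opponent's policy) to a tabular index."""
    return min(int(p_opp * N_BINS), N_BINS - 1)

counts = np.zeros((N_BINS, len(META_ACTIONS)))       # visits per (meta-state, meta-action)
reward_sums = np.zeros((N_BINS, len(META_ACTIONS)))  # accumulated meta-rewards

p_opp = 0.5
for _ in range(2000):                                 # meta-steps
    s = discretise(p_opp)
    known = counts[s] >= M_KNOWN
    # Optimism in the face of uncertainty: unknown pairs are assumed to pay R_MAX
    values = np.where(known, reward_sums[s] / np.maximum(counts[s], 1.0), R_MAX)
    a = int(np.argmax(values))
    r, p_opp = inner_episode(META_ACTIONS[a], p_opp)
    counts[s, a] += 1
    reward_sums[s, a] += r

means = reward_sums / np.maximum(counts, 1.0)
print("greedy meta-policy per discretised opponent state:",
      [float(META_ACTIONS[int(np.argmax(means[s]))]) for s in range(N_BINS)])
```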