Analysing the Sample Complexity of Opponent Shaping
- URL: http://arxiv.org/abs/2402.05782v1
- Date: Thu, 8 Feb 2024 16:17:18 GMT
- Title: Analysing the Sample Complexity of Opponent Shaping
- Authors: Kitty Fung, Qizhen Zhang, Chris Lu, Jia Wan, Timon Willi, Jakob Foerster
- Abstract summary: Learning in general-sum games often yields collectively sub-optimal results.
Early opponent shaping (OS) methods use higher-order derivatives to shape the learning of co-players.
Model-free Opponent Shaping (M-FOS) addresses these limitations by reframing the OS problem as a meta-game.
- Score: 15.226375898939205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning in general-sum games often yields collectively sub-optimal results.
Addressing this, opponent shaping (OS) methods actively guide the learning
processes of other agents, empirically leading to improved individual and group
performances in many settings. Early OS methods use higher-order derivatives to
shape the learning of co-players, making them unsuitable for shaping multiple
learning steps. Follow-up work, Model-free Opponent Shaping (M-FOS), addresses
these limitations by reframing the OS problem as a meta-game. In contrast to
early OS methods, there is little theoretical understanding of the M-FOS
framework. Providing theoretical guarantees for M-FOS is hard because A) there
is little literature on theoretical sample complexity bounds for
meta-reinforcement learning, and B) M-FOS operates in continuous state and
action spaces, so
theoretical analysis is challenging. In this work, we present R-FOS, a tabular
version of M-FOS that is more suitable for theoretical analysis. R-FOS
discretises the continuous meta-game MDP into a tabular MDP. Within this
discretised MDP, we adapt the $R_{max}$ algorithm, most prominently used to
derive PAC-bounds for MDPs, as the meta-learner in the R-FOS algorithm. We
derive a sample complexity bound that is exponential in the cardinality of the
inner state and action space and the number of agents. Our bound guarantees
that, with high probability, the final policy learned by an R-FOS agent is
close to the optimal policy, apart from a constant factor. Finally, we
investigate how R-FOS's sample complexity scales with the size of the state-action
space. Our theoretical results on scaling are supported empirically in the
Matching Pennies environment.
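
As a rough, hedged illustration of where the exponential dependence can come from (not the paper's exact construction): suppose the tabular meta-state records the discretised inner policies of all agents. Its cardinality then multiplies across agents and across inner state-action pairs. The resolution $\delta$ and the parametrisation below are assumptions made for this sketch, not quantities taken from the paper.

```latex
% Back-of-envelope sketch, assuming the meta-state is the joint discretised
% inner policy and \delta is a hypothetical discretisation resolution.
Each agent's inner policy has roughly one probability per inner state-action
pair, i.e.\ about $|S|\,|A|$ coordinates. Discretising every coordinate into
$1/\delta$ bins and collecting the policies of all $n$ agents gives
\[
  |\mathcal{S}_{\mathrm{meta}}| \;\approx\; \left(\frac{1}{\delta}\right)^{n\,|S|\,|A|},
\]
a meta-state space exponential in $|S|$, $|A|$ and $n$, consistent with the
exponential dependence stated in the abstract.
```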
Related papers
- Deep Learning Algorithms for Mean Field Optimal Stopping in Finite Space and Discrete Time [3.350071725971209]
This work studies the mean field optimal stopping (MFOS) problem, obtained as the number of agents approaches infinity.
We propose two deep learning methods: one simulates full trajectories to learn optimal decisions, whereas the other leverages DPP with backward induction.
We demonstrate the effectiveness of these approaches through numerical experiments on 6 different problems in spatial dimension up to 300.
arXiv Detail & Related papers (2024-10-11T14:27:17Z)
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches for dealing with continuous reinforcement learning (RL) problems.
In common practice, convergent (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample-efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z)
- Federated Empirical Risk Minimization via Second-Order Method [18.548661105227488]
We present an interior point method (IPM) to solve a general empirical risk minimization problem under the federated learning setting.
We show that the communication complexity of each iteration of our IPM is $\tilde{O}(d^{3/2})$, where $d$ is the dimension (i.e., number of features) of the dataset.
arXiv Detail & Related papers (2023-05-27T14:23:14Z)
- A multilevel reinforcement learning framework for PDE based control [0.2538209532048867]
Reinforcement learning (RL) is a promising method to solve control problems.
Model-free RL algorithms are sample inefficient and require thousands, if not millions, of samples to learn optimal control policies.
We propose a multilevel RL framework to ease this cost by exploiting sublevel models that correspond to coarser scale discretizations.
arXiv Detail & Related papers (2022-10-15T23:52:48Z)
- Model-Free Opponent Shaping [1.433758865948252]
We propose Model-Free Opponent Shaping (M-FOS) for general-sum games.
M-FOS learns in a meta-game in which each meta-step is an episode of the underlying ("inner") game; a minimal sketch of this meta-game loop appears after this list.
It exploits naive learners and other, more sophisticated algorithms from the literature.
arXiv Detail & Related papers (2022-05-03T12:20:14Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea by sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
- ConCrete MAP: Learning a Probabilistic Relaxation of Discrete Variables for Soft Estimation with Low Complexity [9.62543698736491]
ConCrete MAP Detection (CMD) is an iterative detection algorithm for large inverse linear problems.
We show CMD to feature a promising performance-complexity trade-off compared to SotA.
Notably, we demonstrate CMD's soft outputs to be reliable for decoders.
arXiv Detail & Related papers (2021-02-25T09:54:25Z)
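
To make the meta-game framing concrete, below is a minimal, self-contained sketch (not the authors' implementation) of an R-FOS-flavoured loop on iterated Matching Pennies: each meta-step runs one inner episode against a naive gradient learner, the opponent's policy is discretised into a tabular meta-state, and the meta-learner keeps R_max-style optimistic estimates for insufficiently visited meta-state/action pairs. The discretisation resolution, the meta-action set, the "known" threshold and the episode length are all hypothetical choices; the full R_max algorithm additionally builds a transition model and plans with value iteration, which this bandit-style simplification omits.

```python
# Minimal sketch, assuming a two-player iterated Matching Pennies inner game,
# a naive-gradient opponent, and an R_max-style optimistic tabular meta-learner.
# All constants below are illustrative assumptions, not values from the paper.
import numpy as np

N_BINS = 10                               # discretisation resolution of the meta-state
META_ACTIONS = np.linspace(0.0, 1.0, 5)   # coarse inner policies: P(agent plays heads)
M_KNOWN = 20                              # visits before a (state, action) counts as "known"
R_MAX = 1.0                               # optimistic value for unknown (state, action) pairs
EPISODE_LEN = 16                          # inner-game steps per meta-step
LR_OPPONENT = 0.1                         # naive opponent's learning rate
rng = np.random.default_rng(0)

def inner_episode(p_agent, p_opp):
    """One inner episode; the agent wins on a match, the opponent on a mismatch.
    Afterwards the naive opponent takes one gradient step on its expected payoff."""
    total = 0.0
    for _ in range(EPISODE_LEN):
        match = (rng.random() < p_agent) == (rng.random() < p_opp)
        total += 1.0 if match else -1.0
    # d/dp_opp of the opponent's expected payoff -(2 p_agent - 1)(2 p_opp - 1)
    grad_opp = -2.0 * (2.0 * p_agent - 1.0)
    p_opp = float(np.clip(p_opp + LR_OPPONENT * grad_opp, 0.0, 1.0))
    return total / EPISODE_LEN, p_opp

def discretise(p_opp):
    """Map the continuous meta-state (the opponent's policy) to a tabular index."""
    return min(int(p_opp * N_BINS), N_BINS - 1)

counts = np.zeros((N_BINS, len(META_ACTIONS)))       # visits per (meta-state, meta-action)
reward_sums = np.zeros((N_BINS, len(META_ACTIONS)))  # accumulated meta-rewards

p_opp = 0.5
for _ in range(2000):                                 # meta-steps
    s = discretise(p_opp)
    known = counts[s] >= M_KNOWN
    # Optimism in the face of uncertainty: unknown pairs are assumed to pay R_MAX
    values = np.where(known, reward_sums[s] / np.maximum(counts[s], 1.0), R_MAX)
    a = int(np.argmax(values))
    r, p_opp = inner_episode(META_ACTIONS[a], p_opp)
    counts[s, a] += 1
    reward_sums[s, a] += r

means = reward_sums / np.maximum(counts, 1.0)
print("greedy meta-policy per discretised opponent state:",
      [float(META_ACTIONS[int(np.argmax(means[s]))]) for s in range(N_BINS)])
```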