Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation
- URL: http://arxiv.org/abs/2203.07322v1
- Date: Mon, 14 Mar 2022 17:24:03 GMT
- Title: Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation
- Authors: Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause
- Abstract summary: H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
- Score: 93.52573037053449
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider model-based multi-agent reinforcement learning, where the
environment transition model is unknown and can only be learned via expensive
interactions with the environment. We propose H-MARL (Hallucinated Multi-Agent
Reinforcement Learning), a novel sample-efficient algorithm that can
efficiently balance exploration, i.e., learning about the environment, and
exploitation, i.e., achieving good equilibrium performance in the underlying
general-sum Markov game. H-MARL builds high-probability confidence intervals
around the unknown transition model and sequentially updates them based on
newly observed data. Using these, it constructs an optimistic hallucinated game
for the agents for which equilibrium policies are computed at each round. We
consider general statistical models (e.g., Gaussian processes, deep ensembles,
etc.) and policy classes (e.g., deep neural networks), and theoretically
analyze our approach by bounding the agents' dynamic regret. Moreover, we
provide a convergence rate to the equilibria of the underlying Markov game. We
demonstrate our approach experimentally on an autonomous driving simulation
benchmark. H-MARL learns successful equilibrium policies after a few
interactions with the environment and can significantly improve performance
compared to non-exploratory methods.
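To make the loop described above concrete, here is a minimal, illustrative sketch of the H-MARL recipe: maintain confidence intervals around the learned transition model, let the agents plan in a hallucinated game whose dynamics may be chosen optimistically anywhere inside those intervals, and replan each round as new data shrinks the intervals. The `StatisticalModel` placeholder, the `solve_equilibrium` routine, and the environment interface are assumptions made for illustration, not the authors' code.

```python
# Illustrative sketch of the H-MARL loop -- not the authors' implementation.
# Assumed interfaces: a statistical model with mean/std predictions (standing in
# for a GP or deep ensemble), a black-box equilibrium solver, and a Gym-like
# multi-agent environment exposing `reward_fns` and `rollout`.

import numpy as np

class StatisticalModel:
    """Keeps mean and epistemic-uncertainty estimates of the transition model."""
    def __init__(self, state_dim):
        self.state_dim = state_dim
        self.data = []                      # observed (state, joint_action, next_state)

    def update(self, transitions):
        self.data.extend(transitions)       # a real system would refit the GP/ensemble here

    def mean(self, state, joint_action):
        return state                        # placeholder posterior mean

    def std(self, state, joint_action):
        # Epistemic uncertainty shrinks as more data is collected.
        return np.ones(self.state_dim) / np.sqrt(1 + len(self.data))

def hallucinated_dynamics(model, beta):
    """Optimistic dynamics: an auxiliary 'hallucinated' control eta in [-1, 1]^d
    lets the planner realise any transition inside the confidence interval
    mean +/- beta * std. Optimising over eta makes the game optimistic."""
    def step(state, joint_action, eta):
        return model.mean(state, joint_action) + beta * model.std(state, joint_action) * eta
    return step

def h_marl(env, model, solve_equilibrium, rounds=10, beta=2.0):
    for _ in range(rounds):
        optimistic_step = hallucinated_dynamics(model, beta)
        # Compute (approximate) equilibrium policies of the hallucinated game.
        policies = solve_equilibrium(optimistic_step, env.reward_fns)
        # Deploy the policies in the true environment and collect new transitions.
        model.update(env.rollout(policies))
    return policies
```

In the paper's setting the statistical model would be, e.g., a Gaussian process or deep ensemble, and the equilibrium solver any general-sum routine over the chosen policy class (e.g., deep neural network policies).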
Related papers
- Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibria.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample-efficient manner.
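For intuition, the IDS principle selects actions that trade off estimated regret against information gain. The bandit-style toy below (a far simpler setting than the episodic Markov games studied in the paper, and the Gaussian-posterior bookkeeping is an assumption) shows the information-ratio rule:

```python
# Toy, single-agent illustration of information-directed sampling (IDS):
# choose the action minimising (estimated regret)^2 / (information gain).
# Not the paper's algorithms, which operate on episodic Markov games.

import numpy as np

def ids_action(post_mean, post_std):
    """post_mean / post_std: Gaussian posterior over each action's reward."""
    plausible_best = (post_mean + post_std).max()       # optimistic benchmark
    regret = plausible_best - post_mean                  # crude per-action regret surrogate
    info_gain = 0.5 * np.log(1.0 + post_std ** 2)        # Gaussian info gain, unit obs. noise
    ratio = regret ** 2 / np.maximum(info_gain, 1e-12)
    return int(np.argmin(ratio))

# Action 2 has the highest mean, but action 0 is still highly uncertain,
# so IDS prefers to explore it.
print(ids_action(np.array([0.1, 0.5, 0.6]), np.array([1.0, 0.1, 0.1])))   # -> 0
```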
arXiv Detail & Related papers (2024-04-30T06:48:56Z)
- Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty [40.55653383218379]
This work focuses on learning in distributionally robust Markov games (RMGs).
We propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria.
arXiv Detail & Related papers (2024-04-29T17:51:47Z)
- Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential
Decision-Making in Multi-Agent Reinforcement Learning [17.101534531286298]
We construct a Nash-level policy model based on a conditional hypernetwork shared by all agents.
This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents.
Experiments demonstrate that our method effectively converges to the Stackelberg equilibrium (SE) policies in repeated matrix game scenarios.
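A rough sketch of what such a conditional hypernetwork can look like is given below: the superior (leader) agent's decision is mapped to the weights of the inferior (follower) agent's policy head, so the follower's policy is conditioned on the leader's choice. The layer sizes and softmax head are illustrative assumptions, not the paper's architecture.

```python
# Rough sketch of a conditional hypernetwork for leader-conditioned follower policies.
# The leader's action generates the *weights* of the follower's linear policy head.

import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, LEADER_ACTIONS, FOLLOWER_ACTIONS, HIDDEN = 4, 3, 3, 16

# Hypernetwork parameters: leader action (one-hot) -> flattened policy-head weights.
W_hyper = rng.normal(0.0, 0.1, (LEADER_ACTIONS, HIDDEN))
W_out = rng.normal(0.0, 0.1, (HIDDEN, STATE_DIM * FOLLOWER_ACTIONS))

def follower_policy(state, leader_action):
    onehot = np.eye(LEADER_ACTIONS)[leader_action]
    hidden = np.tanh(onehot @ W_hyper)
    # Weights of the follower's policy head, generated by the hypernetwork.
    W_policy = (hidden @ W_out).reshape(STATE_DIM, FOLLOWER_ACTIONS)
    logits = state @ W_policy
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()                 # softmax over the follower's actions

print(follower_policy(np.ones(STATE_DIM), leader_action=1))
```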
arXiv Detail & Related papers (2023-04-20T14:47:54Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum
Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
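As a toy illustration of this kind of update, the sketch below runs an entropy-regularized, optimistic multiplicative-weights iteration on a zero-sum matrix game (rock-paper-scissors). The paper's single-loop method operates on Markov games with value bootstrapping, so this is only the matrix-game flavour; the step size and regularization constant are arbitrary choices.

```python
# Entropy-regularised optimistic multiplicative weights on a zero-sum matrix game.
# Only a matrix-game illustration of the flavour of OMWU, not the paper's method.

import numpy as np

A = np.array([[0., -1., 1.],       # rock-paper-scissors payoff for the row player
              [1., 0., -1.],
              [-1., 1., 0.]])

eta, tau = 0.1, 0.05               # step size and entropy-regularisation strength
x = np.ones(3) / 3                 # row player's mixed strategy (maximiser)
y = np.ones(3) / 3                 # column player's mixed strategy (minimiser)
gx_prev, gy_prev = A @ y, A.T @ x

for _ in range(2000):
    gx, gy = A @ y, A.T @ x
    # Optimistic (extrapolated) gradients; the (1 - eta*tau) factor shrinks toward uniform.
    logits_x = (1 - eta * tau) * np.log(x) + eta * (2 * gx - gx_prev)
    logits_y = (1 - eta * tau) * np.log(y) - eta * (2 * gy - gy_prev)
    x = np.exp(logits_x - logits_x.max()); x /= x.sum()
    y = np.exp(logits_y - logits_y.max()); y /= y.sum()
    gx_prev, gy_prev = gx, gy

print(x, y)                        # both strategies approach the uniform equilibrium
```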
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require a large amount of interaction between the agent and the environment.
We propose a new method to solve this benchmark, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent
Reinforcement Learning [15.12491397254381]
We propose an implicit model-based multi-agent reinforcement learning method based on value decomposition methods.
Under this method, agents can interact with the learned virtual environment and evaluate the current state value according to imagined future states.
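A minimal sketch of this idea, assuming a one-step learned model and additive (VDN-style) value decomposition rather than the paper's exact architecture:

```python
# Sketch: score imagined future states in a learned model and combine per-agent
# utilities by additive value decomposition. The one-step imagination and the
# VDN-style sum are simplifying assumptions, not the paper's exact design.

import numpy as np

def team_value(state, joint_action, world_model, per_agent_value_fns):
    imagined_next = world_model(state, joint_action)          # learned virtual environment
    # Value decomposition: global value = sum of individual agents' values.
    return sum(v(imagined_next, a) for v, a in zip(per_agent_value_fns, joint_action))

# Toy usage with linear stand-ins for the learned components.
world_model = lambda s, a: s + 0.1 * np.sum(a)
value_fns = [lambda s, a: -np.abs(s - 1.0).sum()] * 2
print(team_value(np.zeros(3), joint_action=(1, 0),
                 world_model=world_model, per_agent_value_fns=value_fns))
```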
arXiv Detail & Related papers (2022-04-20T12:16:27Z)
- Efficiently Training On-Policy Actor-Critic Networks in Robotic Deep
Reinforcement Learning with Demonstration-like Sampled Exploration [7.930709072852582]
We propose a generic framework for Learning from Demonstration (LfD) based on actor-critic algorithms.
We conduct experiments on 4 standard benchmark environments in MuJoCo and 2 self-designed robotic environments.
arXiv Detail & Related papers (2021-09-27T12:42:05Z)
- Non-Markovian Reinforcement Learning using Fractional Dynamics [3.000697999889031]
Reinforcement learning (RL) is a technique to learn the control policy for an agent that interacts with an environment.
In this paper, we propose a model-based RL technique for a system that has non-Markovian dynamics.
Such environments are common in many real-world applications such as in human physiology, biological systems, material science, and population dynamics.
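To see why such dynamics break the Markov assumption, the sketch below simulates a fractional-order system with the Grünwald-Letnikov discretization: the next state depends on the entire past trajectory through slowly decaying power-law weights. The relaxation drift f(x) = -x and the order alpha = 0.7 are illustrative choices, not taken from the paper.

```python
# Fractional-order relaxation via the Grunwald-Letnikov discretisation.
# The memory term makes the update depend on the whole history, i.e. non-Markovian.

import numpy as np

def simulate_fractional(alpha=0.7, h=0.01, steps=500, x0=1.0, f=lambda x: -x):
    # Grunwald-Letnikov weights: w_0 = 1, w_j = w_{j-1} * (1 - (alpha + 1) / j).
    w = np.empty(steps + 1)
    w[0] = 1.0
    for j in range(1, steps + 1):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)

    x = np.empty(steps + 1)
    x[0] = x0
    for k in range(1, steps + 1):
        # The whole past trajectory enters through the power-law memory weights.
        memory = np.dot(w[1:k + 1], x[k - 1::-1] - x0)
        x[k] = x0 - memory + h ** alpha * f(x[k - 1])
    return x

# Decays far more slowly than the exponential exp(-t) of the integer-order case.
print(simulate_fractional()[-5:])
```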
arXiv Detail & Related papers (2021-07-29T07:35:13Z)
- Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise
Rollouts [52.844741540236285]
This paper investigates model-based methods in multi-agent reinforcement learning (MARL).
We propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO).
arXiv Detail & Related papers (2021-05-07T16:20:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.