Stackelberg Games for Learning Emergent Behaviors During Competitive
Autocurricula
- URL: http://arxiv.org/abs/2305.03735v1
- Date: Thu, 4 May 2023 19:27:35 GMT
- Title: Stackelberg Games for Learning Emergent Behaviors During Competitive
Autocurricula
- Authors: Boling Yang, Liyuan Zheng, Lillian J. Ratliff, Byron Boots, Joshua R.
Smith
- Abstract summary: This paper proposes a novel game-theoretic algorithm, Stackelberg Multi-Agent Deep Deterministic Policy Gradient (ST-MADDPG).
It formulates a two-player MARL problem as a Stackelberg game with one player as the 'leader' and the other as the 'follower' in a hierarchical interaction structure wherein the leader has an advantage.
By exploiting the leader's advantage, ST-MADDPG improves the quality of a co-evolution process and results in more sophisticated and complex strategies that work well even against an unseen strong opponent.
- Score: 35.88217121803472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autocurricular training is an important sub-area of multi-agent reinforcement
learning (MARL) that allows multiple agents to learn emergent skills in an
unsupervised co-evolving scheme. The robotics community has experimented with
autocurricular training on physically grounded problems, such as robust
control and interactive manipulation tasks. However, the asymmetric nature of
these tasks makes the generation of sophisticated policies challenging. Indeed,
the asymmetry in the environment may implicitly or explicitly provide an
advantage to a subset of agents which could, in turn, lead to a low-quality
equilibrium. This paper proposes a novel game-theoretic algorithm, Stackelberg
Multi-Agent Deep Deterministic Policy Gradient (ST-MADDPG), which formulates a
two-player MARL problem as a Stackelberg game with one player as the `leader'
and the other as the `follower' in a hierarchical interaction structure wherein
the leader has an advantage. We first demonstrate that the leader's advantage
from ST-MADDPG can be used to alleviate the inherent asymmetry in the
environment. By exploiting the leader's advantage, ST-MADDPG improves the
quality of a co-evolution process and results in more sophisticated and complex
strategies that work well even against an unseen strong opponent.
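To make the leader-follower structure concrete, the fragment below is a minimal sketch (assuming PyTorch and a toy bilinear-quadratic game rather than the paper's actor-critic setup): the follower takes an ordinary gradient step on its own objective, while the leader differentiates through an unrolled copy of that step so its update anticipates the follower's response. The objectives f_leader and f_follower, the step sizes, and the single-step unrolling are illustrative assumptions, not the paper's implementation.
```python
# Illustrative sketch of a Stackelberg leader/follower update (NOT the
# authors' ST-MADDPG implementation): the leader's gradient is taken
# through an unrolled copy of the follower's own gradient step, so the
# leader update anticipates the follower's response.
import torch

torch.manual_seed(0)
A = torch.randn(2, 2)
x = torch.zeros(2, requires_grad=True)   # leader parameters (toy)
y = torch.zeros(2, requires_grad=True)   # follower parameters (toy)

def f_leader(x, y):                      # hypothetical leader cost
    return x @ x + x @ A @ y

def f_follower(x, y):                    # hypothetical follower cost
    return y @ y - x @ A @ y

eta_f, eta_l = 0.1, 0.05                 # follower / leader step sizes

for _ in range(200):
    # Leader: unroll one follower step and differentiate through it.
    g_y = torch.autograd.grad(f_follower(x, y), y, create_graph=True)[0]
    y_anticipated = y - eta_f * g_y      # follower's anticipated response
    g_x = torch.autograd.grad(f_leader(x, y_anticipated), x)[0]
    with torch.no_grad():
        x -= eta_l * g_x

    # Follower: plain gradient descent on its own cost.
    g_y = torch.autograd.grad(f_follower(x, y), y)[0]
    with torch.no_grad():
        y -= eta_f * g_y

print("leader:", x.tolist(), "follower:", y.tolist())
```
In the full algorithm this idea is applied to MADDPG-style actor-critic agents rather than to toy quadratic costs.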
Related papers
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning [17.101534531286298]
We construct a Nash-level policy model based on a conditional hypernetwork shared by all agents.
This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents.
Experiments demonstrate that our method effectively converges to the SE policies in repeated matrix game scenarios.
arXiv Detail & Related papers (2023-04-20T14:47:54Z)
- MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning [22.28076947612619]
We introduce the Multi-Agent Environment Design Strategist for Open-Ended Learning (MAESTRO).
MAESTRO is the first multi-agent unsupervised environment design (UED) approach for two-player zero-sum settings.
Our experiments show that MAESTRO outperforms a number of strong baselines on competitive two-player games.
arXiv Detail & Related papers (2023-03-06T18:57:41Z)
- Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning [42.540853953923495]
We introduce a novel automatic curriculum learning framework, Skilled Population Curriculum (SPC), which adapts curriculum learning to multi-agent coordination.
Specifically, we endow the student with population-invariant communication and a hierarchical skill set, allowing it to learn cooperation and behavior skills from distinct tasks with varying numbers of agents.
We also analyze the inherent non-stationarity of this multi-agent automatic curriculum teaching problem and provide a corresponding regret bound.
arXiv Detail & Related papers (2023-02-07T12:30:52Z)
- TransfQMix: Transformers for Leveraging the Graph Structure of Multi-Agent Reinforcement Learning Problems [0.0]
We present TransfQMix, a new approach that uses transformers to leverage a latent graph structure and learn better coordination policies.
Our transformer Q-mixer learns a monotonic mixing function from a larger graph that includes the internal and external states of the agents.
We report TransfQMix's performance in the Spread and StarCraft II environments.
arXiv Detail & Related papers (2023-01-13T00:07:08Z)
- MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning [63.46052494151171]
We propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning.
We prove that when each agent guarantees an $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium.
Results show that MA2QL consistently outperforms IQL despite such minimal changes, which verifies the effectiveness of MA2QL; a minimal sketch of the alternating update appears after this list.
arXiv Detail & Related papers (2022-09-17T04:54:32Z)
- It Takes Four to Tango: Multiagent Selfplay for Automatic Curriculum Generation [107.10235120286352]
Training general-purpose reinforcement learning agents efficiently requires automatic generation of a goal curriculum.
We propose Curriculum Self Play (CuSP), an automated goal generation framework.
We demonstrate that our method succeeds at generating effective curricula of goals for a range of control tasks.
arXiv Detail & Related papers (2022-02-22T01:23:23Z)
- Decentralized Q-Learning in Zero-sum Markov Games [33.81574774144886]
We study multi-agent reinforcement learning (MARL) in discounted zero-sum Markov games.
We develop for the first time a radically uncoupled Q-learning dynamics that is both rational and convergent.
The key challenge in this decentralized setting is the non-stationarity of the learning environment from an agent's perspective.
arXiv Detail & Related papers (2021-06-04T22:42:56Z)
- Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game; a brief sketch of this unrolled formulation appears after this list.
arXiv Detail & Related papers (2021-04-11T00:44:57Z)
- On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)
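For the MA2QL entry above, the following is a minimal tabular sketch of agents taking turns to update their Q-functions. The toy matching-pennies stage game, the environment interface, the hyperparameters, and the turn schedule are illustrative assumptions, not the authors' implementation.
```python
# Illustrative sketch of alternating Q-learning in the spirit of MA2QL
# (NOT the authors' implementation): only one agent updates its Q-table
# per turn while the other keeps its greedy policy fixed, then roles swap.
import random
from collections import defaultdict

N_ACTIONS, EPS, ALPHA, GAMMA = 2, 0.1, 0.1, 0.95

def make_q():
    return defaultdict(lambda: [0.0] * N_ACTIONS)

def act(q, s):
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q[s][a])

def q_update(q, s, a, r, s_next):
    q[s][a] += ALPHA * (r + GAMMA * max(q[s_next]) - q[s][a])

class MatchingPennies:
    """Toy zero-sum stage game, used only to make the sketch runnable."""
    def reset(self):
        self.t = 0
        return 0                          # single dummy state
    def step(self, actions):
        self.t += 1
        r0 = 1.0 if actions[0] == actions[1] else -1.0
        return 0, (r0, -r0), self.t >= 10

def train(env, turns=20, episodes_per_turn=50):
    qs = [make_q(), make_q()]
    for turn in range(turns):
        learner = turn % 2                # agents take turns updating
        for _ in range(episodes_per_turn):
            s, done = env.reset(), False
            while not done:
                actions = [act(qs[i], s) for i in range(2)]
                s_next, rewards, done = env.step(actions)
                # Only the current learner updates its Q-function.
                q_update(qs[learner], s, actions[learner],
                         rewards[learner], s_next)
                s = s_next
    return qs

qs = train(MatchingPennies())
print(qs[0][0], qs[1][0])
```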
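For the SALT entry above, here is one way to read "adversarial training as a Stackelberg game solved by unrolled optimization": the perturbation acts as the follower, obtained from a single differentiable inner ascent step, and the model, as the leader, backpropagates through that step. The toy linear classifier, random data, and single unrolled step are assumptions for illustration, not the authors' method.
```python
# Illustrative sketch of adversarial training as a Stackelberg game with an
# unrolled follower step (NOT the authors' SALT implementation).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)                      # toy classifier (assumption)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
eps = 0.5                                    # inner (follower) step size

x = torch.randn(32, 4)                       # toy batch (assumption)
y = torch.randint(0, 2, (32,))

for _ in range(100):
    # Follower: one differentiable gradient-ascent step on the perturbation.
    delta = torch.zeros_like(x, requires_grad=True)
    inner_loss = loss_fn(model(x + delta), y)
    grad_delta = torch.autograd.grad(inner_loss, delta, create_graph=True)[0]
    delta_star = delta + eps * grad_delta    # anticipated adversarial response

    # Leader: update the model on the anticipated perturbed batch; the
    # gradient flows back through the unrolled follower step.
    outer_loss = loss_fn(model(x + delta_star), y)
    opt.zero_grad()
    outer_loss.backward()
    opt.step()

print("clean loss:", float(loss_fn(model(x), y)))
```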