Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2210.11942v4
- Date: Thu, 1 Jun 2023 22:51:09 GMT
- Authors: Matthias Gerstgrasser, David C. Parkes
- Abstract summary: We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem.
We discuss how previous approaches can be seen as specific instantiations of this framework.
- Score: 24.284863599920115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stackelberg equilibria arise naturally in a range of popular learning
problems, such as in security games or indirect mechanism design, and have
received increasing attention in the reinforcement learning literature. We
present a general framework for implementing Stackelberg equilibria search as a
multi-agent RL problem, allowing a wide range of algorithmic design choices. We
discuss how previous approaches can be seen as specific instantiations of this
framework. As a key insight, we note that the design space allows for
approaches not previously seen in the literature, for instance by leveraging
multitask and meta-RL techniques for follower convergence. We propose one such
approach using contextual policies, and evaluate it experimentally on both
standard and novel benchmark domains, showing greatly improved sample
efficiency compared to previous approaches. Finally, we explore the effect of
adopting algorithm designs outside the borders of our framework.
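To make the equilibrium concept in the abstract concrete, here is a minimal sketch of a Stackelberg equilibrium in a small matrix game. It is not the paper's method: it enumerates pure leader commitments by brute force rather than learning them with RL. The function name `pure_stackelberg`, the payoff matrices, and the tie-breaking rule (ties go to the leader) are all assumptions made for illustration.

```python
import numpy as np

def pure_stackelberg(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action, leader_value).

    The leader commits to a row; the follower then best-responds with a
    column. Follower ties are broken in the leader's favour (an assumed
    convention, often called the "strong" Stackelberg equilibrium).
    """
    leader_payoff = np.asarray(leader_payoff, dtype=float)
    follower_payoff = np.asarray(follower_payoff, dtype=float)
    best = None
    for a in range(leader_payoff.shape[0]):
        # Columns that maximise the follower's payoff given commitment a.
        row = follower_payoff[a]
        responses = np.flatnonzero(row == row.max())
        # Among tied best responses, take the one the leader prefers.
        b = responses[np.argmax(leader_payoff[a, responses])]
        value = leader_payoff[a, b]
        if best is None or value > best[2]:
            best = (a, int(b), float(value))
    return best

# Hypothetical 2x2 game: rows are leader actions, columns follower actions.
LEADER = [[2, 4],
          [1, 3]]
FOLLOWER = [[1, 0],
            [0, 1]]

print(pure_stackelberg(LEADER, FOLLOWER))  # (1, 1, 3.0)
```

In this toy game row 0 strictly dominates row 1 for the leader, so simultaneous play ends at (row 0, column 0) with a leader payoff of 2. Committing to row 1 instead steers the follower to column 1 and yields 3, which is the gap between a Stackelberg and a simultaneous-move solution.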
Related papers
- Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search [95.06503095273395]
Developing an o1-like reasoning approach is challenging, and researchers have made various attempts to advance this open area of research.
We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
arXiv Detail & Related papers (2024-11-18T16:15:17Z)
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Results show a significant performance improvement for our method compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
- Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling [48.78361527873024]
We propose a novel approach to handwriting recognition that integrates the strengths of two distinct methodologies.
We introduce a sparsification technique that accelerates the convergence of the algorithm and enhances the overall system's performance.
arXiv Detail & Related papers (2024-09-09T15:12:28Z)
- A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms [7.081523472610874]
We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy.
We empirically evaluate our approach on several classical reinforcement learning tasks.
arXiv Detail & Related papers (2024-06-20T21:50:46Z)
- Combinatorial Optimization with Policy Adaptation using Latent Space Search [44.12073954093942]
We present a novel approach for designing performant algorithms to solve complex, typically NP-hard, problems.
We show that our search strategy outperforms state-of-the-art approaches on 11 standard benchmarking tasks.
arXiv Detail & Related papers (2023-11-13T12:24:54Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similarly improved performance on code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, we propose a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA).
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
- Understanding A Class of Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective [41.05789078207364]
We provide a fresh perspective to understand, analyze, and design distributed optimization algorithms.
We show that a wide class of distributed algorithms, including popular decentralized/federated schemes, can be viewed as discretizing a certain continuous-time feedback control system.
arXiv Detail & Related papers (2022-04-27T01:53:57Z)
- Multi-agent navigation based on deep reinforcement learning and traditional pathfinding algorithm [0.0]
We develop a new framework for the multi-agent collision avoidance problem.
The framework combines a traditional pathfinding algorithm with reinforcement learning.
In our approach, the agents learn whether to be navigated or to take simple actions to avoid their partners.
arXiv Detail & Related papers (2020-12-05T08:56:58Z)
- Reinforcement Learning as Iterative and Amortised Inference [62.997667081978825]
We use the control as inference framework to outline a novel classification scheme based on amortised and iterative inference.
We show that taking this perspective allows us to identify parts of the algorithmic design space which have been relatively unexplored.
arXiv Detail & Related papers (2020-06-13T16:10:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.