Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2210.11942v4
- Date: Thu, 1 Jun 2023 22:51:09 GMT
- Title: Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent
Reinforcement Learning
- Authors: Matthias Gerstgrasser, David C. Parkes
- Abstract summary: We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem.
We discuss how previous approaches can be seen as specific instantiations of this framework.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stackelberg equilibria arise naturally in a range of popular learning
problems, such as in security games or indirect mechanism design, and have
received increasing attention in the reinforcement learning literature. We
present a general framework for implementing Stackelberg equilibria search as a
multi-agent RL problem, allowing a wide range of algorithmic design choices. We
discuss how previous approaches can be seen as specific instantiations of this
framework. As a key insight, we note that the design space allows for
approaches not previously seen in the literature, for instance by leveraging
multitask and meta-RL techniques for follower convergence. We propose one such
approach using contextual policies, and evaluate it experimentally on both
standard and novel benchmark domains, showing greatly improved sample
efficiency compared to previous approaches. Finally, we explore the effect of
adopting algorithm designs outside the borders of our framework.
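To make the leader-follower structure concrete, below is a minimal sketch of the training pattern the framework describes, on a toy 2x2 matrix game. Everything here is an illustrative assumption rather than the authors' code: the inner loop trains a contextual follower policy to best-respond to arbitrary leader strategies (the multitask/meta-RL ingredient), and the outer loop then optimizes the leader against that frozen amortized response.

```python
# Hedged sketch of Stackelberg equilibrium search with a contextual follower
# policy. All names and the toy setup are illustrative, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0], [3.0, 1.0]])   # leader payoffs  A[i, j]
B = np.array([[1.0, 0.0], [0.0, 2.0]])   # follower payoffs B[i, j]
W = rng.normal(scale=0.1, size=(2, 2))   # contextual follower policy params

def follower_probs(x, W):
    """Softmax policy over follower actions, conditioned on leader mix x."""
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

def train_follower(W, steps=2000, lr=0.5):
    """Inner loop: teach the contextual policy to best-respond to random
    leader strategies (the multitask / meta-RL ingredient)."""
    for _ in range(steps):
        x = rng.dirichlet(np.ones(2))            # sample a context
        p = follower_probs(x, W)
        q = B.T @ x                              # follower payoff per action
        # policy-gradient-style update toward higher expected payoff
        W += lr * np.outer(p * (q - p @ q), x)
    return W

W = train_follower(W)

# Outer loop: leader ascends its payoff against the frozen follower response.
theta = np.zeros(2)                              # leader logits
for _ in range(500):
    x = np.exp(theta) / np.exp(theta).sum()
    u = x @ A @ follower_probs(x, W)             # leader's expected payoff
    g = np.zeros(2)                              # finite-difference gradient
    for k in range(2):
        t = theta.copy(); t[k] += 1e-4
        xk = np.exp(t) / np.exp(t).sum()
        g[k] = (xk @ A @ follower_probs(xk, W) - u) / 1e-4
    theta += 0.1 * g

print("leader mix:", np.exp(theta) / np.exp(theta).sum())
```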
Related papers
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Experimental results show a significant performance improvement for our method compared with several well-known baselines.
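A minimal sketch of the two-level decomposition described above, assuming illustrative class names rather than the mccHRL implementation: the high-level agent tracks a slow "user perception" state across sessions, while the low-level agent selects each slate conditioned on it.

```python
# Hedged sketch of a hierarchical listwise recommender; all names are
# hypothetical stand-ins, not the mccHRL code.
import numpy as np

rng = np.random.default_rng(1)

class HighLevelAgent:
    def __init__(self, dim=4):
        self.perception = np.zeros(dim)          # coarse user-state estimate
    def update(self, feedback):
        # slow, session-level dynamics: moving average of clicked embeddings
        self.perception = 0.9 * self.perception + 0.1 * feedback
        return self.perception

class LowLevelAgent:
    def __init__(self, n_items=20, dim=4):
        self.item_embeds = rng.normal(size=(n_items, dim))
    def pick_list(self, goal, k=5):
        # fast, item-level policy: score items against the high-level goal
        scores = self.item_embeds @ goal + rng.gumbel(size=len(self.item_embeds))
        return np.argsort(-scores)[:k]

high, low = HighLevelAgent(), LowLevelAgent()
goal = high.perception
for session in range(3):
    slate = low.pick_list(goal)
    clicks = rng.random(len(slate)) < 0.3        # stand-in user feedback
    feedback = (low.item_embeds[slate[clicks]].mean(axis=0)
                if clicks.any() else np.zeros_like(goal))
    goal = high.update(feedback)
    print(f"session {session}: slate={slate.tolist()}")
```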
arXiv Detail & Related papers (2024-09-11T17:01:06Z) - Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling [48.78361527873024]
We propose a novel approach to handwriting recognition that integrates the strengths of two distinct methodologies.
We introduce a sparsification technique that accelerates the convergence of the algorithm and enhances the overall system's performance.
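A hedged sketch of one relaxation-labeling step with sparsification, assuming the classic Rosenfeld-style update and a simple top-k sparsification rule; the paper's learnable compatibilities and exact sparsification may differ.

```python
# Illustrative relaxation-labeling step with top-k sparsification; not the
# paper's implementation.
import numpy as np

def relaxation_step(P, R, topk=3):
    """P: (n_objects, n_labels) label probabilities for, e.g., characters.
    R: (n_labels, n_labels) pairwise compatibilities (learnable in the paper).
    """
    n = P.shape[0]
    context = (P.sum(axis=0, keepdims=True) - P) / (n - 1)  # neighbors' beliefs
    Q = context @ R                                         # contextual support
    # sparsification: keep only the top-k supports per object, zero the rest
    drop = np.argsort(Q, axis=1)[:, :-topk]
    np.put_along_axis(Q, drop, 0.0, axis=1)
    P = P * (1.0 + np.maximum(Q, 0.0))            # Rosenfeld-style update
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(5), size=4)             # 4 glyphs, 5 candidate classes
R = rng.normal(scale=0.1, size=(5, 5))
for _ in range(10):
    P = relaxation_step(P, R)
print(P.round(3))
```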
arXiv Detail & Related papers (2024-09-09T15:12:28Z) - POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation [76.67608003501479]
We introduce and specify an evaluation protocol defining a range of domain-related metrics computed on the basis of the primary evaluation indicators.
The results of such a comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
arXiv Detail & Related papers (2024-07-20T16:37:21Z) - A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms [7.081523472610874]
We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy.
We empirically evaluate our approach on several classical reinforcement learning tasks.
arXiv Detail & Related papers (2024-06-20T21:50:46Z) - Combinatorial Optimization with Policy Adaptation using Latent Space Search [44.12073954093942]
We present a novel approach for designing performant algorithms to solve complex, typically NP-hard, problems.
We show that our search strategy outperforms state-of-the-art approaches on 11 standard benchmarking tasks.
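A hedged sketch of what test-time policy adaptation by latent space search can look like, assuming a pretrained instance-conditioned policy whose rollout quality is exposed through a hypothetical `evaluate` oracle; the cross-entropy-style search below is one plausible choice, not necessarily the paper's.

```python
# Illustrative latent-space search for per-instance policy adaptation.
import numpy as np

def latent_search(evaluate, dim=8, pop=32, iters=20, elite_frac=0.25, seed=0):
    """Cross-entropy-style search over latent vectors z conditioning a
    (hypothetical) pretrained policy pi(a | s, z)."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        Z = mu + sigma * rng.normal(size=(pop, dim))   # sample candidates
        scores = np.array([evaluate(z) for z in Z])    # rollout quality
        elites = Z[np.argsort(-scores)[:n_elite]]      # keep the best latents
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mu

# toy instance: the best latent is near a hidden optimum z*
z_star = np.linspace(-1, 1, 8)
best = latent_search(lambda z: -np.sum((z - z_star) ** 2))
print(np.round(best, 2))
```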
arXiv Detail & Related papers (2023-11-13T12:24:54Z) - Let's reward step by step: Step-Level reward model as the Navigators for
Reasoning [64.27898739929734]
A Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similarly improved performance on code generation tasks.
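A minimal sketch of the greedy, step-level search described above. The callables `propose_steps` and `prm_score` are hypothetical stand-ins for an LLM sampler and a process-supervised reward model; the paper's actual interfaces may differ.

```python
# Illustrative PRM-guided greedy step search; interfaces are assumptions.
from typing import Callable, List

def greedy_prm_search(question: str,
                      propose_steps: Callable[[str, int], List[str]],
                      prm_score: Callable[[str, str], float],
                      max_steps: int = 8,
                      n_candidates: int = 4) -> str:
    """Extend a reasoning chain one step at a time, always keeping the
    candidate step the PRM scores highest."""
    chain = question
    for _ in range(max_steps):
        candidates = propose_steps(chain, n_candidates)  # sample next steps
        if not candidates:
            break
        best = max(candidates, key=lambda step: prm_score(chain, step))
        chain = chain + "\n" + best
        if "ANSWER:" in best:                            # assumed stop marker
            break
    return chain

# toy stand-ins so the sketch runs end to end
demo = greedy_prm_search(
    "Q: 2+2?",
    propose_steps=lambda c, n: [f"ANSWER: {i}" for i in range(1, n + 1)],
    prm_score=lambda c, s: float(s.endswith("4")),
)
print(demo)
```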
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - A General Framework for Sample-Efficient Function Approximation in
Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z) - Understanding A Class of Decentralized and Federated Optimization
Algorithms: A Multi-Rate Feedback Control Perspective [41.05789078207364]
We provide a fresh perspective to understand, analyze, and design distributed optimization algorithms.
We show that a wide class of distributed algorithms, including popular decentralized/federated schemes, can be viewed as discretizing a certain continuous-time feedback control system.
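To illustrate the discretization view on one familiar member of the class, the sketch below recovers decentralized gradient descent as a forward-Euler step of a continuous-time consensus-plus-gradient-flow system; the paper's multi-rate machinery is richer than this single-rate toy.

```python
# Hedged sketch: decentralized gradient descent (DGD) as a forward-Euler
# discretization of dx/dt = -(I - W) x - alpha * grad f(x).
import numpy as np

rng = np.random.default_rng(3)
n_agents, dim = 4, 2
targets = rng.normal(size=(n_agents, dim))       # each agent's local optimum
grad = lambda X: X - targets                     # local quadratic losses

# doubly stochastic mixing matrix for a ring of 4 agents
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

X = rng.normal(size=(n_agents, dim))
eta, alpha = 1.0, 0.1                            # Euler step, gradient gain
for _ in range(300):
    # one Euler step of the feedback system: consensus term + gradient term
    X = X + eta * (-(np.eye(n_agents) - W) @ X - alpha * grad(X))

print("consensus point:", X.mean(axis=0).round(3))
print("average target: ", targets.mean(axis=0).round(3))
```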
arXiv Detail & Related papers (2022-04-27T01:53:57Z) - Multi-agent navigation based on deep reinforcement learning and
traditional pathfinding algorithm [0.0]
We develop a new framework for the multi-agent collision avoidance problem.
The framework combines a traditional pathfinding algorithm with reinforcement learning.
In our approach, the agents learn whether to follow the planned path or to take simple evasive actions to avoid their partners.
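A minimal sketch of the switching scheme described above, with illustrative names rather than the paper's implementation: each agent follows its precomputed path unless a neighbor comes within a threshold radius, in which case a learned avoidance action takes over.

```python
# Illustrative planner/RL switching rule; names and the toy avoidance
# "policy" are assumptions, not the paper's code.
import numpy as np

def hybrid_action(pos, next_waypoint, others, avoid_policy, radius=1.5):
    """Return a velocity command: planner-following by default, learned
    avoidance when a neighbor is within `radius`."""
    dists = [np.linalg.norm(pos - o) for o in others]
    if dists and min(dists) < radius:
        return avoid_policy(pos, others)         # learned avoidance action
    direction = next_waypoint - pos              # follow the planned path
    return direction / (np.linalg.norm(direction) + 1e-8)

def avoid_policy(pos, others):
    # toy stand-in for the learned policy: step away from the nearest agent
    nearest = min(others, key=lambda o: np.linalg.norm(pos - o))
    away = pos - nearest
    return away / (np.linalg.norm(away) + 1e-8)

cmd = hybrid_action(np.array([0.0, 0.0]), np.array([5.0, 0.0]),
                    [np.array([0.5, 0.5])], avoid_policy)
print(cmd.round(2))
```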
arXiv Detail & Related papers (2020-12-05T08:56:58Z) - Reinforcement Learning as Iterative and Amortised Inference [62.997667081978825]
We use the control-as-inference framework to outline a novel classification scheme based on amortised and iterative inference.
We show that taking this perspective allows us to identify parts of the algorithmic design space which have been relatively unexplored.
arXiv Detail & Related papers (2020-06-13T16:10:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.