Stackelberg Decision Transformer for Asynchronous Action Coordination in
Multi-Agent Systems
- URL: http://arxiv.org/abs/2305.07856v1
- Date: Sat, 13 May 2023 07:29:31 GMT
- Title: Stackelberg Decision Transformer for Asynchronous Action Coordination in
Multi-Agent Systems
- Authors: Bin Zhang, Hangyu Mao, Lijuan Li, Zhiwei Xu, Dapeng Li, Rui Zhao,
Guoliang Fan
- Abstract summary: Asynchronous action coordination presents a pervasive challenge in Multi-Agent Systems (MAS).
We propose the Stackelberg Decision Transformer (STEER), an adaptable approach that resolves the difficulties of hierarchical coordination among agents.
Experimental results demonstrate that our method can converge to Stackelberg equilibrium solutions and outperforms other existing methods in complex scenarios.
- Score: 19.130281505547064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Asynchronous action coordination presents a pervasive challenge in
Multi-Agent Systems (MAS), which can be represented as a Stackelberg game (SG).
However, the scalability of existing Multi-Agent Reinforcement Learning (MARL)
methods based on SG is severely constrained by network structures or
environmental limitations. To address this issue, we propose the Stackelberg
Decision Transformer (STEER), a heuristic approach that resolves the
difficulties of hierarchical coordination among agents. STEER efficiently
manages decision-making processes in both spatial and temporal contexts by
incorporating the hierarchical decision structure of SG, the modeling
capability of autoregressive sequence models, and the exploratory learning
methodology of MARL. Our research contributes to the development of an
effective and adaptable asynchronous action coordination method that can be
widely applied to various task types and environmental configurations in MAS.
Experimental results demonstrate that our method can converge to Stackelberg
equilibrium solutions and outperforms other existing methods in complex
scenarios.
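The listing contains no code; as a rough illustration of the coordination structure the abstract describes, the minimal sketch below (all names, dimensions, and the random linear policies are hypothetical, not STEER's architecture) has agents decide one at a time, with each follower conditioning on the actions its leaders have already committed, i.e. a Stackelberg-style autoregressive decision order.
```python
# Minimal sketch (not the authors' code): asynchronous, Stackelberg-style
# action selection in which agents decide one at a time and each follower
# conditions on the actions already committed by its "leaders".
# The linear policy parameterisation below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, N_ACTIONS = 3, 4, 5

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# One (random) linear policy per agent; input = own observation plus a
# one-hot encoding of every earlier agent's action (zeros if not yet decided).
policies = [rng.normal(size=(OBS_DIM + N_AGENTS * N_ACTIONS, N_ACTIONS))
            for _ in range(N_AGENTS)]

def stackelberg_rollout(observations):
    """Select actions sequentially: agent 0 is the leader, later agents
    are followers responding to the decisions already on the board."""
    committed = np.zeros(N_AGENTS * N_ACTIONS)      # autoregressive context
    actions = []
    for i, obs in enumerate(observations):
        logits = np.concatenate([obs, committed]) @ policies[i]
        a = int(np.argmax(softmax(logits)))          # greedy best response
        actions.append(a)
        committed[i * N_ACTIONS + a] = 1.0           # expose decision to followers
    return actions

obs = rng.normal(size=(N_AGENTS, OBS_DIM))
print(stackelberg_rollout(obs))   # e.g. [2, 0, 4]
```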
Related papers
- Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review [9.246912481179464]
Multi-Agent Reinforcement Learning (MARL) has shown clear effectiveness in coordinating multiple agents across simulated benchmarks and constrained scenarios. This survey contributes to the development of algorithms that are better suited for deployment in dynamic, real-world multi-agent systems.
arXiv Detail & Related papers (2025-07-14T10:39:17Z) - Multi-Agent Collaboration via Evolving Orchestration [61.93162413517026]
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. We propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a central orchestrator dynamically directs agents in response to evolving task states. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs.
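A hedged sketch of the orchestration idea summarized above: the agent names are hypothetical and a trivial greedy scoring rule stands in for the learned, adaptive orchestrator.
```python
# Illustrative sketch only: a central "puppeteer" orchestrator that, at each
# step, picks which agent acts next based on the evolving task state,
# rather than running all agents in a fixed order.
from typing import Callable, Dict

def orchestrate(task: str,
                agents: Dict[str, Callable[[str], str]],
                score: Callable[[str], float],
                max_steps: int = 4) -> str:
    state = task
    for _ in range(max_steps):
        # Routing rule: a toy greedy lookahead; in the paper this is learned
        # and adapts as the task state evolves.
        name, agent = max(agents.items(), key=lambda kv: score(kv[1](state)))
        state = agent(state)
        print(f"orchestrator -> {name}: {state}")
    return state

agents = {
    "planner":  lambda s: s + " | plan drafted",
    "solver":   lambda s: s + " | subproblem solved",
    "reviewer": lambda s: s + " | output checked",
}
orchestrate("summarise report", agents, score=len)
```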
arXiv Detail & Related papers (2025-05-26T07:02:17Z) - PATS: Process-Level Adaptive Thinking Mode Switching [53.53401063490537]
Current large language models (LLMs) typically adopt a fixed reasoning strategy, either simple or complex, for all questions, regardless of their difficulty. This neglect of variation in task and reasoning process complexity leads to an imbalance between performance and efficiency. Existing methods attempt to implement training-free fast-slow thinking system switching to handle problems of varying difficulty, but are limited by coarse-grained solution-level strategy adjustments. We propose a novel reasoning paradigm: Process-Level Adaptive Thinking Mode Switching (PATS), which enables LLMs to dynamically adjust their reasoning strategy based on the difficulty of each step, optimizing the balance between performance and efficiency.
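A minimal sketch of process-level mode switching as described above; the per-step difficulty estimates and the threshold are illustrative assumptions, not the paper's method.
```python
# Hedged sketch (not the PATS implementation): switch between a cheap "fast"
# strategy and an expensive "slow" strategy per reasoning step, based on an
# estimated per-step difficulty signal.
def solve(steps, difficulty, threshold=0.5):
    trace = []
    for step, d in zip(steps, difficulty):
        mode = "slow" if d > threshold else "fast"   # process-level switch
        trace.append((step, mode))
    return trace

steps = ["parse question", "set up equation", "solve equation", "report answer"]
difficulty = [0.1, 0.7, 0.9, 0.2]                    # assumed difficulty estimates
for step, mode in solve(steps, difficulty):
    print(f"{step:>18}: {mode} thinking")
```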
arXiv Detail & Related papers (2025-05-25T17:58:50Z) - Model Evolution Framework with Genetic Algorithm for Multi-Task Reinforcement Learning [85.91908329457081]
Multi-task reinforcement learning employs a single policy to complete various tasks, aiming to develop an agent with generalizability across different scenarios.
Existing approaches typically use a routing network to generate specific routes for each task and reconstruct a set of modules into diverse models to complete multiple tasks simultaneously.
We propose a Model Evolution framework with Genetic Algorithm (MEGA), which enables the model to evolve during training according to the difficulty of the tasks.
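A toy sketch of the evolve-according-to-task-difficulty idea described above; the genome encoding, fitness function, and selection rule are placeholders rather than MEGA's actual design.
```python
# Minimal sketch with assumed details: a population of module-routing genomes
# evolves with a genetic algorithm, biasing selection toward the task that is
# currently hardest for the population.
import random
random.seed(0)

N_MODULES, N_TASKS, POP, GENS = 6, 3, 8, 20

def fitness(genome, task):
    # Stand-in for task return: each task "prefers" a different module subset.
    preferred = set(range(task, task + 3))
    return len(preferred & set(genome))

def evolve():
    pop = [random.sample(range(N_MODULES), 3) for _ in range(POP)]
    for _ in range(GENS):
        # Focus evolution on the hardest task for the current population.
        hard_task = min(range(N_TASKS),
                        key=lambda t: max(fitness(g, t) for g in pop))
        pop.sort(key=lambda g: fitness(g, hard_task), reverse=True)
        parents = pop[: POP // 2]
        children = []
        for p in parents:
            child = p.copy()
            child[random.randrange(3)] = random.randrange(N_MODULES)  # mutate
            children.append(child)
        pop = parents + children
    # Return the genome with the best aggregate fitness across all tasks.
    return max(pop, key=lambda g: sum(fitness(g, t) for t in range(N_TASKS)))

print("best route:", evolve())
```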
arXiv Detail & Related papers (2025-02-19T09:22:34Z) - A Local Information Aggregation based Multi-Agent Reinforcement Learning for Robot Swarm Dynamic Task Allocation [4.144893164317513]
We introduce a novel framework using a decentralized partially observable Markov decision process (Dec-POMDP).
At the core of our methodology is the Local Information Aggregation Multi-Agent Deep Deterministic Policy Gradient (LIA_MADDPG) algorithm.
Our empirical evaluations show that the LIA module can be seamlessly integrated into various CTDE-based MARL methods.
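A minimal sketch, under assumed details, of what a local-information-aggregation step can look like (k-nearest-neighbour mean pooling of neighbouring observations); the actual LIA module is more elaborate and is learned within CTDE-based methods.
```python
# Illustrative sketch: each robot augments its own observation with an
# aggregate of its k nearest neighbours' observations before acting or
# evaluating its critic.
import numpy as np

def aggregate_local(obs, positions, k=2):
    """obs: (n_agents, obs_dim); positions: (n_agents, 2)."""
    n = len(obs)
    aggregated = np.empty_like(obs)
    for i in range(n):
        dists = np.linalg.norm(positions - positions[i], axis=1)
        neighbours = np.argsort(dists)[1 : k + 1]        # skip self
        aggregated[i] = obs[neighbours].mean(axis=0)     # local aggregation
    return np.concatenate([obs, aggregated], axis=1)     # own obs + local summary

rng = np.random.default_rng(1)
obs, pos = rng.normal(size=(5, 4)), rng.normal(size=(5, 2))
print(aggregate_local(obs, pos).shape)   # (5, 8)
```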
arXiv Detail & Related papers (2024-11-29T07:53:05Z) - Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots [1.1049608786515839]
This paper presents the Cooperative and Asynchronous Transformer-based Mission Planning (CATMiP) framework.
CATMiP uses multi-agent reinforcement learning to coordinate agents with heterogeneous sensing, motion, and actuation capabilities.
It easily adapts to mission complexities and communication constraints, and scales to varying environment sizes and team compositions.
arXiv Detail & Related papers (2024-10-08T21:14:09Z) - Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that selects the best response given the inferred goals.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
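A hedged illustration of the two-module structure described above, using a toy Bayesian goal inference and a lookup table of goal-conditioned responses; HOP's actual modules are learned and far richer.
```python
# Toy sketch, not HOP itself: infer an opponent's goal from its recent moves
# with a simple Bayesian update, then act with the matching goal-conditioned
# response.
import numpy as np

GOALS = ["cooperate", "defect"]
# Assumed likelihood of observing each opponent action under each goal.
LIKELIHOOD = {"cooperate": {"share": 0.8, "grab": 0.2},
              "defect":    {"share": 0.3, "grab": 0.7}}
RESPONSE = {"cooperate": "share", "defect": "guard"}   # goal-conditioned responses

def infer_and_respond(observed_actions):
    belief = np.ones(len(GOALS)) / len(GOALS)
    for a in observed_actions:                     # opponent-modeling step
        belief *= [LIKELIHOOD[g][a] for g in GOALS]
        belief /= belief.sum()
    goal = GOALS[int(np.argmax(belief))]
    return belief, RESPONSE[goal]                  # planning/response step

print(infer_and_respond(["grab", "grab", "share"]))
```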
arXiv Detail & Related papers (2024-06-12T08:48:06Z) - HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [72.25707314772254]
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy.
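A greatly simplified sketch of the bi-level idea summarized above: an upper level maintains a per-task parameter mask (the "harmony subspace") and an inner level updates only the masked parameters for that task. The toy objectives and the mask-scoring rule are placeholders, not HarmoDT's actual procedure.
```python
# Hedged sketch of a bi-level masked-parameter update.
import numpy as np

rng = np.random.default_rng(0)
shared = rng.normal(size=16)                     # unified-policy parameters
targets = {"task_a": 1.0, "task_b": -1.0}        # toy per-task objectives
masks = {t: rng.random(16) < 0.5 for t in targets}

for _ in range(50):
    for task, target in targets.items():
        m = masks[task]
        grad = 2 * (shared - target)             # d/dw of (w - target)^2
        shared[m] -= 0.1 * grad[m]               # inner level: masked update
    # Upper level (greatly simplified): keep for each task the coordinates
    # that conflict least with it, i.e. those closest to its target.
    for task, target in targets.items():
        masks[task] = np.abs(shared - target) < np.median(np.abs(shared - target))

for task in targets:
    print(task, round(shared[masks[task]].mean(), 2))
```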
arXiv Detail & Related papers (2024-05-28T11:41:41Z) - Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration [5.326588461041464]
Multi-agent reinforcement learning (MARL) is transforming fields like autonomous vehicle networks.
Ideally, MARL strategies for different roles should be flexibly updatable as the team scale changes, which remains a challenge for current MARL frameworks.
We propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO).
We show SHPPO exhibits superior performance in classic MARL environments like the StarCraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF).
arXiv Detail & Related papers (2024-04-05T03:02:57Z) - Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
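A small sketch of the Gaussian-process-plus-maximum-variance ingredient mentioned above, on a toy 1-D dynamics function; the distributionally robust MDP machinery itself is not shown, and the kernel and query rule are default/assumed choices.
```python
# Illustrative sketch: fit a GP to observed transitions and query next the
# candidate (state, action) pair with the largest predictive variance.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
true_dynamics = lambda sa: np.sin(sa[:, 0]) + 0.5 * sa[:, 1]   # toy next state

X = rng.uniform(-2, 2, size=(10, 2))          # visited (state, action) pairs
y = true_dynamics(X)
gp = GaussianProcessRegressor().fit(X, y)     # nominal transition model

candidates = rng.uniform(-2, 2, size=(200, 2))
mean, std = gp.predict(candidates, return_std=True)
query = candidates[np.argmax(std)]            # most informative pair to visit next
print("next query:", np.round(query, 2))
```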
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential
Decision-Making in Multi-Agent Reinforcement Learning [17.101534531286298]
We construct a Nash-level policy model based on a conditional hypernetwork shared by all agents.
This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents.
Experiments demonstrate that our method effectively converges to the Stackelberg equilibrium (SE) policies in repeated matrix game scenarios.
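A hedged sketch (hypothetical architecture and shapes, not the paper's code) of a conditional hypernetwork shared by all agents: it generates each agent's policy weights from the decisions its superiors have already taken, so later agents respond to earlier ones while parameters stay shared.
```python
# Toy conditional-hypernetwork sketch for leader-then-follower decisions.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, CTX_DIM = 4, 3, 6          # CTX encodes superiors' actions

# Shared hypernetwork: maps context -> flattened per-agent policy weights.
W_hyper = rng.normal(size=(CTX_DIM, OBS_DIM * ACT_DIM)) * 0.1

def agent_policy(obs, superior_context):
    w = (superior_context @ W_hyper).reshape(OBS_DIM, ACT_DIM)  # generated weights
    logits = obs @ w
    return int(np.argmax(logits))            # respond given superiors' decisions

ctx = np.zeros(CTX_DIM)                      # the leader sees an empty context
actions = []
for i in range(2):                           # leader, then follower
    a = agent_policy(rng.normal(size=OBS_DIM), ctx)
    actions.append(a)
    ctx[i * ACT_DIM + a] = 1.0               # expose decision to the next agent
print(actions)
```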
arXiv Detail & Related papers (2023-04-20T14:47:54Z) - Revisiting GANs by Best-Response Constraint: Perspective, Methodology,
and Application [49.66088514485446]
Best-Response Constraint (BRC) is a general learning framework to explicitly formulate the potential dependency of the generator on the discriminator.
We show that even with different motivations and formulations, a variety of existing GANs can all be uniformly improved by our flexible BRC methodology.
arXiv Detail & Related papers (2022-05-20T12:42:41Z) - A Hybrid Evolutionary Algorithm for Reliable Facility Location Problem [10.668347198815438]
The reliable facility location problem (RFLP) plays a vital role in the decision-making and management of modern supply chain and logistics.
We propose a novel model for the RFLP. Instead of assuming that a fixed number of facilities is allocated to each customer, as in existing works, we treat the number of allocated facilities as an independent variable.
To handle it, we propose EAMLS, a hybrid evolutionary algorithm, which combines a memorable local search (MLS) method and an evolutionary algorithm (EA).
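A toy sketch of the EA-plus-memorable-local-search hybrid named above; the facility-cost model, solution encoding, and operators are all illustrative assumptions rather than EAMLS itself.
```python
# Illustrative hybrid: an evolutionary loop whose offspring are refined by a
# local search that memoises ("remembers") solutions it has already evaluated.
import random
random.seed(0)

N_SITES, OPEN_K, POP, GENS = 8, 3, 6, 15
cost = {s: random.uniform(1, 10) for s in range(N_SITES)}   # toy facility costs
memory = {}                                                  # memoised evaluations

def evaluate(sol):
    key = tuple(sorted(sol))
    if key not in memory:
        memory[key] = sum(cost[s] for s in sol)
    return memory[key]

def local_search(sol):
    best = list(sol)
    for s in range(N_SITES):                 # try single-site swaps
        for i in range(OPEN_K):
            cand = best.copy()
            cand[i] = s
            if len(set(cand)) == OPEN_K and evaluate(cand) < evaluate(best):
                best = cand
    return best

pop = [random.sample(range(N_SITES), OPEN_K) for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=evaluate)
    child = local_search(random.sample(range(N_SITES), OPEN_K))  # new + refine
    pop[-1] = child                           # replace the worst individual
pop.sort(key=evaluate)
print("best sites:", sorted(pop[0]), "cost:", round(evaluate(pop[0]), 2))
```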
arXiv Detail & Related papers (2020-06-27T11:31:55Z) - Planning in Markov Decision Processes with Gap-Dependent Sample
Complexity [48.98199700043158]
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability.
arXiv Detail & Related papers (2020-06-10T15:05:51Z) - Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.