Regret Bounds for Decentralized Learning in Cooperative Multi-Agent
Dynamical Systems
- URL: http://arxiv.org/abs/2001.10122v1
- Date: Mon, 27 Jan 2020 23:37:41 GMT
- Title: Regret Bounds for Decentralized Learning in Cooperative Multi-Agent
Dynamical Systems
- Authors: Seyed Mohammad Asghari, Yi Ouyang, and Ashutosh Nayyar
- Abstract summary: Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL).
We propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem.
We show that our algorithm provides a $\tilde{O}(\sqrt{T})$ regret bound.
- Score: 3.9599054392856488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL)
primarily due to the dynamical environments and the decentralized information
among agents. We attempt to solve this challenge in the context of
decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems.
We begin with a simple setup consisting of two agents and two dynamically
decoupled stochastic linear systems, each system controlled by an agent. The
systems are coupled through a quadratic cost function. When both systems'
dynamics are unknown and there is no communication among the agents, we show
that no learning policy can generate sub-linear in $T$ regret, where $T$ is the
time horizon. When only one system's dynamics are unknown and there is
one-directional communication from the agent controlling the unknown system to
the other agent, we propose a MARL algorithm based on the construction of an
auxiliary single-agent LQ problem. The auxiliary single-agent problem in the
proposed MARL algorithm serves as an implicit coordination mechanism between
the two learning agents. This allows the agents to achieve a regret within
$O(\sqrt{T})$ of the regret of the auxiliary single-agent problem.
Consequently, using existing results for single-agent LQ regret, our algorithm
provides a $\tilde{O}(\sqrt{T})$ regret bound. (Here $\tilde{O}(\cdot)$ hides
constants and logarithmic factors). Our numerical experiments indicate that
this bound is matched in practice. From the two-agent problem, we extend our
results to multi-agent LQ systems with certain communication patterns.
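The two-agent setup described in the abstract is small enough to simulate directly. The sketch below sets up two dynamically decoupled scalar systems whose running costs are coupled through a quadratic mismatch term; it illustrates the problem setting only, not the authors' MARL algorithm, and all dynamics, noise levels, cost weights, and feedback gains are illustrative assumptions.

```python
# Minimal sketch of the paper's setting, NOT the proposed MARL algorithm.
# Two dynamically decoupled scalar systems, coupled only through the cost.
# All numbers (dynamics, noise level, gains, cost weights) are assumptions.
import numpy as np

rng = np.random.default_rng(0)
a = np.array([0.9, 0.8])  # open-loop dynamics of systems 1 and 2 (assumed)
b = np.array([1.0, 1.0])  # input gains (assumed)
T = 10_000                # time horizon

def cumulative_cost(k):
    """Run both systems under decentralized linear feedback u_i = -k_i * x_i
    and return the total coupled quadratic cost over the horizon."""
    x = np.zeros(2)
    total = 0.0
    for _ in range(T):
        u = -k * x
        # The coupling term (x1 - x2)^2 ties the agents' costs together
        # even though their dynamics are decoupled -- neither agent can
        # minimize the cost by looking at its own state alone.
        total += x @ x + (x[0] - x[1]) ** 2 + u @ u
        x = a * x + b * u + 0.1 * rng.standard_normal(2)
    return total

# The paper's regret compares a learning policy against the optimal
# decentralized controller; here we only compare two fixed gain pairs.
print(cumulative_cost(np.array([0.5, 0.5])))
print(cumulative_cost(np.array([0.7, 0.6])))
```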
Related papers
- On the Resilience of Multi-Agent Systems with Malicious Agents [58.79302663733702]
This paper investigates the resilience of multi-agent system structures under malicious agents.
We devise two methods, AutoTransform and AutoInject, to transform any agent into a malicious one.
We show that two defense methods, introducing a mechanism for each agent to challenge others' outputs, or an additional agent to review and correct messages, can enhance system resilience.
arXiv Detail & Related papers (2024-08-02T03:25:20Z) - Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
arXiv Detail & Related papers (2024-02-29T23:38:28Z) - A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration [55.35849138235116]
We propose automatically selecting a team of agents from candidates to collaborate in a dynamic communication structure toward different tasks and domains.
Specifically, we build a framework named Dynamic LLM-Powered Agent Network (DyLAN) for LLM-powered agent collaboration.
We demonstrate that DyLAN outperforms strong baselines in code generation, decision-making, general reasoning, and arithmetic reasoning tasks with moderate computational cost.
arXiv Detail & Related papers (2023-10-03T16:05:48Z) - MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
Diffusion models (DMs) have recently achieved huge success in various scenarios, including offline reinforcement learning.
We propose MADiff, a novel generative multi-agent learning framework to tackle this problem.
Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z) - Decentralized Cooperative Multi-Agent Reinforcement Learning with
Exploration [35.75029940279768]
We study multi-agent reinforcement learning in the most basic cooperative setting -- Markov teams.
We propose an algorithm in which each agent independently runs a stage-based V-learning style algorithm.
We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in at most $\widetilde{O}(1/\epsilon^4)$ episodes.
arXiv Detail & Related papers (2021-10-12T02:45:12Z) - Regret Analysis of Distributed Online LQR Control for Unknown LTI
Systems [8.832969171530056]
We study the distributed online linear quadratic regulator (LQR) problem for linear time-invariant (LTI) systems with unknown dynamics.
We propose a distributed variant of the online LQR algorithm where each agent computes its system estimate during an exploration stage.
We prove that the regret of our proposed algorithm scales as $\tilde{O}(T^{2/3})$, implying consensus of the network over time.
arXiv Detail & Related papers (2021-05-15T23:02:58Z) - Accelerating Distributed Online Meta-Learning via Multi-Agent
Collaboration under Limited Communication [24.647993999787992]
We propose a multi-agent online meta-learning framework and cast it as an equivalent two-level nested online convex optimization (OCO) problem.
By characterizing the upper bound of the agent-task-averaged regret, we show that the performance of multi-agent online meta-learning depends heavily on how much an agent can benefit from the distributed network-level OCO for meta-model updates via limited communication.
We show that a factor-of-$\sqrt{1/N}$ speedup over the optimal single-agent regret $O(\sqrt{T})$ can be achieved.
arXiv Detail & Related papers (2020-12-15T23:08:36Z) - Thompson sampling for linear quadratic mean-field teams [3.957353452014781]
We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents.
We propose a new Thompson sampling based learning algorithm that exploits the structure of the system model, and show that the expected Bayesian regret of our proposed algorithm for a system with $|M|$ different types of agents at time horizon $T$ is $\tilde{\mathcal{O}}\big(|M|^{1.5}\sqrt{T}\big)$, irrespective of the total number of agents.
arXiv Detail & Related papers (2020-11-09T19:07:32Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model; a bare-bones consensus-ADMM skeleton is sketched after this list.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
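As referenced in the ADMM entry above, the following is a bare-bones global-consensus ADMM skeleton for a least-squares problem whose data is sharded across agents. It sketches only the textbook method that the coded stochastic mini-batch variant builds on; the problem, data, and parameters (`rho`, iteration count) are invented for illustration.

```python
# Hedged sketch of global-consensus ADMM for least squares split across
# agents. The paper studies a coded *stochastic mini-batch* ADMM over edge
# nodes; this is only the textbook consensus skeleton it builds on.
import numpy as np

rng = np.random.default_rng(1)
n_agents, d = 4, 3
theta_true = rng.standard_normal(d)
# Each agent holds a private data shard (A_i, y_i).
A = [rng.standard_normal((20, d)) for _ in range(n_agents)]
y = [Ai @ theta_true + 0.01 * rng.standard_normal(20) for Ai in A]

rho = 1.0                     # ADMM penalty parameter (assumed)
x = np.zeros((n_agents, d))   # local estimates
u = np.zeros((n_agents, d))   # scaled dual variables
z = np.zeros(d)               # consensus variable

for _ in range(50):
    # Local step: each agent solves its regularized least squares exactly.
    for i in range(n_agents):
        lhs = A[i].T @ A[i] + rho * np.eye(d)
        rhs = A[i].T @ y[i] + rho * (z - u[i])
        x[i] = np.linalg.solve(lhs, rhs)
    # Consensus step: averaging (a fusion-free implementation would do this
    # via neighbor-to-neighbor communication instead of a central mean).
    z = (x + u).mean(axis=0)
    # Dual step.
    u += x - z

print(np.linalg.norm(z - theta_true))  # should be small
```

With well-conditioned shards, the consensus variable `z` approaches the least-squares solution after a few dozen iterations; replacing the exact local solve with a stochastic mini-batch update is roughly the direction the paper's variant takes.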
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.