Distributed Q-Learning with State Tracking for Multi-agent Networked
Control
- URL: http://arxiv.org/abs/2012.12383v1
- Date: Tue, 22 Dec 2020 22:03:49 GMT
- Title: Distributed Q-Learning with State Tracking for Multi-agent Networked
Control
- Authors: Hang Wang, Sen Lin, Hamid Jafarkhani, Junshan Zhang
- Abstract summary: This paper studies distributed Q-learning for Linear Quadratic Regulator (LQR) in a multi-agent network.
We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for agents.
- Score: 61.63442612938345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies distributed Q-learning for Linear Quadratic Regulator
(LQR) in a multi-agent network. The existing results often assume that agents
can observe the global system state, which may be infeasible in large-scale
systems due to privacy concerns or communication constraints. In this work, we
consider a setting with unknown system models and no centralized coordinator.
We devise a state tracking (ST) based Q-learning algorithm to design optimal
controllers for agents. Specifically, we assume that agents maintain local
estimates of the global state based on their local information and
communications with neighbors. At each step, every agent updates its local
estimate of the global state, based on which it solves an approximate Q-factor
locally through policy iteration. Assuming decaying injected excitation noise
during policy evaluation, we prove that the local estimates converge to the
true global state, and we establish the convergence of the proposed
distributed ST-based Q-learning algorithm. Experimental studies corroborate
our theoretical results, showing that the proposed method achieves
performance comparable to the centralized case.
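To make the state-tracking (ST) step concrete, here is a minimal Python sketch. The ring communication graph, the doubly stochastic mixing matrix W, the identity input matrix, and the fixed gain K are illustrative assumptions; in particular, the fixed gain stands in for the gain the paper learns through Q-factor policy iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                      # agents; agent i owns state coordinate i
A = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
K = A.copy()                               # illustrative deadbeat gain (with B = I);
                                           # the paper learns K via policy iteration
W = (np.eye(n) + np.roll(np.eye(n), 1, axis=0)
     + np.roll(np.eye(n), -1, axis=0)) / 3.0   # doubly stochastic ring mixing matrix

x = rng.standard_normal(n)                 # true global state
Z = np.zeros((n, n))                       # Z[i] = agent i's estimate of x

for t in range(30):
    sigma = 0.3 * 0.9 ** t                 # decaying injected excitation noise
    # each agent controls its own coordinate from its local global-state estimate
    u = np.array([-(K @ Z[i])[i] for i in range(n)]) + sigma * rng.standard_normal(n)
    x = A @ x + u                          # B = I: agent i's input drives coordinate i
    Z = W @ Z                              # state tracking: mix neighbors' estimates
    Z[np.arange(n), np.arange(n)] = x      # refresh the locally observed coordinate
    if t % 5 == 0:
        print(f"t={t:2d}  worst estimation error = {np.abs(Z - x).max():.4f}")
```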
Related papers
- The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup
and Beyond [44.43850105124659]
We consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone.
We provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning.
We propose a novel federated Q-learning algorithm with importance averaging, giving larger weights to more frequently visited state-action pairs.
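A rough sketch of what such importance averaging could look like in a tabular setting; the shapes, visit counts, and uniform fallback for unvisited pairs are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, K = 5, 3, 4                           # states, actions, agents
Q = rng.standard_normal((K, S, A))          # local Q-estimates after local training
N = rng.integers(0, 20, size=(K, S, A))     # local visit counts per (s, a)

# Importance averaging: weight each agent's Q(s, a) by its share of the total
# visits to (s, a), falling back to a uniform average where no agent visited.
total = N.sum(axis=0, keepdims=True)                   # (1, S, A)
w = np.where(total > 0, N / np.maximum(total, 1), 1.0 / K)
Q_global = (w * Q).sum(axis=0)                         # aggregated (S, A) estimate
print(Q_global.shape)
```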
arXiv Detail & Related papers (2023-05-18T04:18:59Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verifying that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where the learning dynamics are not known.
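The verification step can be pictured as checking that the learning dynamics never point outward on the boundary of a candidate set. Below is a toy Python check for a box-shaped candidate under an assumed (and here known, linear) vector field; the paper's binary partitioning and sampling procedures are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[-1.0, 0.4], [0.2, -0.8]])   # illustrative stable learning dynamics
f = lambda x: A @ x                        # vector field on the joint strategy space

def sample_boundary(d=2):
    """Sample a point on the boundary of the box [-1, 1]^d with its outward normal."""
    x = rng.uniform(-1, 1, size=d)
    face, sign = rng.integers(d), rng.choice([-1.0, 1.0])
    x[face] = sign                         # project onto a random face of the box
    n = np.zeros(d)
    n[face] = sign                         # outward normal of that face
    return x, n

# Trapping-region test: the field must not point outward anywhere on the boundary.
ok = all(f(x) @ n <= 0 for x, n in (sample_boundary() for _ in range(10_000)))
print("candidate box passes the sampled trapping-region test:", ok)
```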
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Policy Evaluation in Decentralized POMDPs with Belief Sharing [39.550233049869036]
We consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly.
We propose a fully decentralized belief forming strategy that relies on individual updates and on localized interactions over a communication network.
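One way to picture such a belief-forming strategy is a diffusion-style HMM filter: each agent runs a local Bayesian update on its own observation and then pools its neighbors' beliefs log-linearly over the network. The transition and observation models, the ring topology, and the pooling weights below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
S, n = 3, 4                                    # hidden environment states, agents
P = 0.1 * np.ones((S, S)) + 0.7 * np.eye(S)    # state transition matrix
Lik = 0.2 * np.ones((S, S)) + 0.4 * np.eye(S)  # Lik[s, o] = P(obs = o | state = s)
W = (np.eye(n) + np.roll(np.eye(n), 1, axis=0)
     + np.roll(np.eye(n), -1, axis=0)) / 3.0   # mixing weights on a ring graph

s = 0                                          # true hidden state
B = np.full((n, S), 1.0 / S)                   # B[i] = agent i's belief over states

for t in range(25):
    s = rng.choice(S, p=P[s])                  # environment evolves
    obs = np.array([rng.choice(S, p=Lik[s]) for _ in range(n)])
    B = B @ P                                  # predict step (time update)
    B *= Lik[:, obs].T                         # local Bayesian correction per agent
    B /= B.sum(axis=1, keepdims=True)
    B = np.exp(W @ np.log(B + 1e-12))          # log-linear pooling with neighbors
    B /= B.sum(axis=1, keepdims=True)

print("true state:", s, "\nbeliefs:\n", B.round(3))
```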
arXiv Detail & Related papers (2023-02-08T15:54:15Z)
- Distributed-Training-and-Execution Multi-Agent Reinforcement Learning
for Power Control in HetNet [48.96004919910818]
We propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet.
To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems.
In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process.
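The abstract does not spell out the penalty, so the tabular sketch below uses a purely hypothetical stand-in: the Q update is shaped by a small penalty whenever an agent deviates from its current greedy action, discouraging policy churn so that teammates can model the agent more easily.

```python
import numpy as np

S, A = 4, 3                          # states, actions (illustrative sizes)
Q = np.zeros((S, A))
alpha, gamma, lam = 0.1, 0.9, 0.05   # step size, discount, penalty weight

def pql_update(s, a, r, s_next):
    """Hypothetical penalty-shaped Q update, not the paper's exact PQL rule."""
    churn = float(a != np.argmax(Q[s]))          # 1 if a deviates from greedy
    target = (r - lam * churn) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

pql_update(s=0, a=1, r=1.0, s_next=2)            # example usage
```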
arXiv Detail & Related papers (2022-12-15T17:01:56Z)
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
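A toy sketch of the clustered-evaluation idea under population heterogeneity: trajectories from two latent sub-populations are clustered by their returns, and the policy value is then estimated within each cluster. The 1-d 2-means step and the Monte Carlo returns are illustrative stand-ins for the paper's ACPE machinery.

```python
import numpy as np

rng = np.random.default_rng(4)
n_traj = 40
# Monte Carlo returns from two latent sub-populations (the heterogeneity)
G = np.concatenate([rng.normal(1.0, 0.3, n_traj // 2),
                    rng.normal(3.0, 0.3, n_traj // 2)])

# crude 1-d 2-means to recover the clusters from the data alone
c = np.array([G.min(), G.max()])
for _ in range(20):
    assign = np.abs(G[:, None] - c[None, :]).argmin(axis=1)
    c = np.array([G[assign == k].mean() for k in range(2)])

for k in range(2):
    print(f"cluster {k}: estimated policy value = {G[assign == k].mean():.2f}")
```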
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- Centralizing State-Values in Dueling Networks for Multi-Robot
Reinforcement Learning Mapless Navigation [87.85646257351212]
We study the problem of multi-robot mapless navigation in the popular Centralized Training and Decentralized Execution (CTDE) paradigm.
This problem is challenging when each robot considers its path without explicitly sharing observations with other robots.
We propose a novel architecture for CTDE that uses a centralized state-value network to compute a joint state-value.
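A minimal PyTorch sketch of the idea as summarized: one value head reads the concatenated joint observation to produce a joint state-value, each robot keeps its own advantage head, and the two are combined with the usual dueling mean-subtraction. Sizes and layer choices are illustrative; the paper's architecture may differ.

```python
import torch
import torch.nn as nn

class CentralizedDuelingQ(nn.Module):
    """Q_i(s, a) = V(s_joint) + A_i(o_i, a) - mean_a A_i(o_i, a)."""

    def __init__(self, obs_dim, n_agents, n_actions, hidden=64):
        super().__init__()
        self.value = nn.Sequential(              # centralized joint state-value
            nn.Linear(obs_dim * n_agents, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.adv = nn.ModuleList([               # per-robot advantage heads
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_actions))
            for _ in range(n_agents)])

    def forward(self, obs):                      # obs: (batch, n_agents, obs_dim)
        v = self.value(obs.flatten(1))           # (batch, 1)
        qs = [v + a - a.mean(dim=1, keepdim=True)
              for a in (head(obs[:, i]) for i, head in enumerate(self.adv))]
        return torch.stack(qs, dim=1)            # (batch, n_agents, n_actions)

q = CentralizedDuelingQ(obs_dim=8, n_agents=3, n_actions=5)
print(q(torch.randn(2, 3, 8)).shape)             # torch.Size([2, 3, 5])
```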
arXiv Detail & Related papers (2021-12-16T16:47:00Z)
- Dimension-Free Rates for Natural Policy Gradient in Multi-Agent
Reinforcement Learning [22.310861786709538]
We propose a scalable algorithm for cooperative multi-agent reinforcement learning.
We show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity.
arXiv Detail & Related papers (2021-09-23T23:38:15Z)
- Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network
Approach [6.802025156985356]
This paper proposes a framework called localized training and decentralized execution to study MARL with a network of states.
The key idea is to utilize the homogeneity of agents and regroup them according to their states, thus formulating a networked Markov decision process.
arXiv Detail & Related papers (2021-08-05T16:52:36Z)
- Multi-Agent Reinforcement Learning in Stochastic Networked Systems [30.78949372661673]
We study multi-agent reinforcement learning (MARL) in a network of agents.
The objective is to find localized policies that maximize the (discounted) global reward.
arXiv Detail & Related papers (2020-06-11T16:08:16Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.