CTDS: Centralized Teacher with Decentralized Student for Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2203.08412v1
- Date: Wed, 16 Mar 2022 06:03:14 GMT
- Title: CTDS: Centralized Teacher with Decentralized Student for Multi-Agent
Reinforcement Learning
- Authors: Jian Zhao, Xunhan Hu, Mingyu Yang, Wengang Zhou, Jiangcheng Zhu and
Houqiang Li
- Abstract summary: This work proposes a novel Centralized Teacher with Decentralized Student (CTDS) framework, which consists of a teacher model and a student model.
Specifically, the teacher model allocates the team reward by learning individual Q-values conditioned on global observation.
The student model utilizes the partial observations to approximate the Q-values estimated by the teacher model.
- Score: 114.69155066932046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the partial observability and communication constraints in many
multi-agent reinforcement learning (MARL) tasks, centralized training with
decentralized execution (CTDE) has become one of the most widely used MARL
paradigms. In CTDE, centralized information is dedicated to learning the
allocation of the team reward with a mixing network, while the learning of
individual Q-values is usually based on local observations. This insufficient
utilization of the global observation degrades performance in challenging
environments. To this end, this work proposes a novel Centralized Teacher with
Decentralized Student (CTDS) framework, which consists of a teacher model and a
student model. Specifically, the teacher model allocates the team reward by
learning individual Q-values conditioned on global observation, while the
student model utilizes the partial observations to approximate the Q-values
estimated by the teacher model. In this way, CTDS balances the full utilization
of global observation during training and the feasibility of decentralized
execution for online inference. Our CTDS framework is generic and can be applied
on top of existing CTDE methods to boost their performance. We conduct
experiments on a challenging set of StarCraft II micromanagement tasks to test
the effectiveness of our method and the results show that CTDS outperforms the
existing value-based MARL methods.
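To make the teacher-student idea concrete, below is a minimal sketch (not the authors' code) of what a CTDS-style training step could look like on top of a QMIX-like value-based method. The network sizes, the simplified monotonic mixer, the mean-squared distillation loss, and the one-step reward target without bootstrapping are all illustrative assumptions; the abstract only specifies that the teacher learns individual Q-values conditioned on the global observation and that the student approximates those Q-values from partial observations.

```python
# Hedged sketch of a CTDS-style training step (illustrative, not the authors' implementation).
# Assumptions: a QMIX-like teacher trained on the team reward via a mixing network, and a
# student trained to regress the teacher's per-agent Q-values from partial observations.
import torch
import torch.nn as nn

N_AGENTS, N_ACTIONS = 3, 5
GLOBAL_DIM, LOCAL_DIM = 32, 16  # global vs. partial observation sizes (hypothetical)

class AgentQNet(nn.Module):
    """Per-agent Q-network; the teacher sees the global observation, the student only a partial one."""
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

    def forward(self, obs):             # obs: (batch, obs_dim)
        return self.net(obs)            # (batch, n_actions)

class Mixer(nn.Module):
    """Simplified monotonic mixer: non-negative weights over the chosen per-agent Q-values."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(N_AGENTS))

    def forward(self, chosen_qs):       # chosen_qs: (batch, n_agents)
        return (chosen_qs * torch.abs(self.w)).sum(dim=-1)   # (batch,)

teacher = nn.ModuleList([AgentQNet(GLOBAL_DIM) for _ in range(N_AGENTS)])
student = nn.ModuleList([AgentQNet(LOCAL_DIM) for _ in range(N_AGENTS)])
mixer = Mixer()

def ctds_losses(global_obs, local_obs, actions, team_reward):
    """global_obs: (B, n_agents, GLOBAL_DIM), local_obs: (B, n_agents, LOCAL_DIM),
    actions: (B, n_agents) long, team_reward: (B,)."""
    teacher_qs = torch.stack([teacher[i](global_obs[:, i]) for i in range(N_AGENTS)], dim=1)
    student_qs = torch.stack([student[i](local_obs[:, i]) for i in range(N_AGENTS)], dim=1)

    # Teacher: the team reward is allocated through the mixing network
    # (one-step target shown without bootstrapping, for brevity).
    chosen = teacher_qs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)   # (B, n_agents)
    q_tot = mixer(chosen)
    teacher_loss = ((q_tot - team_reward) ** 2).mean()

    # Student: regress the teacher's per-agent Q-values from partial observations.
    distill_loss = ((student_qs - teacher_qs.detach()) ** 2).mean()
    return teacher_loss, distill_loss

# Usage with random tensors, just to show the shapes involved.
B = 8
t_loss, d_loss = ctds_losses(
    torch.randn(B, N_AGENTS, GLOBAL_DIM), torch.randn(B, N_AGENTS, LOCAL_DIM),
    torch.randint(0, N_ACTIONS, (B, N_AGENTS)), torch.randn(B))
(t_loss + d_loss).backward()
```

At execution time only the student networks would be used, so each agent acts on its own partial observation; the teacher's global view matters only during training, where it shapes the targets the student imitates.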
Related papers
- DMT: Comprehensive Distillation with Multiple Self-supervised Teachers [27.037140667247208]
We introduce Comprehensive Distillation with Multiple Self-supervised Teachers (DMT) for pretrained model compression.
Our experimental results on prominent benchmark datasets exhibit that the proposed method significantly surpasses state-of-the-art competitors.
arXiv Detail & Related papers (2023-12-19T08:31:30Z)
- HarmonyDream: Task Harmonization Inside World Models [93.07314830304193]
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning.
We propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization.
arXiv Detail & Related papers (2023-09-30T11:38:13Z)
- KDSM: An uplift modeling framework based on knowledge distillation and sample matching [2.036924568983982]
Uplift modeling aims to estimate the treatment effect on individuals.
Tree-based methods are adept at fitting increment and generalization, while neural-network-based models excel at predicting absolute value and precision.
In this paper, we propose an uplift modeling framework based on Knowledge Distillation and Sample Matching (KDSM).
arXiv Detail & Related papers (2023-03-06T09:15:28Z)
- From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models [55.137869702763375]
This paper explores a novel PLM reuse paradigm, Knowledge Integration (KI).
KI aims to merge the knowledge from different teacher-PLMs, each of which specializes in a different classification problem, into a versatile student model.
We then design a Model Uncertainty-aware Knowledge Integration (MUKI) framework to recover the golden supervision for the student.
arXiv Detail & Related papers (2022-10-11T07:59:08Z)
- EBMs vs. CL: Exploring Self-Supervised Visual Pretraining for Visual Question Answering [53.40635559899501]
A lack of clean and diverse labeled data is a major roadblock for training models on complex tasks such as visual question answering (VQA).
We review and evaluate self-supervised methods to leverage unlabeled images and pretrain a model, which we then fine-tune on a custom VQA task.
We find that both EBMs and CL can learn representations from unlabeled images that enable training a VQA model on very little annotated data.
arXiv Detail & Related papers (2022-06-29T01:44:23Z)
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)
- Centralizing State-Values in Dueling Networks for Multi-Robot Reinforcement Learning Mapless Navigation [87.85646257351212]
We study the problem of multi-robot mapless navigation in the popular Centralized Training and Decentralized Execution (CTDE) paradigm.
This problem is challenging when each robot considers its path without explicitly sharing observations with other robots.
We propose a novel architecture for CTDE that uses a centralized state-value network to compute a joint state-value (a generic sketch of this idea appears right after this list).
arXiv Detail & Related papers (2021-12-16T16:47:00Z)
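The last related entry describes a dueling-style value decomposition where the state-value branch is centralized over the joint state while advantages stay local to each robot. The following is a rough, hedged illustration of that general idea only; the layer sizes, the standard mean-subtracted advantage normalization, and the way per-robot Q-values are assembled are assumptions, not the cited paper's architecture.

```python
# Hedged sketch of a dueling decomposition with a centralized state-value branch:
# V is computed from the joint (global) state, while each robot keeps its own
# advantage head over its local observation. Sizes and wiring are illustrative only.
import torch
import torch.nn as nn

N_ROBOTS, N_ACTIONS = 2, 6
JOINT_STATE_DIM, LOCAL_OBS_DIM = 24, 12  # hypothetical dimensions

class CentralValue(nn.Module):
    """Centralized state-value V(s) over the joint state (available during training)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(JOINT_STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, joint_state):            # (B, JOINT_STATE_DIM)
        return self.net(joint_state)           # (B, 1)

class LocalAdvantage(nn.Module):
    """Per-robot advantage head over its local observation (usable at execution time)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LOCAL_OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

    def forward(self, local_obs):              # (B, LOCAL_OBS_DIM)
        adv = self.net(local_obs)              # (B, N_ACTIONS)
        return adv - adv.mean(dim=-1, keepdim=True)   # standard dueling normalization

value_net = CentralValue()
adv_nets = nn.ModuleList([LocalAdvantage() for _ in range(N_ROBOTS)])

def robot_q_values(joint_state, local_obs):
    """joint_state: (B, JOINT_STATE_DIM); local_obs: (B, N_ROBOTS, LOCAL_OBS_DIM).
    Returns per-robot Q-values (B, N_ROBOTS, N_ACTIONS) from the shared V plus local advantages."""
    v = value_net(joint_state).unsqueeze(1)                                   # (B, 1, 1)
    advs = torch.stack([adv_nets[i](local_obs[:, i]) for i in range(N_ROBOTS)], dim=1)
    return v + advs                                                           # broadcast over robots/actions

q = robot_q_values(torch.randn(4, JOINT_STATE_DIM), torch.randn(4, N_ROBOTS, LOCAL_OBS_DIM))
print(q.shape)  # torch.Size([4, 2, 6])
```

Because the shared V(s) does not depend on the action, each robot's greedy action at execution time is determined by its local advantage head alone, which is what keeps execution decentralized in such a design.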
This list is automatically generated from the titles and abstracts of the papers on this site.