Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2106.00136v1
- Date: Mon, 31 May 2021 23:08:05 GMT
- Title: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
- Authors: Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh
Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar
- Abstract summary: MARL exacerbates matters by imposing various constraints on communication and observability.
For value-based methods, it poses challenges in accurately representing the optimal value function.
For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic.
We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function.
- Score: 92.05556163518999
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning in large action spaces is a challenging problem.
Cooperative multi-agent reinforcement learning (MARL) exacerbates matters by
imposing various constraints on communication and observability. In this work,
we consider the fundamental hurdle affecting both value-based and
policy-gradient approaches: an exponential blowup of the action space with the
number of agents. For value-based methods, it poses challenges in accurately
representing the optimal value function. For policy gradient methods, it makes
training the critic difficult and exacerbates the problem of the lagging
critic. We show that from a learning theory perspective, both problems can be
addressed by accurately representing the associated action-value function with
a low-complexity hypothesis class. This requires accurately modelling the agent
interactions in a sample efficient way. To this end, we propose a novel
tensorised formulation of the Bellman equation. This gives rise to our method
Tesseract, which views the Q-function as a tensor whose modes correspond to the
action spaces of different agents. Algorithms derived from Tesseract decompose
the Q-tensor across agents and utilise low-rank tensor approximations to model
agent interactions relevant to the task. We provide PAC analysis for
Tesseract-based algorithms and highlight their relevance to the class of rich
observation MDPs. Empirical results in different domains confirm Tesseract's
gains in sample efficiency predicted by the theory.
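As a rough, hedged illustration of this tensorised view (a minimal sketch under simple assumptions, not the authors' implementation), the joint Q-values of n agents can be represented by a rank-R CP factorisation with one factor matrix per agent, so storage grows linearly rather than exponentially in the number of agents:

```python
import numpy as np

# Minimal sketch (illustrative only): a rank-R CP factorisation of the joint
# Q-tensor for n agents, each with |A| discrete actions.
n_agents, n_actions, rank = 3, 5, 4

# One factor matrix per agent: rows index that agent's actions, columns index
# the R latent components.  In Tesseract these factors would come from
# per-agent networks conditioned on observations; here they are plain arrays.
factors = [np.random.randn(n_actions, rank) for _ in range(n_agents)]

def q_joint(joint_action):
    """Q(a_1, ..., a_n) = sum_r prod_i factors[i][a_i, r]."""
    components = np.ones(rank)
    for agent, action in enumerate(joint_action):
        components *= factors[agent][action]   # elementwise over the R components
    return components.sum()

# A dense joint Q-tensor needs |A|**n entries; the CP form stores only
# n * |A| * R numbers while still modelling multiplicative agent interactions.
print(q_joint((0, 2, 4)))
```

Learning then amounts to fitting the per-agent factors from sampled joint actions; the paper's PAC analysis studies when such low-rank approximations yield sample-efficiency gains.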
Related papers
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
- Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning [47.904127007515925]
We study a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction.
We prove that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic approximation guarantees as their counterparts.
Notably, these are the first finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. (A hedged sketch of this error-feedback mechanism is given after the related-papers list below.)
arXiv Detail & Related papers (2023-01-03T04:09:38Z)
- Batch Active Learning from the Perspective of Sparse Approximation [12.51958241746014]
Active learning enables efficient model training by leveraging interactions between machine learning agents and human annotators.
We propose a novel framework that formulates batch active learning from the perspective of sparse approximation.
Our active learning method aims to find an informative subset from the unlabeled data pool such that the corresponding training loss function approximates its full data pool counterpart.
arXiv Detail & Related papers (2022-11-01T03:20:28Z)
- Interaction Pattern Disentangling for Multi-Agent Reinforcement Learning [39.4394389642761]
We introduce a novel interactiOn Pattern disenTangling (OPT) method to disentangle the entity interactions into interaction prototypes.
OPT facilitates filtering the noisy interactions between irrelevant entities and thus significantly improves generalizability as well as interpretability.
Experiments on single-task, multi-task and zero-shot benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts.
arXiv Detail & Related papers (2022-07-08T13:42:54Z)
- Low-rank Optimal Transport: Approximation, Statistics and Debiasing [51.50788603386766]
The low-rank optimal transport (LOT) approach advocated in Scetbon et al. (2021) is seen as a legitimate contender to entropic regularization when compared on properties of interest.
We target each of these areas in this paper in order to cement the impact of low-rank approaches in computational OT.
arXiv Detail & Related papers (2022-05-24T20:51:37Z)
- Model based Multi-agent Reinforcement Learning with Tensor Decompositions [52.575433758866936]
This paper investigates generalisation in state-action space over unexplored state-action pairs by modelling the transition and reward functions as tensors of low CP-rank.
Experiments on synthetic MDPs show that using tensor decompositions in a model-based reinforcement learning algorithm can lead to much faster convergence if the true transition and reward functions are indeed of low rank.
arXiv Detail & Related papers (2021-10-27T15:36:25Z)
- ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation [32.091346776897744]
Cyber-physical attacks can challenge the robustness of multiagent reinforcement learning.
We propose a minimax MARL approach to infer the worst-case policy update of other agents.
arXiv Detail & Related papers (2021-09-14T16:18:35Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
- Represented Value Function Approach for Large Scale Multi Agent Reinforcement Learning [0.30458514384586394]
We study the representation problem of the pairwise value function to reduce the complexity of the interactions among agents.
We adopt an l2-norm trick to ensure the trivial term of the approximated value function is bounded.
arXiv Detail & Related papers (2020-01-04T16:29:13Z)
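To make the error-feedback idea from the compressed-TD entry above concrete, here is a minimal, hedged sketch (the top-k compressor, synthetic features, and toy reward are illustrative assumptions, not taken from that paper): the part of each update lost to compression is accumulated and added back into the next update.

```python
import numpy as np

# Hedged sketch: error-feedback compressed TD(0) with linear function
# approximation.  Compressor, features, and reward are illustrative only.
def top_k(v, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

dim, k, alpha, gamma = 10, 3, 0.05, 0.9
theta = np.zeros(dim)      # linear value-function weights
error = np.zeros(dim)      # error-feedback accumulator (compression residual)

rng = np.random.default_rng(0)
phi = rng.normal(size=dim) / np.sqrt(dim)           # current feature vector
for _ in range(1000):
    phi_next = rng.normal(size=dim) / np.sqrt(dim)  # next-state features
    reward = 0.1 * phi.sum()                        # toy reward signal
    td_error = reward + gamma * (phi_next @ theta) - phi @ theta
    update = td_error * phi                         # uncompressed TD direction
    compressed = top_k(update + error, k)           # compress update plus carried error
    error = (update + error) - compressed           # feed the residual back next step
    theta += alpha * compressed
    phi = phi_next

print(theta)
```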
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.