Model based Multi-agent Reinforcement Learning with Tensor Decompositions
- URL: http://arxiv.org/abs/2110.14524v1
- Date: Wed, 27 Oct 2021 15:36:25 GMT
- Title: Model based Multi-agent Reinforcement Learning with Tensor Decompositions
- Authors: Pascal Van Der Vaart, Anuj Mahajan, Shimon Whiteson
- Abstract summary: This paper investigates generalisation in state-action space over unexplored state-action pairs by modelling the transition and reward functions as tensors of low CP-rank.
Experiments on synthetic MDPs show that using tensor decompositions in a model-based reinforcement learning algorithm can lead to much faster convergence if the true transition and reward functions are indeed of low rank.
- Score: 52.575433758866936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A challenge in multi-agent reinforcement learning is to be able to generalize
over intractable state-action spaces. Inspired by Tesseract [Mahajan et al.,
2021], this position paper investigates generalisation in state-action space
over unexplored state-action pairs by modelling the transition and reward
functions as tensors of low CP-rank. Initial experiments on synthetic MDPs show
that using tensor decompositions in a model-based reinforcement learning
algorithm can lead to much faster convergence if the true transition and reward
functions are indeed of low rank.
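The paper ships no code, but the modelling assumption is easy to make concrete. The sketch below builds a synthetic transition tensor of low CP-rank and a low-rank reward function with numpy; all sizes and names (n_states, n_actions, rank) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, rank = 10, 4, 3  # illustrative sizes, not from the paper

# CP model: T[s, a, s'] = sum_r U[s, r] * V[a, r] * W[s', r]
U = rng.random((n_states, rank))
V = rng.random((n_actions, rank))
W = rng.random((n_states, rank))
T = np.einsum('sr,ar,tr->sat', U, V, W)

# Normalise over next states so each T[s, a, :] is a distribution.
# (Normalisation can raise the exact CP rank; it is done here only to
# make T a valid stochastic tensor.)
T = T / T.sum(axis=2, keepdims=True)

# A low-rank reward function needs only a state factor and an action factor.
R = rng.random((n_states, rank)) @ rng.random((rank, n_actions))

# The point of the assumption: roughly (2*n_states + 2*n_actions) * rank
# parameters instead of the n_states**2 * n_actions entries of a full model.
print(T.shape, R.shape)
```

A model-based learner that fits U, V, W from observed transitions can then generalise to state-action pairs it has never visited, which is the mechanism behind the faster convergence reported on the synthetic MDPs.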
Related papers
- Low-Rank Tensor Learning by Generalized Nonconvex Regularization [25.115066273660478]
We study the problem of low-rank tensor learning, where only a few samples of the underlying tensor are observed.
A family of nonconvex regularization functions is employed to characterize the low-rankness of the underlying tensor.
A majorization-minimization algorithm is proposed to solve the resulting nonconvex problem (a minimal sketch of such an MM loop appears after this list).
arXiv Detail & Related papers (2024-10-24T03:33:20Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning [47.904127007515925]
We study a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction.
We prove that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic approximation guarantees as their uncompressed counterparts.
Notably, these are the first finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling (a minimal sketch of such an update appears after this list).
arXiv Detail & Related papers (2023-01-03T04:09:38Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Reinforcement Learning in Factored Action Spaces using Tensor Decompositions [92.05556163518999]
We propose a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions.
We use the cooperative multi-agent reinforcement learning scenario as the exemplary setting.
arXiv Detail & Related papers (2021-10-27T15:49:52Z)
- Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning [92.05556163518999]
Cooperative multi-agent reinforcement learning (MARL) exacerbates the challenge of large action spaces by imposing various constraints on communication and observability.
For value-based methods, it poses challenges in accurately representing the optimal value function.
For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic.
We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function (a minimal CP-factored Q-function sketch appears after this list).
arXiv Detail & Related papers (2021-05-31T23:08:05Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named the Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
- Multi-mode Core Tensor Factorization based Low-Rankness and Its Applications to Tensor Completion [0.0]
Low-rank tensor completion is widely used in computer vision and machine learning.
This paper develops a multi-mode core tensor factorization (MCTF) method together with a low-rankness measure and a better nonconvex relaxation form of it.
arXiv Detail & Related papers (2020-12-03T13:57:00Z)
- Towards Flexible Sparsity-Aware Modeling: Automatic Tensor Rank Learning Using The Generalized Hyperbolic Prior [24.848237413017937]
Rank learning for canonical polyadic decomposition (CPD) has long been deemed an essential yet challenging problem.
The optimal determination of a tensor rank is known to be a non-deterministic polynomial-time hard (NP-hard) task.
In this paper, we introduce a more advanced generalized hyperbolic (GH) prior into the probabilistic model, which is more flexible in adapting to different levels of sparsity.
arXiv Detail & Related papers (2020-09-05T06:07:21Z)
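The nonconvex low-rank learning entry above mentions a majorization-minimization algorithm. As a hedged illustration of what such a loop looks like, the sketch below runs iteratively reweighted singular value thresholding on a matrix (e.g., a matricized tensor): the log-style penalty sum_i log(sigma_i + eps) is majorized at each iterate by a weighted nuclear norm, whose proximal step has a closed form. This is a generic matrix-case MM sketch under those assumptions, not the paper's actual tensor algorithm; all names and constants are illustrative.

```python
import numpy as np

def mm_lowrank_complete(M, mask, n_iters=200, eps=1.0, step=1.0):
    """MM sketch for low-rank recovery from partial observations.
    Surrogate rank penalty: sum_i log(sigma_i + eps). At each iterate it
    is majorized by a weighted nuclear norm with weights 1/(sigma_i + eps),
    whose proximal step is weighted singular value thresholding."""
    X = np.where(mask, M, 0.0)
    for _ in range(n_iters):
        # Gradient step on the data-fit term 0.5 * ||P_Omega(X - M)||_F^2.
        Y = X - step * np.where(mask, X - M, 0.0)
        # Majorize: weights from the current singular values (descending,
        # so the weights are ascending, which keeps the prox closed-form).
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        w = 1.0 / (s + eps)
        # Minimize: weighted singular value thresholding.
        X = (U * np.maximum(s - step * w, 0.0)) @ Vt
    return X

# Toy usage: recover a rank-2 30x30 matrix from ~60% of its entries.
rng = np.random.default_rng(1)
L = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(L.shape) < 0.6
X_hat = mm_lowrank_complete(L, mask)
print(np.linalg.norm(X_hat - L) / np.linalg.norm(L))  # relative error
```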
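For the compressed TD entry, here is a minimal sketch of TD(0) with linear features, top-k update compression, and an error-feedback accumulator. The choice of top-k as the compression operator, the step sizes, and the toy random-walk MRP are all assumptions made for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k(v, k):
    """Keep the k largest-magnitude coordinates of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def compressed_td0(env_step, phi, d, alpha=0.05, gamma=0.9, k=2, n_steps=20000):
    """TD(0) with linear function approximation where each update is
    compressed and the dropped part is fed back into the next step's
    update -- the error-feedback mechanism imported from optimization."""
    theta = np.zeros(d)
    err = np.zeros(d)  # error-feedback memory of what compression discarded
    s = 0
    for _ in range(n_steps):
        s_next, r = env_step(s)
        delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        g = alpha * delta * phi(s)   # uncompressed TD(0) direction
        update = top_k(g + err, k)   # compress direction plus carried error
        err = g + err - update       # remember what was dropped
        theta += update
        s = s_next
    return theta

# Toy 5-state random-walk MRP with one-hot features (purely illustrative).
n = 5
def env_step(s):
    s_next = (s + rng.choice((-1, 1))) % n
    return s_next, float(s_next == 0)  # reward 1 on entering state 0

print(compressed_td0(env_step, phi=lambda s: np.eye(n)[s], d=n))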
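For the Tesseract entry, a sketch of the core representational idea: the joint action-value for one state stored as a low-CP-rank tensor, so evaluating Q at a joint action costs O(n_agents * rank) instead of indexing an n_actions**n_agents table. The random factors stand in for the state-conditioned networks the paper would learn; everything here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_agents, n_actions, rank = 3, 4, 2  # illustrative sizes only

# One (n_actions x rank) factor matrix per agent for a single fixed state.
factors = [rng.standard_normal((n_actions, rank)) for _ in range(n_agents)]

def q_value(joint_action):
    """Q(s, a_1..a_n) under the CP model: elementwise product over agents
    of each agent's chosen factor row, then a sum over the rank dimension.
    Cost is O(n_agents * rank) per query; the full table would need
    n_actions ** n_agents entries."""
    prod = np.ones(rank)
    for agent, action in enumerate(joint_action):
        prod *= factors[agent][action]
    return prod.sum()

# For a rank-1 model with nonnegative factors the greedy joint action
# decomposes into independent per-agent argmaxes; in general one would
# search or approximate. Here we just evaluate a single joint action.
print(q_value((0, 2, 1)))
```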