Reinforcement Learning in Factored Action Spaces using Tensor
Decompositions
- URL: http://arxiv.org/abs/2110.14538v1
- Date: Wed, 27 Oct 2021 15:49:52 GMT
- Title: Reinforcement Learning in Factored Action Spaces using Tensor
Decompositions
- Authors: Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh
Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar
- Abstract summary: We propose a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions.
We use the cooperative multi-agent reinforcement learning scenario as the exemplary setting.
- Score: 92.05556163518999
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an extended abstract for the previously published work TESSERACT
[Mahajan et al., 2021], which proposes a novel solution for Reinforcement
Learning (RL) in large, factored action spaces using tensor decompositions. The
goal of this abstract is twofold: (1) To garner greater interest amongst the
tensor research community for creating methods and analysis for approximate RL,
(2) To elucidate the generalised setting of factored action spaces where tensor
decompositions can be used. We use the cooperative multi-agent reinforcement
learning scenario as the exemplary setting, where the action space is naturally
factored across agents and learning becomes intractable without resorting to
approximation of the underlying hypothesis space of candidate solutions.
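As a rough illustration of the underlying idea (a sketch of the low-rank structure only, not the TESSERACT implementation), the snippet below writes a per-state joint action-value tensor for n agents as a rank-k CP sum, so that |A|^n joint values are parameterised by only n·|A|·k numbers; libraries such as TensorLy provide standard routines for fitting such factors from data. All sizes and names are illustrative.

```python
import numpy as np

# Illustrative sizes: 3 agents, 5 actions each, CP rank 4.
n_agents, n_actions, rank = 3, 5, 4
rng = np.random.default_rng(0)

# One CP factor matrix per agent for a fixed state s:
#   Q(s, a_1, ..., a_n) ~= sum_{r=1..rank} prod_i U_i[a_i, r]
factors = [rng.normal(size=(n_actions, rank)) for _ in range(n_agents)]

def q_value(joint_action):
    """Evaluate the factored Q for one joint action without building the full tensor."""
    prod = np.ones(rank)
    for U, a in zip(factors, joint_action):
        prod *= U[a]                                 # row a of agent i's factor matrix
    return prod.sum()

# The dense joint tensor has |A|^n entries; the CP factors have only n * |A| * rank.
full_q = np.einsum('ar,br,cr->abc', *factors)        # einsum string fixed for 3 agents
print(full_q.size)                                   # 125 joint entries
print(sum(U.size for U in factors))                  # 60 CP parameters
print(np.isclose(q_value((1, 2, 3)), full_q[1, 2, 3]))  # True
```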
Related papers
- Off-Policy Reinforcement Learning with High Dimensional Reward [1.7297899469367062]
Distributional RL (DRL) studies the distribution of returns with the distributional Bellman operator in a Euclidean space.
We prove the contraction property of the Bellman operator even when the reward space is an infinite-dimensional separable Banach space.
We propose a novel DRL algorithm that tackles problems that were previously intractable with conventional reinforcement learning approaches.
arXiv Detail & Related papers (2024-08-14T16:44:56Z)
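For intuition only (a toy sketch with scalar rewards, not the paper's Banach-space construction), a distributional Bellman backup can be pictured with particle-based return distributions: the operator maps Z(s) to R + gamma * Z(s'). All names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.9

# Return distribution of each state represented by a set of sample particles (illustrative).
returns = {s: rng.normal(size=64) for s in range(3)}

def distributional_backup(reward, s_next):
    """One application of the distributional Bellman operator: (T Z)(s) = R + gamma * Z(s')."""
    return reward + gamma * returns[s_next]

# A single backup for state 0 with reward 1.0 and successor state 2.
returns[0] = distributional_backup(1.0, 2)
print(returns[0].mean(), returns[0].std())
```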
- Strongly Isomorphic Neural Optimal Transport Across Incomparable Spaces [7.535219325248997]
We present a novel neural formulation of the Gromov-Monge problem rooted in one of its fundamental properties.
We operationalize this property by decomposing the learnable OT map into two components.
Our framework provides a promising approach to learn OT maps across diverse spaces.
arXiv Detail & Related papers (2024-07-20T18:27:11Z)
- Adaptive trajectory-constrained exploration strategy for deep reinforcement learning [6.589742080994319]
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces.
We propose an efficient adaptive trajectory-constrained exploration strategy for DRL.
We conduct experiments on two large 2D grid world mazes and several MuJoCo tasks.
arXiv Detail & Related papers (2023-12-27T07:57:15Z)
- Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning [83.41487567765871]
Skipper is a model-based reinforcement learning framework.
It automatically decomposes the given task into smaller, more manageable subtasks.
It enables sparse decision-making and focused abstractions on the relevant parts of the environment.
arXiv Detail & Related papers (2023-09-30T02:25:18Z)
- Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare [38.42691031505782]
We propose a form of linear Q-function decomposition induced by factored action spaces.
Our approach can help an agent make more accurate inferences within underexplored regions of the state-action space.
arXiv Detail & Related papers (2023-05-02T19:13:10Z)
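One way to picture such a decomposition (hypothetical code, not the authors'): the Q-value of a factored action a = (a_1, ..., a_D) is the sum of per-dimension components, so each component can be estimated from transitions that cover its own sub-action even if that exact joint combination was never observed.

```python
import numpy as np

# Illustrative sizes: 10 states, 3 action dimensions with 4 sub-actions each.
n_states, n_factors, n_sub_actions = 10, 3, 4
rng = np.random.default_rng(2)

# One tabular component per action dimension: q_d[s, a_d].
q_components = [rng.normal(size=(n_states, n_sub_actions)) for _ in range(n_factors)]

def q_factored(state, action):
    """Q(s, a) = sum_d Q_d(s, a_d) for a factored action a = (a_1, ..., a_D)."""
    return sum(q_d[state, a_d] for q_d, a_d in zip(q_components, action))

def greedy_action(state):
    """Under the additive decomposition, the greedy joint action is a per-dimension argmax."""
    return tuple(int(q_d[state].argmax()) for q_d in q_components)

print(q_factored(0, (1, 2, 3)), greedy_action(0))
```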
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
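A minimal sketch of the general idea (assuming a PyTorch-style VAE over slates of item ids; this is not the paper's model, and all names are illustrative): each slate is encoded to a low-dimensional latent vector, which is the continuous action an RL agent could then optimise over, and decoded back into per-slot item logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlateVAE(nn.Module):
    """Toy VAE that maps a slate of item ids to a continuous latent action and back."""

    def __init__(self, n_items, slate_size, emb_dim=16, latent_dim=8):
        super().__init__()
        self.slate_size, self.n_items = slate_size, n_items
        self.embed = nn.Embedding(n_items, emb_dim)
        self.enc = nn.Sequential(nn.Linear(slate_size * emb_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        # The decoder scores every item for each slot of the slate.
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, slate_size * n_items))

    def encode(self, slates):                       # slates: (batch, slate_size) item ids
        h = self.enc(self.embed(slates).flatten(1))
        return self.mu(h), self.logvar(h)

    def decode(self, z):                            # logits: (batch, slate_size, n_items)
        return self.dec(z).view(-1, self.slate_size, self.n_items)

    def forward(self, slates):
        mu, logvar = self.encode(slates)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation
        logits = self.decode(z)
        recon = F.cross_entropy(logits.flatten(0, 1), slates.flatten())
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl, z

# Usage sketch: fit on logged slates, then let the policy act in the latent space.
vae = SlateVAE(n_items=1000, slate_size=5)
loss, z = vae(torch.randint(0, 1000, (32, 5)))
loss.backward()
```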
- Supervised learning of sheared distributions using linearized optimal transport [64.53761005509386]
In this paper we study supervised learning tasks on the space of probability measures.
We approach this problem by embedding the space of probability measures into $L^2$ spaces using the optimal transport framework.
Regular machine learning techniques are used to achieve linear separability.
arXiv Detail & Related papers (2022-01-25T19:19:59Z)
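A rough sketch of the linearized-OT recipe as described above (illustrative code, not the paper's; the synthetic classes and names are made up): each point-cloud measure is embedded via its optimal matching to a fixed reference cloud, and the resulting fixed-length vectors are handled by an ordinary linear classifier.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n_points, dim = 20, 2
reference = rng.normal(size=(n_points, dim))         # fixed reference measure

def lot_embedding(cloud):
    """Embed a uniform point-cloud measure by its OT matching to the reference cloud."""
    cost = cdist(reference, cloud) ** 2
    _, col = linear_sum_assignment(cost)              # reference[i] is matched to cloud[col[i]]
    return cloud[col].ravel()                         # fixed-length vector (a discretised L^2 embedding)

def sample(label):
    """Synthetic two-class data: sheared Gaussian clouds vs. shifted Gaussian clouds."""
    x = rng.normal(size=(n_points, dim))
    return x + 2.0 if label == 1 else x @ np.array([[1.0, 0.8], [0.0, 1.0]])

y = np.tile([0, 1], 50)
X = np.stack([lot_embedding(sample(label)) for label in y])
clf = LogisticRegression(max_iter=1000).fit(X, y)     # linear separability in the embedded space
print("train accuracy:", clf.score(X, y))
```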
- Model based Multi-agent Reinforcement Learning with Tensor Decompositions [52.575433758866936]
This paper investigates generalisation in state-action space over unexplored state-action pairs by modelling the transition and reward functions as tensors of low CP-rank.
Experiments on synthetic MDPs show that using tensor decompositions in a model-based reinforcement learning algorithm can lead to much faster convergence if the true transition and reward functions are indeed of low rank.
arXiv Detail & Related papers (2021-10-27T15:36:25Z)
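As a small illustration of the low CP-rank modelling assumption (not the paper's code; TensorLy's standard CP routine is used for the fit, and all sizes are made up): a reward tensor over (state, action of agent 1, action of agent 2) with CP-rank 2 is fully determined by a handful of factor parameters, which is what lets observed entries constrain unexplored state-action pairs.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(4)
n_states, n_actions, rank = 6, 4, 2                   # illustrative sizes

# Ground-truth reward tensor R[s, a1, a2] built to have CP-rank 2 (the low-rank assumption).
A = rng.normal(size=(n_states, rank))
B = rng.normal(size=(n_actions, rank))
C = rng.normal(size=(n_actions, rank))
R = np.einsum('sr,ar,br->sab', A, B, C)

# Fit a rank-2 CP model with alternating least squares; with the right rank this toy fit
# is close to exact, while using far fewer parameters than the dense table.
cp = parafac(tl.tensor(R), rank=rank)
R_hat = tl.to_numpy(tl.cp_to_tensor(cp))
print("max reconstruction error:", np.abs(R - R_hat).max())
print("dense entries:", R.size, "vs CP parameters:", A.size + B.size + C.size)
```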
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)