Off-Policy Reinforcement Learning with High Dimensional Reward
- URL: http://arxiv.org/abs/2408.07660v1
- Date: Wed, 14 Aug 2024 16:44:56 GMT
- Title: Off-Policy Reinforcement Learning with High Dimensional Reward
- Authors: Dong Neuck Lee, Michael R. Kosorok
- Abstract summary: Distributional RL (DRL) studies the distribution of returns with the distributional Bellman operator in a Euclidean space.
We prove the contraction property of the Bellman operator even when the reward space is an infinite-dimensional separable Banach space.
We propose a novel DRL algorithm that tackles problems which have been previously intractable using conventional reinforcement learning approaches.
- Score: 1.7297899469367062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional off-policy reinforcement learning (RL) focuses on maximizing the expected return of scalar rewards. Distributional RL (DRL), in contrast, studies the distribution of returns with the distributional Bellman operator in a Euclidean space, leading to highly flexible choices for utility. This paper establishes robust theoretical foundations for DRL. We prove the contraction property of the Bellman operator even when the reward space is an infinite-dimensional separable Banach space. Furthermore, we demonstrate that the behavior of high- or infinite-dimensional returns can be effectively approximated using a lower-dimensional Euclidean space. Leveraging these theoretical insights, we propose a novel DRL algorithm that tackles problems which have been previously intractable using conventional reinforcement learning approaches.
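For context, a sketch of the standard distributional Bellman setup the abstract builds on (conventional notation, not necessarily the paper's):

```latex
% Random return Z(s,a); the distributional Bellman operator acts as a
% distributional equality (=_D denotes equality in distribution):
\mathcal{T}^{\pi} Z(s,a) \overset{D}{=} R(s,a) + \gamma\, Z(S', A'),
\qquad S' \sim P(\cdot \mid s,a), \; A' \sim \pi(\cdot \mid S').
% In the Euclidean case, T^pi is a gamma-contraction in the supremal
% p-Wasserstein metric; the paper extends this property to rewards
% taking values in an infinite-dimensional separable Banach space.
\bar{d}_p\big(\mathcal{T}^{\pi} Z_1,\, \mathcal{T}^{\pi} Z_2\big) \le \gamma\, \bar{d}_p(Z_1, Z_2).
```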
Related papers
- More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
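As a concrete illustration of pair-wise preference feedback, here is a minimal sketch of the Bradley-Terry reward-learning loss commonly used in PbRL pipelines; the function name and shapes are illustrative, and this is not the paper's reward-agnostic framework.

```python
import torch
import torch.nn.functional as F

def preference_loss(r_hat_1, r_hat_2, pref):
    """Bradley-Terry loss for fitting a reward model to pairwise
    trajectory preferences (standard PbRL ingredient; illustrative).

    r_hat_1, r_hat_2: predicted per-step rewards for two trajectory
    segments, shape (batch, T). pref: float tensor, 1.0 where the
    first segment is preferred, 0.0 otherwise.
    """
    ret_1 = r_hat_1.sum(dim=1)  # predicted return of segment 1
    ret_2 = r_hat_2.sum(dim=1)  # predicted return of segment 2
    # Bradley-Terry model: P(segment 1 preferred) = sigmoid(ret_1 - ret_2)
    return F.binary_cross_entropy_with_logits(ret_1 - ret_2, pref)
```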
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Hyperbolic Deep Reinforcement Learning [8.983647543608226]
We propose a new class of deep reinforcement learning algorithms that model latent representations in hyperbolic space.
We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks.
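For intuition, hyperbolic latent representations are typically obtained by mapping Euclidean encoder outputs into the Poincare ball via the exponential map at the origin; a minimal sketch of that standard construction (not the paper's exact implementation):

```python
import torch

def expmap0(v, c=1.0, eps=1e-6):
    """Exponential map at the origin of a Poincare ball with curvature -c.

    Maps a Euclidean encoder output v into hyperbolic space, where
    distances grow exponentially toward the boundary; this suits
    hierarchical latent structure.
    """
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)
```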
arXiv Detail & Related papers (2022-10-04T12:03:04Z)
- Policy Gradient for Reinforcement Learning with General Utilities [50.65940899590487]
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative rewards.
Many supervised and unsupervised RL problems are not covered by the standard linear RL framework, in which the utility is a linear function of the state-action occupancy measure.
We derive the policy gradient theorem for RL with general utilities.
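For reference, the classical policy gradient theorem that the general-utility result extends (standard notation, not the paper's statement):

```latex
% Classical policy gradient theorem (expected cumulative reward):
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta(\cdot \mid s)}
    \big[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s,a) \big].
% For a general utility F of the occupancy measure, the known chain-rule
% argument yields the same form with the reward replaced by the gradient
% of F evaluated at the current occupancy measure.
```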
arXiv Detail & Related papers (2022-10-03T14:57:46Z)
- Reinforcement Learning in Factored Action Spaces using Tensor Decompositions [92.05556163518999]
We propose a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions.
We use a cooperative multi-agent reinforcement learning scenario as the exemplary setting.
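To illustrate the general idea, here is a minimal sketch of a rank-R CP (CANDECOMP/PARAFAC) factorization of a Q-function over a factored action space; the class name and architecture are assumptions for illustration, not the paper's design.

```python
import torch

class CPQHead(torch.nn.Module):
    """Rank-R CP factorization of Q(s, a_1, ..., a_K) (illustrative).

    Rather than one output per joint action (exponential in K), each
    action dimension k gets a state-conditioned factor matrix; Q is the
    elementwise product of the selected factor rows, summed over ranks.
    """
    def __init__(self, state_dim, action_sizes, rank=8):
        super().__init__()
        self.factors = torch.nn.ModuleList(
            [torch.nn.Linear(state_dim, n * rank) for n in action_sizes]
        )
        self.rank = rank

    def forward(self, s):
        # One factor tensor per action dimension: (batch, n_k, rank).
        return [f(s).view(s.shape[0], -1, self.rank) for f in self.factors]

    def q_value(self, s, actions):
        # actions: list of K LongTensors, each of shape (batch,).
        batch = torch.arange(s.shape[0], device=s.device)
        prod = torch.ones(s.shape[0], self.rank, device=s.device)
        for fac, a in zip(self.forward(s), actions):
            prod = prod * fac[batch, a]  # select row per action: (batch, rank)
        return prod.sum(dim=-1)          # sum over ranks -> scalar Q
```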
arXiv Detail & Related papers (2021-10-27T15:49:52Z)
- Distributional Reinforcement Learning for Multi-Dimensional Reward Functions [91.88969237680669]
We introduce Multi-Dimensional Distributional DQN (MD3QN) to model the joint return distribution from multiple reward sources.
As a by-product of joint distribution modeling, MD3QN can capture the randomness in returns for each source of reward.
In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions.
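A sketch of the core data structure such a method needs: a critic that outputs N particles in R^d per action, so that a vector-valued distributional Bellman backup can be formed. This is a generic particle-based layout under assumed names, not MD3QN's exact architecture or loss.

```python
import torch

class JointReturnHead(torch.nn.Module):
    """Represent the joint return distribution over d reward sources
    as N particles in R^d per (state, action); illustrative sketch."""
    def __init__(self, feat_dim, n_actions, d_rewards, n_particles=32):
        super().__init__()
        self.n_actions, self.d, self.n = n_actions, d_rewards, n_particles
        self.head = torch.nn.Linear(
            feat_dim, n_actions * n_particles * d_rewards
        )

    def forward(self, feat):
        # (batch, n_actions, n_particles, d_rewards)
        return self.head(feat).view(-1, self.n_actions, self.n, self.d)

def bellman_target(reward_vec, gamma, next_particles):
    # Vector-valued distributional Bellman backup: r + gamma * Z'.
    # reward_vec: (batch, d); next_particles: (batch, n_particles, d).
    return reward_vec.unsqueeze(1) + gamma * next_particles
```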
arXiv Detail & Related papers (2021-10-26T11:24:23Z)
- The Benefits of Being Categorical Distributional: Uncertainty-aware Regularized Exploration in Reinforcement Learning [18.525166928667876]
By applying a return density decomposition technique, we attribute the potential superiority of distributional RL to a derived distribution-matching regularization.
This regularization, previously unexplored in the distributional RL context, captures additional return distribution information beyond the expectation alone.
Experiments substantiate the importance of this uncertainty-aware regularization for the empirical benefits of distributional RL over classical RL.
arXiv Detail & Related papers (2021-10-07T03:14:46Z)
- Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL [21.550201956884532]
The challenge is to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training.
Many promising approaches to this challenge consider RL as a process of training two functions simultaneously.
We propose Cross-Trajectory Representation Learning (CTRL), a method that runs within an RL agent and conditions its encoder to recognize behavioral similarity in observations.
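For flavor, a generic InfoNCE-style contrastive objective over paired embeddings, of the kind used to make an encoder map behaviorally similar observations close together; this is a common pattern, not necessarily CTRL's exact objective.

```python
import torch
import torch.nn.functional as F

def infonce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE over a batch of paired embeddings (z_a[i], z_b[i]) that
    should be recognized as behaviorally similar; all other pairs in
    the batch serve as negatives. Generic pattern, illustrative only.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature              # (batch, batch)
    labels = torch.arange(z_a.shape[0], device=z_a.device)  # diagonal positives
    return F.cross_entropy(logits, labels)
```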
arXiv Detail & Related papers (2021-06-04T00:43:10Z)
- Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.