Utilizing Prior Solutions for Reward Shaping and Composition in
Entropy-Regularized Reinforcement Learning
- URL: http://arxiv.org/abs/2212.01174v1
- Date: Fri, 2 Dec 2022 13:57:53 GMT
- Title: Utilizing Prior Solutions for Reward Shaping and Composition in
Entropy-Regularized Reinforcement Learning
- Authors: Jacob Adamczyk, Argenis Arriojas, Stas Tiomkin, Rahul V. Kulkarni
- Abstract summary: We develop a general framework for reward shaping and task composition in entropy-regularized RL.
We show how the derived relation leads to a general result for reward shaping in entropy-regularized RL.
We then generalize this approach to derive an exact relation connecting optimal value functions for the composition of multiple tasks in entropy-regularized RL.
- Score: 3.058685580689605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning (RL), the ability to utilize prior knowledge from
previously solved tasks can allow agents to quickly solve new problems. In some
cases, these new problems may be approximately solved by composing the
solutions of previously solved primitive tasks (task composition). Otherwise,
prior knowledge can be used to adjust the reward function for a new problem, in
a way that leaves the optimal policy unchanged but enables quicker learning
(reward shaping). In this work, we develop a general framework for reward
shaping and task composition in entropy-regularized RL. To do so, we derive an
exact relation connecting the optimal soft value functions for two
entropy-regularized RL problems with different reward functions and dynamics.
We show how the derived relation leads to a general result for reward shaping
in entropy-regularized RL. We then generalize this approach to derive an exact
relation connecting optimal value functions for the composition of multiple
tasks in entropy-regularized RL. We validate these theoretical contributions
with experiments showing that reward shaping and task composition lead to
faster learning in various settings.
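As a toy illustration of the special case where only the reward changes, the sketch below runs tabular soft Q-iteration on a random MDP and checks that potential-based shaping shifts the optimal soft Q-function by the potential while leaving the soft-optimal (Boltzmann) policy unchanged. The MDP, the inverse temperature beta, and all names are illustrative and not taken from the paper, whose exact relation also covers changes in dynamics.

```python
import numpy as np

# Toy check: potential-based shaping r'(s,a,s') = r(s,a,s') + gamma*phi(s') - phi(s)
# shifts the optimal soft Q-function by -phi(s) and leaves the soft-optimal
# (Boltzmann) policy unchanged. Everything below is illustrative.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
gamma, beta = 0.9, 2.0                                  # discount, inverse temperature

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
r = rng.normal(size=(n_states, n_actions))                         # r[s, a]
phi = rng.normal(size=n_states)                                    # arbitrary potential

def soft_q_iteration(reward, iters=500):
    """Iterate the soft Bellman backup Q <- r + gamma * P V, with V = (1/beta) log sum_a exp(beta Q)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = np.log(np.exp(beta * Q).sum(axis=1)) / beta  # soft (log-sum-exp) state value
        Q = reward + gamma * (P @ V)
    return Q

# Expected shaping term per (s, a): gamma * E[phi(s')] - phi(s).
r_shaped = r + gamma * (P @ phi) - phi[:, None]

Q = soft_q_iteration(r)
Q_shaped = soft_q_iteration(r_shaped)

policy = np.exp(beta * Q) / np.exp(beta * Q).sum(axis=1, keepdims=True)
policy_shaped = np.exp(beta * Q_shaped) / np.exp(beta * Q_shaped).sum(axis=1, keepdims=True)

print(np.allclose(Q_shaped, Q - phi[:, None], atol=1e-5))  # values shift by -phi(s)
print(np.allclose(policy_shaped, policy, atol=1e-6))       # soft-optimal policy is unchanged
```

A state-only offset to the soft Q-function cancels in the Boltzmann policy, which is why the shaped task keeps the same soft-optimal policy; the paper's relation generalizes beyond this potential-based special case.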
Related papers
- SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning [89.04776523010409]
This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics.
In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping.
We establish the first convergence analysis with provable generalization guarantees for SF-DQN with generalized policy improvement (GPI).
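For readers unfamiliar with that decomposition, here is a minimal sketch of evaluating successor features against a new task's reward weights and taking the GPI action; shapes and names are illustrative, not from the paper.

```python
import numpy as np

# Minimal sketch: Q_i(s, a) = psi_i(s, a) . w_new for each previously solved
# task i, and GPI picks argmax_a max_i Q_i(s, a). Shapes/names are illustrative.

rng = np.random.default_rng(0)
n_tasks, n_actions, d = 3, 4, 8

psi = rng.random((n_tasks, n_actions, d))   # psi[i, a]: successor features of prior policy i at the current state
w_new = rng.random(d)                       # linear reward weights describing the new task

q = psi @ w_new                             # Q_i(s, a), shape (n_tasks, n_actions)
gpi_action = int(q.max(axis=0).argmax())    # GPI: best action across all prior policies
print(gpi_action)
```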
arXiv Detail & Related papers (2024-05-24T20:30:14Z)
- Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning [1.8399318639816038]
We propose prioritized soft Q-decomposition (PSQD) for learning and adapting subtask solutions under lexicographic priorities.
PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step.
We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks.
arXiv Detail & Related papers (2023-10-03T18:36:21Z)
- Bounding the Optimal Value Function in Compositional Reinforcement Learning [2.7998963147546148]
We show that the optimal solution for a composite task can be related to the known primitive task solutions.
We also show that the regret of using a zero-shot policy can be bounded for this class of functions.
arXiv Detail & Related papers (2023-03-05T03:06:59Z)
- Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice these learned optimizers do not work well even on simple RL tasks.
The agent-gradient distribution is non-i.i.d. (neither independent nor identically distributed), leading to inefficient meta-training.
We show that, although trained only on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z)
- Reinforcement Learning to Solve NP-hard Problems: an Application to the CVRP [0.0]
We evaluate the use of Reinforcement Learning (RL) to solve a classic combinatorial optimization problem, the Capacitated Vehicle Routing Problem (CVRP).
We compare two of the most promising RL approaches with traditional solving techniques on a set of benchmark instances.
We find that despite not returning the best solution, the RL approach has many advantages over traditional solvers.
arXiv Detail & Related papers (2022-01-14T11:16:17Z)
- Reversible Action Design for Combinatorial Optimization with Reinforcement Learning [35.50454156611722]
Reinforcement learning (RL) has recently emerged as a new framework for tackling combinatorial optimization problems (COPs).
We propose a general RL framework that not only exhibits state-of-the-art empirical performance but also generalizes to a wide class of COPs.
arXiv Detail & Related papers (2021-02-14T18:05:42Z)
- Weighted Entropy Modification for Soft Actor-Critic [95.37322316673617]
We generalize the existing principle of the maximum Shannon entropy in reinforcement learning (RL) to weighted entropy by characterizing the state-action pairs with some qualitative weights.
We propose an algorithm motivated by self-balancing exploration using the introduced weight function; despite its simple implementation, it achieves state-of-the-art performance on MuJoCo tasks.
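As a generic sketch of the weighted-entropy idea (the weight values below are placeholders; the paper defines its own state-action weights), the usual Shannon entropy bonus is replaced by a weighted sum over actions.

```python
import numpy as np

# Weighted entropy of a policy at one state: -sum_a w(s, a) * pi(a|s) * log pi(a|s).
# With w = 1 everywhere this reduces to the standard Shannon entropy bonus.
def weighted_entropy(pi, w):
    return -np.sum(w * pi * np.log(pi + 1e-12))

pi = np.array([0.7, 0.2, 0.1])      # action probabilities at some state
w = np.array([1.0, 2.0, 0.5])       # placeholder qualitative weights per (s, a)
print(weighted_entropy(pi, w))      # term used in place of Shannon entropy in the objective
```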
arXiv Detail & Related papers (2020-11-18T04:36:03Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks [57.17673320237597]
In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation.
This paper presents the first reward shaping framework for average-reward learning.
It proves that, under standard assumptions, the optimal policy under the original reward function can be recovered.
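For intuition only (this is a generic sketch and not necessarily the construction used in that paper), the average-reward analogue of potential-based shaping preserves every policy's gain because the potential terms telescope:

```latex
\tilde{r}(s_t, a_t, s_{t+1}) = r(s_t, a_t, s_{t+1}) + \Phi(s_{t+1}) - \Phi(s_t)
\;\Longrightarrow\;
\frac{1}{T}\sum_{t=0}^{T-1} \tilde{r}(s_t, a_t, s_{t+1})
  = \frac{1}{T}\sum_{t=0}^{T-1} r(s_t, a_t, s_{t+1})
  + \frac{\Phi(s_T) - \Phi(s_0)}{T}
```

For any bounded potential the last term vanishes as T grows, so the long-run average reward of every policy, and hence the optimal policy, is unchanged.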
arXiv Detail & Related papers (2020-07-03T05:06:57Z)
- Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)