Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks
- URL: http://arxiv.org/abs/2007.01498v1
- Date: Fri, 3 Jul 2020 05:06:57 GMT
- Title: Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks
- Authors: Yuqian Jiang, Sudarshanan Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu,
Peter Stone
- Abstract summary: In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation.
This paper presents the first reward shaping framework for average-reward learning.
It proves that, under standard assumptions, the optimal policy under the original reward function can be recovered.
- Score: 57.17673320237597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In continuing tasks, average-reward reinforcement learning may be a more
appropriate problem formulation than the more common discounted reward
formulation. As usual, learning an optimal policy in this setting typically
requires a large amount of training experience. Reward shaping is a common
approach for incorporating domain knowledge into reinforcement learning in
order to speed up convergence to an optimal policy. However, to the best of our
knowledge, the theoretical properties of reward shaping have thus far only been
established in the discounted setting. This paper presents the first reward
shaping framework for average-reward learning and proves that, under standard
assumptions, the optimal policy under the original reward function can be
recovered. In order to avoid the need for manual construction of the shaping
function, we introduce a method for utilizing domain knowledge expressed as a
temporal logic formula. The formula is automatically translated to a shaping
function that provides additional reward throughout the learning process. We
evaluate the proposed method on three continuing tasks. In all cases, shaping
speeds up the average-reward learning rate without any reduction in the
performance of the learned policy compared to relevant baselines.
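The abstract describes the mechanism only at a high level, so the following is a minimal, hedged sketch of how potential-based shaping can be combined with tabular average-reward (differential) Q-learning. It is not the authors' implementation: the undiscounted shaping form r + Φ(s') − Φ(s), the placeholder potential `phi` (which in the paper would be derived automatically from a temporal logic formula), the Gymnasium-style `env` interface, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): potential-based reward shaping on top of
# tabular average-reward (differential) Q-learning. Assumes integer state indices,
# a Gymnasium-style env, and an undiscounted shaping term phi(s') - phi(s).
import numpy as np

def shaped_differential_q_learning(env, phi, n_states, n_actions,
                                   steps=100_000, alpha=0.1, eta=0.01,
                                   epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, n_actions))  # differential action-value estimates
    rho = 0.0                            # running estimate of the average reward
    s, _ = env.reset(seed=seed)
    for _ in range(steps):
        # epsilon-greedy behaviour policy
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(q[s]))
        s_next, r, terminated, truncated, _ = env.step(a)
        # potential-based shaping: add phi(s') - phi(s) to the environment reward
        r_shaped = r + phi(s_next) - phi(s)
        # differential Q-learning update on the shaped reward
        td_error = r_shaped - rho + np.max(q[s_next]) - q[s, a]
        q[s, a] += alpha * td_error
        rho += eta * alpha * td_error
        s = s_next
        if terminated or truncated:  # continuing tasks rarely terminate,
            s, _ = env.reset()       # but reset defensively if they do
    return q, rho
```

Intuitively, because the shaping term telescopes along any trajectory, its contribution to the long-run average reward of every policy vanishes for a bounded potential, which is consistent with the recovery guarantee stated in the abstract.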
Related papers
- Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce exceptional quality prompts for offline reinforcement learning tasks.
We show that the Prompt diffuser is a robust and effective tool for the prompt-tuning process, demonstrating strong performance in the meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z)
- Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization [14.028916306297928]
Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy.
We propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms.
arXiv Detail & Related papers (2023-01-05T18:43:40Z)
- Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
arXiv Detail & Related papers (2022-03-21T22:07:48Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Formalising the Foundations of Discrete Reinforcement Learning in Isabelle/HOL [0.0]
We focus on the foundations required for dynamic programming and the use of reinforcement learning agents over such processes.
We prove the existence of a universally optimal policy where there is a discounting factor less than one.
Lastly, we prove that the value iteration and policy iteration algorithms terminate in finite time, producing an epsilon-optimal and a fully optimal policy, respectively (see the sketch after this list).
arXiv Detail & Related papers (2021-12-11T14:38:36Z)
- Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping [71.214923471669]
Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL).
In this paper, we consider the problem of adaptively utilizing a given shaping reward function.
Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards.
arXiv Detail & Related papers (2020-11-05T05:34:14Z)
- Average Reward Adjusted Discounted Reinforcement Learning: Near-Blackwell-Optimal Policies for Real-World Applications [0.0]
Reinforcement learning aims at finding the best stationary policy for a given Markov Decision Process.
This paper provides deep theoretical insights into the widely applied standard discounted reinforcement learning framework.
We establish a novel near-Blackwell-optimal reinforcement learning algorithm.
arXiv Detail & Related papers (2020-04-02T08:05:18Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
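As referenced in the Isabelle/HOL entry above, here is a minimal sketch of the classical epsilon-optimal stopping rule for value iteration on a finite discounted MDP. The array shapes, discount factor, and threshold are illustrative assumptions and are not taken from that paper's formalisation.

```python
# Minimal sketch: value iteration with the classical epsilon-optimal stopping rule.
# P[a, s, s2] holds transition probabilities, R[s, a] expected rewards (assumptions).
import numpy as np

def value_iteration(P, R, gamma=0.9, epsilon=1e-3):
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    threshold = epsilon * (1.0 - gamma) / (2.0 * gamma)  # classical stopping bound
    while True:
        # Bellman optimality backup for every (state, action) pair
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < threshold:
            break
        v = v_new
    policy = q.argmax(axis=1)  # greedy policy w.r.t. v_new is epsilon-optimal
    return v_new, policy
```

With this stopping threshold, a standard argument bounds the loss of the greedy policy by epsilon, which is the epsilon-optimality claim mentioned in that entry.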