Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks
- URL: http://arxiv.org/abs/2007.01498v1
- Date: Fri, 3 Jul 2020 05:06:57 GMT
- Title: Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks
- Authors: Yuqian Jiang, Sudarshanan Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu,
Peter Stone
- Abstract summary: In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation.
This paper presents the first reward shaping framework for average-reward learning.
It proves that, under standard assumptions, the optimal policy under the original reward function can be recovered.
- Score: 57.17673320237597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In continuing tasks, average-reward reinforcement learning may be a more
appropriate problem formulation than the more common discounted reward
formulation. As usual, learning an optimal policy in this setting typically
requires a large amount of training experience. Reward shaping is a common
approach for incorporating domain knowledge into reinforcement learning in
order to speed up convergence to an optimal policy. However, to the best of our
knowledge, the theoretical properties of reward shaping have thus far only been
established in the discounted setting. This paper presents the first reward
shaping framework for average-reward learning and proves that, under standard
assumptions, the optimal policy under the original reward function can be
recovered. In order to avoid the need for manual construction of the shaping
function, we introduce a method for utilizing domain knowledge expressed as a
temporal logic formula. The formula is automatically translated to a shaping
function that provides additional reward throughout the learning process. We
evaluate the proposed method on three continuing tasks. In all cases, shaping
speeds up the average-reward learning rate without any reduction in the
performance of the learned policy compared to relevant baselines.
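As a concrete illustration of the framework's two ingredients, the sketch below (my construction, not the authors' code) runs tabular differential Q-learning on a toy continuing patrol task, with and without a shaping term of the form F(s, s') = phi(s') - phi(s). Over any recurrent cycle these potential differences telescope to zero, so the average reward, and hence the optimal policy, is unchanged, which is the recoverability property the paper proves. The hand-written potential stands in for one translated automatically from a temporal logic formula such as "visit both ends of the ring infinitely often"; all names and hyperparameters are illustrative assumptions.
```python
# Minimal sketch (not the authors' implementation): tabular differential
# Q-learning on a continuing patrol task, optionally adding the shaping term
# phi(s') - phi(s). In the average-reward setting the potential differences
# telescope over any recurrent cycle, so shaping leaves the optimal policy
# unchanged while making progress toward the patrol target visible early.
import random

import numpy as np

N = 8                  # ring of N states
ACTIONS = (-1, +1)     # move left / right on the ring


def env_step(s, target, a):
    s2 = (s + a) % N
    r = 1.0 if s2 == target else 0.0
    # Patrol objective: once one end is reached, the other becomes the target,
    # mimicking an automaton for "visit state 0 and state N-1 infinitely often".
    target2 = (N - 1) - target if s2 == target else target
    return s2, target2, r


def potential(s, target):
    # Hand-written progress measure (assumption): closer to the current target
    # means higher potential. The paper derives such a function automatically
    # from a temporal logic formula instead.
    dist = min(abs(s - target), N - abs(s - target))
    return -float(dist)


def run(shaped, steps=20000, alpha=0.1, eta=0.01, eps=0.1):
    q = np.zeros((N, 2, len(ACTIONS)))   # target index 0 -> state 0, 1 -> N-1
    rho = 0.0                            # average-reward estimate
    s, target = 0, N - 1
    for _ in range(steps):
        ti = 0 if target == 0 else 1
        a = (random.randrange(len(ACTIONS)) if random.random() < eps
             else int(np.argmax(q[s, ti])))
        s2, target2, r = env_step(s, target, ACTIONS[a])
        if shaped:
            r += potential(s2, target2) - potential(s, target)
        ti2 = 0 if target2 == 0 else 1
        delta = r - rho + np.max(q[s2, ti2]) - q[s, ti, a]  # differential TD error
        q[s, ti, a] += alpha * delta
        rho += eta * delta
        s, target = s2, target2
    return rho


print("average reward, unshaped:", round(run(shaped=False), 3))
print("average reward, shaped:  ", round(run(shaped=True), 3))
```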
Related papers
- Improving the Effectiveness of Potential-Based Reward Shaping in Reinforcement Learning [0.5524804393257919]
We show how a simple linear shift of the potential function can improve the effectiveness of reward shaping.
We show the theoretical limitations of continuous potential functions for correctly assigning positive and negative reward shaping values.
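For intuition, here is a tiny sketch (my illustration, not the paper's code) of the shift: with the standard shaping term F(s, s') = gamma * phi(s') - phi(s), adding a constant c to the potential changes every shaping value by (gamma - 1) * c, so the shift controls which transitions receive positive versus negative shaping without changing the optimal policy.
```python
# Sketch of a constant potential shift under F(s, s') = gamma*phi(s') - phi(s).
# Shifting phi by c changes every F by (gamma - 1)*c, so with gamma < 1 a large
# positive c flips "good" forward transitions from positive to negative shaping
# while the induced optimal policy stays the same.
GAMMA = 0.99
phi = {s: float(s) for s in range(4)}  # potential on a 4-state chain


def shaping(p, s, s2):
    return GAMMA * p[s2] - p[s]


for c in (0.0, 200.0):
    shifted = {s: v + c for s, v in phi.items()}
    vals = [round(shaping(shifted, s, s + 1), 2) for s in range(3)]
    print(f"shift c={c}: forward-step shaping values {vals}")
# c=0.0   -> [0.99, 0.98, 0.97]   (all positive)
# c=200.0 -> [-1.01, -1.02, -1.03] (all negative, same optimal policy)
```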
arXiv Detail & Related papers (2025-02-03T12:32:50Z)
- Online inductive learning from answer sets for efficient reinforcement learning exploration [52.03682298194168]
We exploit inductive learning of answer set programs to learn a set of logical rules representing an explainable approximation of the agent policy.
We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent at the next batch.
Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training.
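A heavily simplified sketch of the exploration idea (my stand-in: plain Python predicates replace the learned answer set programs and the ASP solver), showing how rule-endorsed actions can bias otherwise-uniform exploration:
```python
# Minimal sketch (my simplification, not the paper's method): exploration steps
# sample only among actions endorsed by the currently learned rules, instead of
# uniformly at random.
import random

RULES = [
    # Hypothetical learned rule: "if the goal lies to the right, move right."
    lambda state, action: (action == +1) if state < 9 else (action == -1),
]


def rule_score(state, action):
    return sum(rule(state, action) for rule in RULES)


def explore_action(state, actions=(-1, +1)):
    best = max(rule_score(state, a) for a in actions)
    endorsed = [a for a in actions if rule_score(state, a) == best]
    return random.choice(endorsed)


print([explore_action(s) for s in (0, 5, 9)])  # biased toward the goal
```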
arXiv Detail & Related papers (2025-01-13T16:13:22Z)
- Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization [14.028916306297928]
Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy.
We propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms.
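To make the trust-region idea concrete, here is a minimal sketch (my illustration, not the paper's estimator) of improving a given initial policy at a single state under a KL-divergence constraint:
```python
# Sketch of a KL-constrained policy improvement step at one discrete state:
# tilt the initial policy toward higher Q-values, but only as far as the
# trust-region radius allows.
import numpy as np

q = np.array([0.2, 1.0, 0.5])      # Q-value estimates for 3 actions
pi0 = np.array([0.5, 0.25, 0.25])  # given initial policy
delta = 0.05                       # trust-region radius (max KL to pi0)

best = pi0
# pi(tau) ~ pi0 * exp(q / tau): smaller temperatures improve more but drift
# further from pi0; keep the most-improved update still inside the KL ball.
for tau in np.geomspace(100.0, 0.01, 400):
    pi = pi0 * np.exp(q / tau)
    pi /= pi.sum()
    if float(np.sum(pi * np.log(pi / pi0))) <= delta:
        best = pi
print("improved policy within trust region:", np.round(best, 3))
```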
arXiv Detail & Related papers (2023-01-05T18:43:40Z)
- Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
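For reference, the Laplacian representation this paper positions itself against can be computed directly on small graphs; a minimal sketch (my illustration) on a 4-connected gridworld:
```python
# Sketch of the graph-Laplacian state representation: eigenvectors of the
# Laplacian of the environment's transition graph serve as task-agnostic
# state features.
import numpy as np

SIZE = 5  # 5x5 gridworld, 4-connected


def idx(r, c):
    return r * SIZE + c


n = SIZE * SIZE
A = np.zeros((n, n))
for r in range(SIZE):
    for c in range(SIZE):
        for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
            r2, c2 = r + dr, c + dc
            if r2 < SIZE and c2 < SIZE:
                A[idx(r, c), idx(r2, c2)] = A[idx(r2, c2), idx(r, c)] = 1.0

L = np.diag(A.sum(axis=1)) - A       # combinatorial graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)
features = eigvecs[:, 1:5]           # smallest nonzero eigenvectors as features
print("state 0 embedding:", np.round(features[0], 3))
```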
arXiv Detail & Related papers (2022-03-21T22:07:48Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
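The combination step behind this kind of transfer is generalized policy improvement (GPI): act greedily with respect to the pointwise maximum of the known policies' Q-values on the new task. A minimal sketch (my illustration, with random stand-in Q-functions):
```python
# Sketch of generalized policy improvement (GPI): combine several base
# policies by taking, in each state, the action that maximises the best
# Q-value any of them achieves.
import numpy as np

rng = np.random.default_rng(0)
n_policies, n_states, n_actions = 3, 6, 4
# Stand-in Q-functions of three base policies evaluated on the new task.
Q = rng.normal(size=(n_policies, n_states, n_actions))


def gpi_action(state):
    # Max over policies, then argmax over actions.
    return int(np.argmax(Q[:, state, :].max(axis=0)))


print([gpi_action(s) for s in range(n_states)])
```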
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping [71.214923471669]
Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL).
In this paper, we consider the problem of adaptively utilizing a given shaping reward function.
Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards.
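A crude sketch of the adaptation loop (my simplification; the paper itself derives bi-level gradient methods, and `true_return` below is a hypothetical placeholder for training briefly with weight z and evaluating the unshaped return):
```python
# Sketch of adaptively weighting a given shaping reward: learn with
# r_total = r_env + z * r_shape, and adjust z by hill-climbing on the true
# (unshaped) return, so harmful shaping gets down-weighted automatically.
import random

random.seed(1)


def true_return(z):
    # Hypothetical placeholder for "train briefly with weight z, then
    # evaluate the true return"; a noisy score peaking at a moderate weight.
    return -(z - 0.6) ** 2 + random.gauss(0.0, 0.01)


z, step = 0.0, 0.2
for _ in range(30):
    z += step if true_return(z + step) > true_return(z - step) else -step
    step *= 0.95
print("adapted shaping weight z ~", round(z, 2))
```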
arXiv Detail & Related papers (2020-11-05T05:34:14Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
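A minimal sketch of the idea (my illustration; a nearest-neighbour lookup stands in for the paper's neural policy): fit a policy to logged (state, return-to-go, action) data, then act by conditioning on a high target return.
```python
# Sketch of a reward-conditioned policy: every logged transition becomes a
# supervised example (state, return-to-go) -> action; at test time, ask the
# policy for the action associated with a high target return.
import random

random.seed(0)

# Toy 1-D task: states 0..9, actions +1/-1, reward 1 for moving right.
dataset = []  # (state, return_to_go, action)
for _ in range(200):
    s, transitions = random.randrange(10), []
    for _ in range(10):
        a = random.choice((-1, 1))
        r = 1.0 if a == 1 else 0.0
        transitions.append((s, a, r))
        s = min(max(s + a, 0), 9)
    rtg = 0.0
    for s_t, a_t, r_t in reversed(transitions):
        rtg += r_t
        dataset.append((s_t, rtg, a_t))


def policy(state, target_return):
    # Nearest neighbour in (state, return-to-go) space.
    nearest = min(dataset,
                  key=lambda d: (d[0] - state) ** 2 + (d[1] - target_return) ** 2)
    return nearest[2]


print("action for a high target return:", policy(5, 10.0))
```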
arXiv Detail & Related papers (2019-12-31T18:07:43Z)