Subgoal-based Reward Shaping to Improve Efficiency in Reinforcement
Learning
- URL: http://arxiv.org/abs/2104.06411v1
- Date: Tue, 13 Apr 2021 14:28:48 GMT
- Title: Subgoal-based Reward Shaping to Improve Efficiency in Reinforcement
Learning
- Authors: Takato Okudo and Seiji Yamada
- Abstract summary: We extend potential-based reward shaping and propose a subgoal-based reward shaping method.
Our method makes it easier for human trainers to share their knowledge of subgoals.
- Score: 7.6146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning, which acquires a policy maximizing long-term rewards, has been actively studied. Unfortunately, this type of learning is too slow and difficult to use in practical situations because the state-action space becomes huge in real environments. Many studies have incorporated human knowledge into reinforcement learning. Though human knowledge of trajectories is often used, providing it can require a human to control an AI agent, which is difficult. Knowledge of subgoals may lessen this requirement because humans need only consider a few representative states on an optimal trajectory in their minds. The essential factor for learning efficiency is rewards. Potential-based reward shaping is a basic method for enriching rewards. However, it is often difficult to incorporate subgoals into potential-based reward shaping to accelerate learning, because the appropriate potentials are not intuitive for humans. We extend potential-based reward shaping and propose subgoal-based reward shaping, which makes it easier for human trainers to share their knowledge of subgoals. To evaluate our method, we obtained a subgoal series from participants and conducted experiments in three domains: four-rooms (discrete states and discrete actions), pinball (continuous states and discrete actions), and picking (continuous states and actions). We compared our method with a baseline reinforcement learning algorithm and other subgoal-based methods, including random subgoal and naive subgoal-based reward shaping. We found that our reward shaping outperformed all other methods in learning efficiency.
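For context, potential-based reward shaping adds a term of the form gamma * Phi(s') - Phi(s) to the environment reward, which leaves the optimal policy unchanged; the proposed method derives the potential Phi from a subgoal series supplied by a human trainer. The Python sketch below illustrates this idea under the simplifying assumption that the potential counts how many subgoals have been reached so far; the `is_achieved` predicate and the counting form are illustrative stand-ins, not the paper's exact formulation.

# Minimal sketch (assumed formulation, not the authors' exact method):
# potential-based reward shaping with a potential derived from an ordered
# subgoal series elicited from a human trainer.

GAMMA = 0.99  # discount factor (assumed value)


def make_subgoal_potential(subgoals, is_achieved, value_per_subgoal=1.0):
    """Build a potential function Phi from an ordered subgoal series.

    `subgoals` is the ordered list obtained from a human trainer, and
    `is_achieved(state, subgoal)` is a hypothetical predicate testing whether
    `state` satisfies `subgoal`. Here Phi simply counts how many consecutive
    subgoals have been reached; the paper's actual potential may differ.
    """
    def phi(state):
        reached = 0
        for g in subgoals:
            if is_achieved(state, g):
                reached += 1
            else:
                break  # subgoals assumed ordered along the optimal trajectory
        return value_per_subgoal * reached
    return phi


def shaped_reward(reward, state, next_state, phi, gamma=GAMMA):
    """Potential-based shaping: R'(s,a,s') = R(s,a,s') + gamma*Phi(s') - Phi(s).

    Because Phi depends only on the state, this additive term preserves the
    optimal policy of the original task.
    """
    return reward + gamma * phi(next_state) - phi(state)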
Related papers
- Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Reward Shaping with Subgoals for Social Navigation [7.6146285961466]
Social navigation has been gaining attention with the growth of machine intelligence.
Reinforcement learning can select an action in the prediction phase at a low computational cost.
We propose a reward shaping method with subgoals to accelerate learning.
arXiv Detail & Related papers (2021-04-13T13:52:58Z) - Reward Shaping with Dynamic Trajectory Aggregation [7.6146285961466]
Potential-based reward shaping is a basic method for enriching rewards.
SARSA-RS learns the potential function during training.
We propose a trajectory aggregation that uses subgoal series.
arXiv Detail & Related papers (2021-04-13T13:07:48Z) - Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z) - Emergent Real-World Robotic Skills via Unsupervised Off-Policy
Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z) - Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)