Handling Sparse Rewards in Reinforcement Learning Using Model Predictive
Control
- URL: http://arxiv.org/abs/2210.01525v1
- Date: Tue, 4 Oct 2022 11:06:38 GMT
- Title: Handling Sparse Rewards in Reinforcement Learning Using Model Predictive
Control
- Authors: Murad Dawood, Nils Dengler, Jorge de Heuvel and Maren Bennewitz
- Abstract summary: Reinforcement learning (RL) has recently proven great success in various domains.
Yet, the design of the reward function requires detailed domain expertise and tedious fine-tuning to ensure that agents are able to learn the desired behaviour.
We propose to use model predictive control(MPC) as an experience source for training RL agents in sparse reward environments.
- Score: 9.118706387430883
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reinforcement learning (RL) has recently proven great success in various
domains. Yet, the design of the reward function requires detailed domain
expertise and tedious fine-tuning to ensure that agents are able to learn the
desired behaviour. Using a sparse reward conveniently mitigates these
challenges. However, the sparse reward represents a challenge on its own, often
resulting in unsuccessful training of the agent. In this paper, we therefore
address the sparse reward problem in RL. Our goal is to find an effective
alternative to reward shaping, without using costly human demonstrations, that
would also be applicable to a wide range of domains. Hence, we propose to use
model predictive control~(MPC) as an experience source for training RL agents
in sparse reward environments. Without the need for reward shaping, we
successfully apply our approach in the field of mobile robot navigation both in
simulation and real-world experiments with a Kuboki Turtlebot 2. We furthermore
demonstrate great improvement over pure RL algorithms in terms of success rate
as well as number of collisions and timeouts. Our experiments show that MPC as
an experience source improves the agent's learning process for a given task in
the case of sparse rewards.
Related papers
- Offline Reinforcement Learning with Imputed Rewards [8.856568375969848]
We propose a Reward Model that can estimate the reward signal from a very limited sample of environment transitions annotated with rewards.
Our results show that, using only 1% of reward-labeled transitions from the original datasets, our learned reward model is able to impute rewards for the remaining 99% of the transitions.
arXiv Detail & Related papers (2024-07-15T15:53:13Z) - Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning [7.07264650720021]
Sub-optimal Data Pre-training, SDP, is an approach that leverages reward-free, sub-optimal data to improve HitL RL algorithms.
We show SDP can significantly improve or achieve competitive performance with state-of-the-art HitL RL algorithms.
arXiv Detail & Related papers (2024-04-30T18:58:33Z) - A State Augmentation based approach to Reinforcement Learning from Human
Preferences [20.13307800821161]
Preference Based Reinforcement Learning attempts to solve the issue by utilizing binary feedbacks on queried trajectory pairs.
We present a state augmentation technique that allows the agent's reward model to be robust.
arXiv Detail & Related papers (2023-02-17T07:10:50Z) - Distributional Reward Estimation for Effective Multi-Agent Deep
Reinforcement Learning [19.788336796981685]
We propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL)
Our main idea is to design the multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training.
The superiority of the DRE-MARL is demonstrated using benchmark multi-agent scenarios, compared with the SOTA baselines in terms of both effectiveness and robustness.
arXiv Detail & Related papers (2022-10-14T08:31:45Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
This paper addresses the problem of IRL -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring the novelty based on learned reward.
Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL [91.26538493552817]
We present a formulation of hindsight relabeling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely using sparse reward.
We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments.
arXiv Detail & Related papers (2021-12-02T00:51:17Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.