A Study on Dense and Sparse (Visual) Rewards in Robot Policy Learning
- URL: http://arxiv.org/abs/2108.03222v1
- Date: Fri, 6 Aug 2021 17:47:48 GMT
- Title: A Study on Dense and Sparse (Visual) Rewards in Robot Policy Learning
- Authors: Abdalkarim Mohtasib, Gerhard Neumann and Heriberto Cuayahuitl
- Abstract summary: We study the performance of multiple state-of-the-art deep reinforcement learning algorithms under different types of reward.
Our results show that visual dense rewards are more successful than visual sparse rewards and that there is no single best algorithm for all tasks.
- Score: 19.67628391301068
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep Reinforcement Learning (DRL) is a promising approach for teaching robots
new behaviour. However, one of its main limitations is the need for reward signals
carefully hand-coded by an expert. We argue that it is crucial to automate
the reward learning process so that new skills can be taught to robots by their
users. To address such automation, we consider task success classifiers using
visual observations to estimate the rewards in terms of task success. In this
work, we study the performance of multiple state-of-the-art deep reinforcement
learning algorithms under different types of reward: Dense, Sparse, Visual
Dense, and Visual Sparse rewards. Our experiments in various simulation tasks
(Pendulum, Reacher, Pusher, and Fetch Reach) show that while DRL agents can
learn successful behaviours using visual rewards when the goal targets are
distinguishable, their performance may decrease if the task goal is not clearly
visible. Our results also show that visual dense rewards are more successful
than visual sparse rewards and that there is no single best algorithm for all
tasks.
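The study above estimates rewards with a task-success classifier over visual observations. Below is a minimal sketch of that idea, assuming a pretrained binary success classifier over rendered frames: a wrapper returns either the predicted success probability as a dense reward or a thresholded success indicator as a sparse reward. The class names, network architecture, threshold, and the (older) gym API used here are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the paper's code): a success classifier over image
# observations used as the reward signal, dense or sparse.
import gym
import torch
import torch.nn as nn


class SuccessClassifier(nn.Module):
    """Tiny CNN giving P(task success | image); assumed pretrained offline on
    labelled success/failure frames (architecture is an arbitrary placeholder)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, img):                      # img: (B, 3, H, W) in [0, 1]
        return torch.sigmoid(self.net(img))      # (B, 1) success probability


class VisualRewardWrapper(gym.Wrapper):
    """Replaces the environment reward with a classifier-based visual reward.
    dense=True  -> reward is the predicted success probability in [0, 1]
    dense=False -> reward is 1.0 only when that probability exceeds a threshold."""

    def __init__(self, env, classifier, dense=True, threshold=0.5):
        super().__init__(env)
        self.classifier = classifier.eval()
        self.dense = dense
        self.threshold = threshold

    def step(self, action):
        obs, _, done, info = self.env.step(action)        # old 4-tuple gym API
        frame = self.env.render(mode="rgb_array")          # visual observation
        img = torch.from_numpy(frame.copy()).float().permute(2, 0, 1) / 255.0
        with torch.no_grad():
            p_success = self.classifier(img.unsqueeze(0)).item()
        reward = p_success if self.dense else float(p_success > self.threshold)
        return obs, reward, done, info
```

With dense=False the wrapper yields the visual sparse signal, which the study found harder to learn from when the task goal is not clearly visible in the image.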
Related papers
- On-Robot Reinforcement Learning with Goal-Contrastive Rewards [24.415607337006968]
Reinforcement Learning (RL) has the potential to enable robots to learn from their own actions in the real world.
We propose GCR (Goal-Contrastive Rewards), a dense reward function learning method that can be trained on passive video demonstrations.
GCR combines two loss functions: an implicit value loss that models how the reward increases when traversing a successful trajectory, and a goal-contrastive loss that discriminates between successful and failed trajectories (a rough sketch of this style of objective appears after this list).
arXiv Detail & Related papers (2024-10-25T22:11:54Z)
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.
On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks [26.730889757506915]
We propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks.
By leveraging the stage structure of the task, DrS learns a high-quality dense reward from sparse rewards and, when available, demonstrations.
Experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that our learned rewards can be reused in unseen tasks.
arXiv Detail & Related papers (2024-04-25T17:28:33Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward [7.51772160511614]
Reinforcement learning often suffers from the sparse-reward issue in real-world robotics problems.
Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task.
In this paper, we consider the case where the target task differs from, but is similar to, that of the expert.
Existing learning-from-demonstration (LfD) methods cannot effectively guide learning in mismatched new tasks with sparse rewards.
arXiv Detail & Related papers (2022-12-03T02:24:59Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring an agent's reward function from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Automatic Reward Design via Learning Motivation-Consistent Intrinsic Rewards [46.068337522093096]
We introduce the concept of motivation, which captures the underlying goal of maximizing certain rewards.
Our method performs better than the state-of-the-art methods in handling problems of delayed reward, exploration, and credit assignment.
arXiv Detail & Related papers (2022-07-29T14:52:02Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z)
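As noted in the GCR entry above, the sketch below illustrates one way a goal-contrastive reward objective could be assembled: an implicit-value term that encourages predicted rewards to increase along a successful demonstration, plus a contrastive term that ranks successful states above states from failed rollouts. The loss forms, margin, and names are assumptions made for illustration; the paper's exact formulation may differ.

```python
# Rough, hypothetical sketch of a goal-contrastive reward objective; all names
# and loss forms are illustrative, not the published GCR implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gcr_style_loss(reward_net, success_traj, failure_traj, margin=1.0):
    """reward_net maps states (T, D) to scalar scores; each trajectory is a
    tensor of states, ordered in time."""
    r_succ = reward_net(success_traj).squeeze(-1)   # (T,) scores along a success
    r_fail = reward_net(failure_traj).squeeze(-1)   # (T,) scores along a failure

    # (1) Implicit-value term: along a successful trajectory the predicted reward
    #     should grow towards the goal, so penalise decreases between steps.
    value_loss = F.relu(r_succ[:-1] - r_succ[1:]).mean()

    # (2) Goal-contrastive term: successful trajectories should score higher than
    #     failed ones (simple margin ranking on the mean scores here).
    contrastive_loss = F.relu(margin - (r_succ.mean() - r_fail.mean()))

    return value_loss + contrastive_loss


# Usage sketch (state_dim, demo_states, failed_states are placeholders):
# reward_net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, 1))
# loss = gcr_style_loss(reward_net, demo_states, failed_states)
# loss.backward(); optimiser.step()
```

The margin ranking term is only a simple stand-in for the discriminative part; any classification loss separating successful from failed trajectories would serve the same illustrative purpose.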
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.