Reward Engineering for Object Pick and Place Training
- URL: http://arxiv.org/abs/2001.03792v1
- Date: Sat, 11 Jan 2020 20:13:28 GMT
- Title: Reward Engineering for Object Pick and Place Training
- Authors: Raghav Nagpal, Achyuthan Unni Krishnan and Hanshen Yu
- Abstract summary: We have used the Pick and Place environment provided by OpenAI's Gym to engineer rewards.
In the default configuration of the OpenAI baseline and environment, the reward function is calculated from the distance between the target location and the robot end-effector.
We were also able to introduce certain user-desired trajectories into the learnt policies.
- Score: 3.4806267677524896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic grasping is a crucial area of research, as it can accelerate the
automation of several industries that utilize robots, ranging from manufacturing to
healthcare. Reinforcement learning is the field of study in which an agent learns a
policy for executing actions by exploring an environment and exploiting its rewards.
An agent can therefore use reinforcement learning to learn how to execute a certain
task, in our case grasping an object. We have used the Pick and Place environment
provided by OpenAI's Gym to engineer rewards. Hindsight Experience Replay (HER) has
shown promising results on problems with sparse rewards. In the default configuration
of the OpenAI baseline and environment, the reward function is calculated from the
distance between the target location and the robot end-effector. By weighting the
cost based on the distance of the end-effector from the goal along the x, y and z
axes, an intuitive strategy, we were able to almost halve the learning time compared
to the baselines provided by OpenAI. In this project, we were also able to introduce
certain user-desired trajectories into the learnt policies (city-block / Manhattan
trajectories). This shows that, by engineering the rewards, we can tune the agent to
learn policies that behave in a desired manner, even if that behaviour is not the
most optimal.
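The reward engineering described in the abstract can be sketched in a few lines. The functions below are an illustrative sketch only, not the authors' code: default_reward mirrors the dense, Euclidean-distance reward used in Gym's Fetch-style pick-and-place task, while axis_weighted_reward and city_block_reward show the kind of per-axis weighting and Manhattan-style shaping the abstract refers to; the names, signatures, and weight values are assumptions.

```python
import numpy as np

def default_reward(ee_pos, goal_pos):
    # Dense baseline: negative Euclidean distance between the
    # end-effector (or grasped object) position and the goal.
    return -float(np.linalg.norm(np.asarray(goal_pos) - np.asarray(ee_pos)))

def axis_weighted_reward(ee_pos, goal_pos, weights=(1.0, 1.0, 2.0)):
    # Illustrative axis-weighted cost: each of the x, y, z errors gets
    # its own weight (the values here are hypothetical placeholders).
    err = np.abs(np.asarray(goal_pos) - np.asarray(ee_pos))
    return -float(np.dot(np.asarray(weights), err))

def city_block_reward(ee_pos, goal_pos):
    # Illustrative city-block (Manhattan / L1) cost: summing per-axis
    # errors rather than taking the Euclidean norm tends to favour
    # axis-aligned motion toward the goal.
    return -float(np.sum(np.abs(np.asarray(goal_pos) - np.asarray(ee_pos))))
```

In this sketch the choice of norm is the whole design space: the Euclidean cost is indifferent to the direction of approach, whereas weighting the axes differently or using the L1 norm lets the designer bias the learnt policy toward particular trajectories, at the possible expense of optimality.

The shaping above complements Hindsight Experience Replay, which the baseline relies on in the sparse-reward setting. A minimal sketch of HER's "future" goal-relabelling step is given below; the transition layout and the reward_fn helper are assumptions made for illustration, not the OpenAI baseline's actual API.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=np.random):
    # episode: list of (obs, action, next_obs, achieved_goal) tuples, where
    # achieved_goal is the goal actually reached after the transition.
    # reward_fn(achieved_goal, goal): sparse task reward (e.g. 0 on success,
    # -1 otherwise). Both are hypothetical placeholders.
    relabelled = []
    for t, (obs, act, next_obs, achieved) in enumerate(episode):
        # Sample k replacement goals from states reached later in the episode.
        future_idxs = rng.randint(t, len(episode), size=k)
        for idx in future_idxs:
            new_goal = episode[idx][3]           # a goal that was actually achieved
            r = reward_fn(achieved, new_goal)    # recompute reward for the new goal
            relabelled.append((obs, act, next_obs, new_goal, r))
    return relabelled
```

The point of the relabelling is that even a failed pick-and-place episode yields transitions with useful learning signal, which is what makes the sparse-reward configuration tractable before any shaping is applied.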
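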
Related papers
- Deep Reinforcement Learning with Dynamic Graphs for Adaptive Informative Path Planning [22.48658555542736]
A key task in robotic data acquisition is planning paths through an initially unknown environment to collect observations.
We propose a novel deep reinforcement learning approach for adaptively replanning robot paths to map targets of interest in unknown 3D environments.
arXiv Detail & Related papers (2024-02-07T14:24:41Z)
- Contact Energy Based Hindsight Experience Prioritization [19.42106651692228]
Multi-goal robot manipulation tasks with sparse rewards are difficult for reinforcement learning (RL) algorithms.
Recent algorithms such as Hindsight Experience Replay (HER) expedite learning by taking advantage of failed trajectories.
We propose a novel approach, Contact Energy Based Prioritization (CEBP), which selects samples from the replay buffer based on the rich information provided by contact.
arXiv Detail & Related papers (2023-12-05T11:32:25Z)
- Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own [59.11934130045106]
We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.
Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions.
Our method achieves remarkable performance in various manipulation tasks, both on real robots and in simulation.
arXiv Detail & Related papers (2023-10-04T07:56:42Z)
- Planning Goals for Exploration [22.047797646698527]
"Planning Exploratory Goals" (PEG) is a method that sets goals for each training episode to directly optimize an intrinsic exploration reward.
PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands".
arXiv Detail & Related papers (2023-03-23T02:51:50Z)
- TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers [64.88759709443819]
We suggest learning instance-dependent proxies that can notably increase the efficiency of the search.
The first proxy we suggest learning is the correction factor, i.e. the ratio between the instance-independent cost-to-go estimate and the perfect one.
The second proxy is the path probability, which indicates how likely it is that a grid cell lies on the shortest path.
arXiv Detail & Related papers (2022-12-22T14:26:11Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- XAI-N: Sensor-based Robot Navigation using Expert Policies and Decision Trees [55.9643422180256]
We present a novel sensor-based learning navigation algorithm to compute a collision-free trajectory for a robot in dense and dynamic environments.
Our approach uses a deep reinforcement learning-based expert policy that is trained using a sim2real paradigm.
We highlight the benefits of our algorithm in simulated environments and in navigating a Clearpath Jackal robot among moving pedestrians.
arXiv Detail & Related papers (2021-04-22T01:33:10Z)
- Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills [93.12417203541948]
We propose the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset.
We find that our method can operate on high-dimensional camera images and learn a variety of skills on real robots that generalize to previously unseen scenes and objects.
arXiv Detail & Related papers (2021-04-15T20:10:11Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Low Dimensional State Representation Learning with Reward-shaped Priors [7.211095654886105]
We propose a method that aims at learning a mapping from the observations into a lower-dimensional state space.
This mapping is learned with unsupervised learning using loss functions shaped to incorporate prior knowledge of the environment and the task.
We test the method on several mobile robot navigation tasks in a simulation environment and also on a real robot.
arXiv Detail & Related papers (2020-07-29T13:00:39Z)
- On Reward Shaping for Mobile Robot Navigation: A Reinforcement Learning and SLAM Based Approach [7.488722678999039]
We present a map-less path planning algorithm based on Deep Reinforcement Learning (DRL) for mobile robots navigating in unknown environments.
The planner is trained using a reward function shaped based on the online knowledge of the map of the training environment.
The policy trained in the simulation environment can be directly and successfully transferred to the real robot.
arXiv Detail & Related papers (2020-02-10T22:00:16Z)