Reward Engineering for Object Pick and Place Training
- URL: http://arxiv.org/abs/2001.03792v1
- Date: Sat, 11 Jan 2020 20:13:28 GMT
- Title: Reward Engineering for Object Pick and Place Training
- Authors: Raghav Nagpal, Achyuthan Unni Krishnan and Hanshen Yu
- Abstract summary: We have used the Pick and Place environment provided by OpenAI's Gym to engineer rewards.
In the default configuration of the OpenAI baseline and environment, the reward function is calculated from the distance between the target location and the robot end-effector.
We were also able to introduce certain user-desired trajectories into the learnt policies.
- Score: 3.4806267677524896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic grasping is a crucial area of research, as it can accelerate the
automation of several industries that utilize robots, ranging from manufacturing to
healthcare. Reinforcement learning is the field of study in which an agent learns a
policy for executing actions by exploring an environment and exploiting its rewards.
An agent can therefore use reinforcement learning to learn how to execute a certain
task, in our case grasping an object. We have used the Pick and Place environment
provided by OpenAI's Gym to engineer rewards. Hindsight Experience Replay (HER) has
shown promising results on problems with sparse rewards. In the default configuration
of the OpenAI baseline and environment, the reward function is calculated from the
distance between the target location and the robot end-effector. By weighting the
cost based on the distance of the end-effector from the goal along the x, y and z
axes, an intuitive strategy, we were able to almost halve the learning time compared
to the baselines provided by OpenAI. In this project, we were also able to introduce
certain user-desired trajectories into the learnt policies (city-block / Manhattan
trajectories). This shows that, by engineering the rewards, we can tune the agent to
learn policies that behave in a desired manner, even if that behaviour is not the
most optimal.
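The reward engineering described in the abstract can be sketched in a few lines. The functions below are an illustrative sketch only, not the authors' code: default_reward mirrors the dense, Euclidean-distance reward used in Gym's Fetch-style pick-and-place task, while axis_weighted_reward and city_block_reward show the kind of per-axis weighting and Manhattan-style shaping the abstract refers to; the names, signatures, and weight values are assumptions.

```python
import numpy as np

def default_reward(ee_pos, goal_pos):
    # Dense baseline: negative Euclidean distance between the
    # end-effector (or grasped object) position and the goal.
    return -float(np.linalg.norm(np.asarray(goal_pos) - np.asarray(ee_pos)))

def axis_weighted_reward(ee_pos, goal_pos, weights=(1.0, 1.0, 2.0)):
    # Illustrative axis-weighted cost: each of the x, y, z errors gets
    # its own weight (the values here are hypothetical placeholders).
    err = np.abs(np.asarray(goal_pos) - np.asarray(ee_pos))
    return -float(np.dot(np.asarray(weights), err))

def city_block_reward(ee_pos, goal_pos):
    # Illustrative city-block (Manhattan / L1) cost: summing per-axis
    # errors rather than taking the Euclidean norm tends to favour
    # axis-aligned motion toward the goal.
    return -float(np.sum(np.abs(np.asarray(goal_pos) - np.asarray(ee_pos))))
```

In this sketch the choice of norm is the whole design space: the Euclidean cost is indifferent to the direction of approach, whereas weighting the axes differently or using the L1 norm lets the designer bias the learnt policy toward particular trajectories, at the possible expense of optimality.

The shaping above complements Hindsight Experience Replay, which the baseline relies on in the sparse-reward setting. A minimal sketch of HER's "future" goal-relabelling step is given below; the transition layout and the reward_fn helper are assumptions made for illustration, not the OpenAI baseline's actual API.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=np.random):
    # episode: list of (obs, action, next_obs, achieved_goal) tuples, where
    # achieved_goal is the goal actually reached after the transition.
    # reward_fn(achieved_goal, goal): sparse task reward (e.g. 0 on success,
    # -1 otherwise). Both are hypothetical placeholders.
    relabelled = []
    for t, (obs, act, next_obs, achieved) in enumerate(episode):
        # Sample k replacement goals from states reached later in the episode.
        future_idxs = rng.randint(t, len(episode), size=k)
        for idx in future_idxs:
            new_goal = episode[idx][3]           # a goal that was actually achieved
            r = reward_fn(achieved, new_goal)    # recompute reward for the new goal
            relabelled.append((obs, act, next_obs, new_goal, r))
    return relabelled
```

The point of the relabelling is that even a failed pick-and-place episode yields transitions with useful learning signal, which is what makes the sparse-reward configuration tractable before any shaping is applied.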
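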
Related papers
- Deep Reinforcement Learning with Dynamic Graphs for Adaptive Informative Path Planning [22.48658555542736]
A key task in robotic data acquisition is planning paths through an initially unknown environment to collect observations.
We propose a novel deep reinforcement learning approach for adaptively replanning robot paths to map targets of interest in unknown 3D environments.
arXiv Detail & Related papers (2024-02-07T14:24:41Z)
- Contact Energy Based Hindsight Experience Prioritization [19.42106651692228]
Multi-goal robot manipulation tasks with sparse rewards are difficult for reinforcement learning (RL) algorithms.
Recent algorithms such as Hindsight Experience Replay (HER) expedite learning by taking advantage of failed trajectories.
We propose a novel approach, Contact Energy Based Prioritization (CEBP), which selects samples from the replay buffer based on the rich information provided by contact.
arXiv Detail & Related papers (2023-12-05T11:32:25Z)
- Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own [59.11934130045106]
We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.
Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions.
Our method achieves remarkable performance in various manipulation tasks, both on real robots and in simulation.
arXiv Detail & Related papers (2023-10-04T07:56:42Z)
- Planning Goals for Exploration [22.047797646698527]
"Planning Exploratory Goals" (PEG) is a method that sets goals for each training episode to directly optimize an intrinsic exploration reward.
PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands".
arXiv Detail & Related papers (2023-03-23T02:51:50Z)
- TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers [64.88759709443819]
We suggest learning instance-dependent proxies that can notably increase the efficiency of the search.
The first proxy we suggest learning is the correction factor, i.e. the ratio between the instance-independent cost-to-go estimate and the perfect one.
The second proxy is the path probability, which indicates how likely it is that a grid cell lies on the shortest path.
arXiv Detail & Related papers (2022-12-22T14:26:11Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- XAI-N: Sensor-based Robot Navigation using Expert Policies and Decision Trees [55.9643422180256]
We present a novel sensor-based learning navigation algorithm to compute a collision-free trajectory for a robot in dense and dynamic environments.
Our approach uses a deep reinforcement learning-based expert policy that is trained using a sim2real paradigm.
We highlight the benefits of our algorithm in simulated environments and in navigating a Clearpath Jackal robot among moving pedestrians.
arXiv Detail & Related papers (2021-04-22T01:33:10Z)
- Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills [93.12417203541948]
We propose the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset.
We find that our method can operate on high-dimensional camera images and learn a variety of skills on real robots that generalize to previously unseen scenes and objects.
arXiv Detail & Related papers (2021-04-15T20:10:11Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Low Dimensional State Representation Learning with Reward-shaped Priors [7.211095654886105]
We propose a method that aims at learning a mapping from the observations into a lower-dimensional state space.
This mapping is learned with unsupervised learning using loss functions shaped to incorporate prior knowledge of the environment and the task.
We test the method on several mobile robot navigation tasks in a simulation environment and also on a real robot.
arXiv Detail & Related papers (2020-07-29T13:00:39Z)
- On Reward Shaping for Mobile Robot Navigation: A Reinforcement Learning and SLAM Based Approach [7.488722678999039]
We present a map-less path planning algorithm based on Deep Reinforcement Learning (DRL) for mobile robots navigating in unknown environments.
The planner is trained using a reward function shaped based on the online knowledge of the map of the training environment.
The policy trained in the simulation environment can be directly and successfully transferred to the real robot.
arXiv Detail & Related papers (2020-02-10T22:00:16Z)