Deep Reinforcement Learning with a Stage Incentive Mechanism of Dense
Reward for Robotic Trajectory Planning
- URL: http://arxiv.org/abs/2009.12068v2
- Date: Sun, 23 May 2021 04:55:36 GMT
- Title: Deep Reinforcement Learning with a Stage Incentive Mechanism of Dense
Reward for Robotic Trajectory Planning
- Authors: Gang Peng, Jin Yang, Xinde Li, Mohammad Omar Khyam
- Abstract summary: We present three dense reward functions to improve the efficiency of DRL-based methods for robot manipulator trajectory planning.
A posture reward function is proposed to speed up the learning process with a more reasonable trajectory.
A stride reward function is proposed to improve the stability of the learning process.
- Score: 3.0242753679068466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: (This work has been submitted to the IEEE for possible publication. Copyright
may be transferred without notice, after which this version may no longer be
accessible.)
To improve the efficiency of deep reinforcement learning (DRL)-based methods
for robot manipulator trajectory planning in random working environments, we
present three dense reward functions. These rewards differ from the traditional
sparse reward. First, a posture reward function is proposed to speed up the
learning process and yield a more reasonable trajectory by modeling distance and
direction constraints, which reduces blind exploration. Second, a stride reward
function is proposed to improve the stability of the learning process by
constraining both the distance to the target and the per-step movement distance of the joints.
Finally, to further improve learning efficiency, we take inspiration from the
cognitive process of human behavior and propose a stage incentive mechanism,
including a hard stage incentive reward function and a soft stage incentive
reward function. Extensive experiments show that the soft stage incentive reward
function improves the convergence rate by up to 46.9% with state-of-the-art DRL
methods, increases the convergence mean reward by 4.4-15.5%, and reduces its
standard deviation by 21.9-63.2%. In the evaluation experiments, the success
rate of trajectory planning for a robot manipulator reached 99.6%.
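As a rough illustration of how such a dense reward with a soft stage incentive could be composed, the sketch below combines a distance term, a posture (direction-alignment) term, a stride term, and a smooth stage bonus. The specific forms, thresholds, and coefficients are assumptions of this sketch, not the paper's definitions.

```python
import numpy as np

def dense_stage_reward(ee_pos, ee_dir, target_pos, target_dir, prev_dist,
                       stage_radii=(0.20, 0.05),
                       w_dist=1.0, w_posture=0.3, w_stride=0.5):
    """Illustrative dense reward with a soft stage incentive (not the paper's formulas)."""
    dist = np.linalg.norm(target_pos - ee_pos)

    # Posture term: reward alignment between the end-effector direction
    # and the desired approach direction at the target.
    cos_align = np.dot(ee_dir, target_dir) / (
        np.linalg.norm(ee_dir) * np.linalg.norm(target_dir) + 1e-8)

    # Stride term: reward smooth per-step progress toward the target while
    # clipping large strides, one plausible reading of the stride constraint.
    stride = prev_dist - dist                      # > 0 if the step moved closer
    stride_term = np.clip(stride, -0.05, 0.05) / 0.05

    # Soft stage incentive: a smooth bonus that grows as the end-effector
    # enters progressively tighter regions (stages) around the target.
    stage_bonus = sum(1.0 / (1.0 + np.exp((dist - r) / (0.1 * r)))
                      for r in stage_radii)

    return -w_dist * dist + w_posture * cos_align + w_stride * stride_term + stage_bonus
```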
Related papers
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.
On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- Auxiliary Reward Generation with Transition Distance Representation Learning [20.150691753213817]
Reinforcement learning (RL) has shown its strength in challenging sequential decision-making problems.
The reward function in RL is crucial to the learning performance, as it serves as a measure of the task completion degree.
We propose a novel representation learning approach that can measure the "transition distance" between states.
arXiv Detail & Related papers (2024-02-12T05:13:44Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z)
- Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning [19.788336796981685]
We propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL).
Our main idea is to design multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training.
The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios against SOTA baselines in terms of both effectiveness and robustness.
arXiv Detail & Related papers (2022-10-14T08:31:45Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
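A rough sketch of the dual-form idea behind such a supplemental reward is given below: a scalar potential is trained on the Kantorovich-Rubinstein dual of the Wasserstein-1 distance and then used to shape an intrinsic reward. The potential network, the gradient-penalty Lipschitz constraint, and the f(s') - f(s) shaping are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class Potential(nn.Module):
    """Scalar potential f(s) for the Kantorovich-Rubinstein dual of the W1 distance."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, s):
        return self.net(s).squeeze(-1)

def dual_loss(f, policy_states, target_states, penalty_coef=10.0):
    # Dual objective: maximize E_target[f] - E_policy[f]; we minimize its negative.
    gap = f(target_states).mean() - f(policy_states).mean()
    # Soft 1-Lipschitz constraint via a gradient penalty (an assumption of this sketch).
    x = policy_states.detach().clone().requires_grad_(True)
    grads = torch.autograd.grad(f(x).sum(), x, create_graph=True)[0]
    penalty = ((grads.norm(dim=-1) - 1.0).clamp(min=0.0) ** 2).mean()
    return -gap + penalty_coef * penalty

def supplemental_reward(f, s, s_next):
    # Intrinsic reward from the learned potential: larger when the transition
    # moves the agent toward the target distribution (illustrative form).
    with torch.no_grad():
        return f(s_next) - f(s)
```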
arXiv Detail & Related papers (2021-05-27T17:51:34Z)
- Reinforcement Learning for Robust Missile Autopilot Design [0.0]
This work pioneers the use of Reinforcement Learning as a framework for flight control.
Under TRPO's methodology, the collected experience is augmented according to HER, stored in a replay buffer and sampled according to its significance.
Results show that it is possible both to achieve the optimal performance and to improve the agent's robustness to uncertainties.
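A compact sketch of an HER-style relabeling step of the kind mentioned above is shown below; the "final"-goal strategy and the reward recomputation are assumptions of this sketch, and the paper's actual augmentation and significance-based sampling may differ.

```python
def her_relabel(episode, reward_fn):
    """Hindsight relabeling with the 'final' goal strategy (illustrative sketch).

    `episode` is a list of (state, action, next_state, goal) tuples and
    `reward_fn(next_state, goal)` recomputes the reward under a new goal.
    """
    achieved_goal = episode[-1][2]          # last achieved state of the episode
    relabeled = []
    for state, action, next_state, _original_goal in episode:
        reward = reward_fn(next_state, achieved_goal)
        relabeled.append((state, action, next_state, achieved_goal, reward))
    return relabeled
```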
arXiv Detail & Related papers (2020-11-26T09:30:04Z)
- Reward Conditioned Neural Movement Primitives for Population Based Variational Policy Optimization [4.559353193715442]
This paper studies the reward-based policy exploration problem with a supervised learning approach.
We show that our method provides stable learning progress and significant sample efficiency compared to a number of state-of-the-art robotic reinforcement learning methods.
arXiv Detail & Related papers (2020-11-09T09:53:37Z)
- Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles [73.15950858151594]
This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards.
We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling.
We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives.
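The optimism-under-uncertainty idea could look roughly as follows: an ensemble of predicted returns is combined into an upper-confidence estimate for ranking candidate action sequences. The mean-plus-scaled-std form is an assumption of this sketch.

```python
import numpy as np

def optimistic_return(ensemble_returns, beta=1.0):
    """Upper-confidence estimate over an ensemble of predicted returns.

    `ensemble_returns` has shape (num_models, num_candidates); the
    mean + beta * std combination is an illustrative assumption.
    """
    mean = ensemble_returns.mean(axis=0)
    std = ensemble_returns.std(axis=0)
    return mean + beta * std    # favor action candidates with uncertain upside
```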
arXiv Detail & Related papers (2020-10-27T22:06:57Z)
- Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty [14.178202899299267]
We propose a simple but powerful reward shaping method, namely Dense2Sparse.
It combines the advantage of fast convergence of dense reward and the noise isolation of the sparse reward, to achieve a balance between learning efficiency and effectiveness.
The experimental results show that Dense2Sparse obtains a higher expected reward than either a standalone dense reward or a standalone sparse reward, and it also shows superior tolerance to system uncertainty.
arXiv Detail & Related papers (2020-03-05T16:10:15Z)
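A minimal sketch of such a dense-to-sparse switch is given below; the switch step and the reward forms are assumptions of this sketch, not the paper's settings.

```python
def dense2sparse_reward(dist_to_goal, success, step, switch_step=50_000,
                        dense_scale=1.0, success_bonus=10.0):
    """Illustrative Dense2Sparse schedule.

    A dense, distance-based reward is used early for fast convergence and is
    replaced by a sparse success reward later to isolate the policy from noisy
    state estimates.
    """
    if step < switch_step:
        return -dense_scale * dist_to_goal     # dense stage: shaped by distance
    return success_bonus if success else 0.0   # sparse stage: success only
```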