Deep Reinforcement Learning with Adaptive Hierarchical Reward for
MultiMulti-Phase Multi Multi-Objective Dexterous Manipulation
- URL: http://arxiv.org/abs/2205.13441v1
- Date: Thu, 26 May 2022 15:44:31 GMT
- Title: Deep Reinforcement Learning with Adaptive Hierarchical Reward for
MultiMulti-Phase Multi Multi-Objective Dexterous Manipulation
- Authors: Lingfeng Tao, Jiucai Zhang, Xiaoli Zhang
- Abstract summary: Varying priority makes a robot hardly or even failed to learn an optimal policy with a deep reinforcement learning (DRL) method.
We develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the DRL agent to learn manipulation tasks with multiple prioritized objectives.
The proposed method is validated in a multi-objective manipulation task with a JACO robot arm.
- Score: 11.638614321552616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dexterous manipulation tasks usually have multiple objectives, and the
priorities of these objectives may vary at different phases of a manipulation
task. Varying priority makes a robot hardly or even failed to learn an optimal
policy with a deep reinforcement learning (DRL) method. To solve this problem,
we develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the
DRL agent to learn manipulation tasks with multiple prioritized objectives. The
AHRM can determine the objective priorities during the learning process and
update the reward hierarchy to adapt to the changing objective priorities at
different phases. The proposed method is validated in a multi-objective
manipulation task with a JACO robot arm in which the robot needs to manipulate
a target with obstacles surrounded. The simulation and physical experiment
results show that the proposed method improved robot learning in task
performance and learning efficiency.
Related papers
- Enhancing Robotic Navigation: An Evaluation of Single and
Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
arXiv Detail & Related papers (2023-12-13T08:00:26Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - Enhancing Robotic Manipulation: Harnessing the Power of Multi-Task
Reinforcement Learning and Single Life Reinforcement Learning in Meta-World [0.0]
This research project is to enable a robotic arm to execute seven distinct tasks within the Meta World environment.
A trained model will serve as a source of prior data for the single-life RL algorithm.
An ablation study demonstrates that MT-QWALE successfully completes tasks with a slightly larger number of steps even after hiding the final goal position.
arXiv Detail & Related papers (2023-10-23T06:35:44Z) - Leveraging Sequentiality in Reinforcement Learning from a Single
Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve with unprecedented sample efficiency some challenging simulated tasks such as humanoid locomotion and stand-up.
arXiv Detail & Related papers (2022-11-09T10:28:40Z) - Autonomous Open-Ended Learning of Tasks with Non-Stationary
Interdependencies [64.0476282000118]
Intrinsic motivations have proven to generate a task-agnostic signal to properly allocate the training time amongst goals.
While the majority of works in the field of intrinsically motivated open-ended learning focus on scenarios where goals are independent from each other, only few of them studied the autonomous acquisition of interdependent tasks.
In particular, we first deepen the analysis of a previous system, showing the importance of incorporating information about the relationships between tasks at a higher level of the architecture.
Then we introduce H-GRAIL, a new system that extends the previous one by adding a new learning layer to store the autonomously acquired sequences
arXiv Detail & Related papers (2022-05-16T10:43:01Z) - Automatic Goal Generation using Dynamical Distance Learning [5.797847756967884]
Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment.
In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging.
We propose a method for automatic goal generation using a dynamical distance function (DDF) in a self-supervised fashion.
arXiv Detail & Related papers (2021-11-07T16:23:56Z) - Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient descent (CAGrad) which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z) - Diversity-based Trajectory and Goal Selection with Hindsight Experience
Replay [8.259694128526112]
We propose diversity-based trajectory and goal selection with HER (DTGSH)
We show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.
arXiv Detail & Related papers (2021-08-17T21:34:24Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z) - Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.