Diversity-based Trajectory and Goal Selection with Hindsight Experience
Replay
- URL: http://arxiv.org/abs/2108.07887v1
- Date: Tue, 17 Aug 2021 21:34:24 GMT
- Title: Diversity-based Trajectory and Goal Selection with Hindsight Experience
Replay
- Authors: Tianhong Dai, Hengyan Liu, Kai Arulkumaran, Guangyu Ren, Anil Anthony
Bharath
- Abstract summary: We propose diversity-based trajectory and goal selection with HER (DTGSH).
We show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.
- Score: 8.259694128526112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hindsight experience replay (HER) is a goal relabelling technique typically
used with off-policy deep reinforcement learning algorithms to solve
goal-oriented tasks; it is well suited to robotic manipulation tasks that
deliver only sparse rewards. In HER, both trajectories and transitions are
sampled uniformly for training. However, not all of the agent's experiences
contribute equally to training, and so naive uniform sampling may lead to
inefficient learning. In this paper, we propose diversity-based trajectory and
goal selection with HER (DTGSH). Firstly, trajectories are sampled according to
the diversity of the goal states as modelled by determinantal point processes
(DPPs). Secondly, transitions with diverse goal states are selected from the
trajectories by using k-DPPs. We evaluate DTGSH on five challenging robotic
manipulation tasks in simulated robot environments, where we show that our
method can learn more quickly and reach higher performance than other
state-of-the-art approaches on all tasks.
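To make the selection mechanism concrete, below is a minimal sketch of how DPP-style diversity scores could replace HER's uniform sampling. It is an illustration under assumptions, not the authors' implementation: the helper names are hypothetical, the kernel is a simple cosine-similarity Gram matrix, and a greedy MAP loop stands in for true k-DPP sampling.

```python
import numpy as np

def diversity_score(goal_states):
    """DPP-style diversity of a set of goal states: the determinant of
    their similarity (Gram) kernel. Larger determinant = more diverse.
    `goal_states` is an (n, d) array of achieved-goal features."""
    phi = goal_states / np.linalg.norm(goal_states, axis=1, keepdims=True)
    kernel = phi @ phi.T                       # cosine-similarity kernel
    return np.linalg.det(kernel + 1e-6 * np.eye(len(kernel)))  # jitter for stability

def sample_trajectories(episode_goals, m, rng):
    """Draw m episodes with probability proportional to the diversity of
    each episode's achieved goal states, instead of uniformly as in HER."""
    scores = np.array([diversity_score(g) for g in episode_goals])
    return rng.choice(len(episode_goals), size=m, replace=False,
                      p=scores / scores.sum())

def select_diverse_transitions(goal_states, k):
    """Greedy MAP stand-in for k-DPP sampling: repeatedly add the
    transition whose goal state most increases the set determinant."""
    selected = []
    for _ in range(k):
        best_i, best_s = None, -np.inf
        for i in range(len(goal_states)):
            if i not in selected:
                s = diversity_score(goal_states[selected + [i]])
                if s > best_s:
                    best_i, best_s = i, s
        selected.append(best_i)
    return selected
```

A replay update along these lines would first draw episodes with `sample_trajectories`, then pick k transitions per episode with `select_diverse_transitions`, and finally relabel goals as usual in HER.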
Related papers
- Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
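As a rough, hypothetical illustration of that vector-reward idea (the objectives, names, and the 0.2 m safety margin below are assumptions, not details from the paper):

```python
import numpy as np

def vector_reward(position, goal, obstacle_dist):
    """Hypothetical vector-valued reward for goal-directed navigation:
    one component per objective instead of a single pre-mixed scalar."""
    r_goal = -float(np.linalg.norm(position - goal))  # progress towards the end goal
    r_safe = min(obstacle_dist - 0.2, 0.0)            # penalise closing within 0.2 m
    return np.array([r_goal, r_safe])

# A single-objective baseline would instead fix the trade-off up front:
# r_scalar = np.dot([1.0, 0.5], vector_reward(position, goal, obstacle_dist))
```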
arXiv Detail & Related papers (2023-12-13T08:00:26Z)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve challenging simulated tasks, such as humanoid locomotion and stand-up, with unprecedented sample efficiency.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
- Deep Reinforcement Learning with Adaptive Hierarchical Reward for Multi-Phase Multi-Objective Dexterous Manipulation [11.638614321552616]
Varying priorities can make it hard, or even impossible, for a robot to learn an optimal policy with a deep reinforcement learning (DRL) method.
We develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the DRL agent to learn manipulation tasks with multiple prioritized objectives.
The proposed method is validated in a multi-objective manipulation task with a JACO robot arm.
arXiv Detail & Related papers (2022-05-26T15:44:31Z)
- Automatic Goal Generation using Dynamical Distance Learning [5.797847756967884]
Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment.
In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging.
We propose a method for automatic goal generation using a dynamical distance function (DDF) in a self-supervised fashion.
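A sketch of how such a generator might use the learned distance, under the assumptions that `ddf(s, g)` is a learned regressor estimating steps-to-reach and that goals of a target difficulty are preferred (both assumptions, not details from the paper):

```python
import numpy as np

def generate_goal(candidates, start, ddf, target_difficulty):
    """Pick the candidate goal whose estimated dynamical distance from
    `start` is closest to the desired difficulty level."""
    dists = np.array([ddf(start, g) for g in candidates])
    return candidates[int(np.argmin(np.abs(dists - target_difficulty)))]
```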
arXiv Detail & Related papers (2021-11-07T16:23:56Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solve prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation with learning-from-demonstration techniques is that, for a single task query, human demonstrations follow a distribution with multiple modes. Previous approaches either fail to capture all of the modes or tend to average the modes of the demonstrations, and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Reward Conditioned Neural Movement Primitives for Population Based Variational Policy Optimization [4.559353193715442]
This paper studies the reward-based policy exploration problem with a supervised learning approach.
We show that our method provides stable learning progress and significant sample efficiency compared to a number of state-of-the-art robotic reinforcement learning methods.
arXiv Detail & Related papers (2020-11-09T09:53:37Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection.
We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted.
In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
arXiv Detail & Related papers (2020-02-25T18:56:42Z)
- Meta Adaptation using Importance Weighted Demonstrations [19.37671674146514]
In some cases, the distribution shifts so much that it is difficult for an agent to infer the new task.
We propose a novel algorithm to generalize on any related task by leveraging prior knowledge on a set of specific tasks.
We show experiments where the robot is trained on a diverse set of environmental tasks and is also able to adapt to an unseen environment.
arXiv Detail & Related papers (2019-11-23T07:22:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.