Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback
- URL: http://arxiv.org/abs/2001.03877v1
- Date: Sun, 12 Jan 2020 07:22:15 GMT
- Title: Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback
- Authors: Binyamin Manela
- Abstract summary: Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm.
We present three algorithms based on the existing HER algorithm that improve its performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning optimal policies from sparse feedback is a known challenge in
reinforcement learning. Hindsight Experience Replay (HER) is a multi-goal
reinforcement learning algorithm designed to solve such tasks. The algorithm
treats every failure as a success for an alternative (virtual) goal that has
been achieved in the episode and then generalizes from that virtual goal to
real goals. HER has known flaws and is limited to relatively simple tasks. In
this thesis, we present three algorithms based on the existing HER algorithm
that improve its performance. First, we prioritize virtual goals from which
the agent will learn more valuable information. We call this property the
\textit{instructiveness} of the virtual goal and define it by a heuristic
measure, which expresses how well the agent will be able to generalize from
that virtual goal to actual goals. Second, we design a filtering process
that detects and removes misleading samples that may induce bias throughout the
learning process. Lastly, we enable the learning of complex, sequential tasks
using a form of curriculum learning combined with HER. We call this algorithm
\textit{Curriculum HER}. To test our algorithms, we built three challenging
manipulation environments with sparse reward functions. Each environment has
three levels of complexity. Our empirical results show vast improvement in the
final success rate and sample efficiency when compared to the original HER
algorithm.
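The core mechanism the abstract describes, relabeling each failed transition against goals actually achieved later in the same episode, can be sketched as follows. This is a minimal illustration of hindsight relabeling with the common "future" sampling strategy and a sparse 0/-1 reward; the function names (`her_relabel`, `sparse_reward`) and the dictionary-based transition layout are illustrative assumptions, not taken from the thesis.

```python
import random

def sparse_reward(achieved, goal, tol=0.05):
    """Sparse reward common in HER-style setups: 0 on success, -1 otherwise."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0

def her_relabel(episode, reward_fn, k=4, rng=random):
    """Hindsight relabeling ('future' strategy): for each transition, sample
    up to k goals achieved later in the same episode and store a copy of the
    transition rewarded against those virtual goals.

    `episode` is a list of dicts with keys: state, action, next_state,
    achieved_goal, goal (an illustrative layout, not the thesis's code).
    """
    relabeled = []
    for t, tr in enumerate(episode):
        # Keep the original transition, rewarded against the real goal.
        relabeled.append({**tr, "reward": reward_fn(tr["achieved_goal"], tr["goal"])})
        # Sample virtual goals from states achieved at or after step t.
        future = episode[t:]
        for _ in range(min(k, len(future))):
            virtual = rng.choice(future)["achieved_goal"]
            relabeled.append({**tr, "goal": virtual,
                              "reward": reward_fn(tr["achieved_goal"], virtual)})
    return relabeled
```

Even an episode that never reaches the real goal now yields successful (reward 0) samples for the virtual goals it did reach, which is what lets the agent generalize from virtual to actual goals. The prioritization described in the abstract would replace the uniform `rng.choice` with a scheme that weights virtual goals by their instructiveness.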
Related papers
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful approach for a robot to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Divide & Conquer Imitation Learning [75.31752559017978]
Imitation Learning can be a powerful approach to bootstrap the learning process.
We present a novel algorithm designed to imitate complex robotic tasks from the states of an expert trajectory.
We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.
arXiv Detail & Related papers (2022-04-15T09:56:50Z)
- The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- DERAIL: Diagnostic Environments for Reward And Imitation Learning [9.099589602551573]
We develop a suite of diagnostic tasks that test individual facets of algorithm performance in isolation.
Results confirm that algorithm performance is highly sensitive to implementation details.
A case study shows how the suite can pinpoint design flaws and rapidly evaluate candidate solutions.
arXiv Detail & Related papers (2020-12-02T18:07:09Z)
- C-Learning: Horizon-Aware Cumulative Accessibility Estimation [29.588146016880284]
We introduce the concept of cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon.
We show that these functions obey a recurrence relation, which enables learning from offline interactions.
We evaluate our approach on a set of multi-goal discrete and continuous control tasks.
arXiv Detail & Related papers (2020-11-24T20:34:31Z)
- Curriculum Learning with Hindsight Experience Replay for Sequential Object Manipulation Tasks [1.370633147306388]
We present an algorithm that combines curriculum learning with Hindsight Experience Replay (HER) to learn sequential object manipulation tasks.
The algorithm exploits the recurrent structure inherent in many object manipulation tasks and implements the entire learning process in the original simulation without adjusting it to each source task.
arXiv Detail & Related papers (2020-08-21T08:59:28Z)
- Meta-Gradient Reinforcement Learning with an Objective Discovered Online [54.15180335046361]
We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network.
Because the objective is discovered online, it can adapt to changes over time.
On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
arXiv Detail & Related papers (2020-07-16T16:17:09Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.