Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback
- URL: http://arxiv.org/abs/2001.03877v1
- Date: Sun, 12 Jan 2020 07:22:15 GMT
- Title: Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback
- Authors: Binyamin Manela
- Abstract summary: Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm.
We present three algorithms based on the existing HER algorithm that improve its performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning optimal policies from sparse feedback is a known challenge in
reinforcement learning. Hindsight Experience Replay (HER) is a multi-goal
reinforcement learning algorithm designed to solve such tasks. The algorithm
treats every failure as a success for an alternative (virtual) goal that has
been achieved in the episode and then generalizes from that virtual goal to
real goals. HER has known flaws and is limited to relatively simple tasks. In
this thesis, we present three algorithms based on the existing HER algorithm
that improve its performance. First, we prioritize virtual goals from which
the agent will learn more valuable information. We call this property the
\textit{instructiveness} of the virtual goal and define it by a heuristic
measure, which expresses how well the agent will be able to generalize from
that virtual goal to actual goals. Second, we design a filtering process
that detects and removes misleading samples that may induce bias throughout the
learning process. Lastly, we enable the learning of complex, sequential tasks
using a form of curriculum learning combined with HER. We call this algorithm
\textit{Curriculum HER}. To test our algorithms, we built three challenging
manipulation environments with sparse reward functions. Each environment has
three levels of complexity. Our empirical results show vast improvement in the
final success rate and sample efficiency when compared to the original HER
algorithm.
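The core mechanism the abstract describes, relabeling each failed transition against goals actually achieved later in the same episode, can be sketched as follows. This is a minimal illustration of hindsight relabeling with the common "future" sampling strategy and a sparse 0/-1 reward; the function names (`her_relabel`, `sparse_reward`) and the dictionary-based transition layout are illustrative assumptions, not taken from the thesis.

```python
import random

def sparse_reward(achieved, goal, tol=0.05):
    """Sparse reward common in HER-style setups: 0 on success, -1 otherwise."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0

def her_relabel(episode, reward_fn, k=4, rng=random):
    """Hindsight relabeling ('future' strategy): for each transition, sample
    up to k goals achieved later in the same episode and store a copy of the
    transition rewarded against those virtual goals.

    `episode` is a list of dicts with keys: state, action, next_state,
    achieved_goal, goal (an illustrative layout, not the thesis's code).
    """
    relabeled = []
    for t, tr in enumerate(episode):
        # Keep the original transition, rewarded against the real goal.
        relabeled.append({**tr, "reward": reward_fn(tr["achieved_goal"], tr["goal"])})
        # Sample virtual goals from states achieved at or after step t.
        future = episode[t:]
        for _ in range(min(k, len(future))):
            virtual = rng.choice(future)["achieved_goal"]
            relabeled.append({**tr, "goal": virtual,
                              "reward": reward_fn(tr["achieved_goal"], virtual)})
    return relabeled
```

Even an episode that never reaches the real goal now yields successful (reward 0) samples for the virtual goals it did reach, which is what lets the agent generalize from virtual to actual goals. The prioritization described in the abstract would replace the uniform `rng.choice` with a scheme that weights virtual goals by their instructiveness.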
Related papers
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful approach for a robot to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Divide & Conquer Imitation Learning [75.31752559017978]
Imitation Learning can be a powerful approach to bootstrap the learning process.
We present a novel algorithm designed to imitate complex robotic tasks from the states of an expert trajectory.
We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.
arXiv Detail & Related papers (2022-04-15T09:56:50Z)
- The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- DERAIL: Diagnostic Environments for Reward And Imitation Learning [9.099589602551573]
We develop a suite of diagnostic tasks that test individual facets of algorithm performance in isolation.
Results confirm that algorithm performance is highly sensitive to implementation details.
A case study shows how the suite can pinpoint design flaws and rapidly evaluate candidate solutions.
arXiv Detail & Related papers (2020-12-02T18:07:09Z)
- C-Learning: Horizon-Aware Cumulative Accessibility Estimation [29.588146016880284]
We introduce the concept of cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon.
We show that these functions obey a recurrence relation, which enables learning from offline interactions.
We evaluate our approach on a set of multi-goal discrete and continuous control tasks.
arXiv Detail & Related papers (2020-11-24T20:34:31Z)
- Curriculum Learning with Hindsight Experience Replay for Sequential Object Manipulation Tasks [1.370633147306388]
We present an algorithm that combines curriculum learning with Hindsight Experience Replay (HER) to learn sequential object manipulation tasks.
The algorithm exploits the recurrent structure inherent in many object manipulation tasks and implements the entire learning process in the original simulation without adjusting it to each source task.
arXiv Detail & Related papers (2020-08-21T08:59:28Z)
- Meta-Gradient Reinforcement Learning with an Objective Discovered Online [54.15180335046361]
We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network.
Because the objective is discovered online, it can adapt to changes over time.
On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
arXiv Detail & Related papers (2020-07-16T16:17:09Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.