Automatic Curriculum Learning through Value Disagreement
- URL: http://arxiv.org/abs/2006.09641v1
- Date: Wed, 17 Jun 2020 03:58:25 GMT
- Title: Automatic Curriculum Learning through Value Disagreement
- Authors: Yunzhi Zhang, Pieter Abbeel, Lerrel Pinto
- Abstract summary: Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
- Score: 95.19299356298876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continually solving new, unsolved tasks is the key to learning diverse
behaviors. Through reinforcement learning (RL), we have made massive strides
towards solving tasks that have a single goal. However, in the multi-task
domain, where an agent needs to reach multiple goals, the choice of training
goals can largely affect sample efficiency. When biological agents learn, there
is often an organized and meaningful order to which learning happens. Inspired
by this, we propose setting up an automatic curriculum for goals that the agent
needs to solve. Our key insight is that if we can sample goals at the frontier
of the set of goals that an agent is able to reach, it will provide a
significantly stronger learning signal compared to randomly sampled goals. To
operationalize this idea, we introduce a goal proposal module that prioritizes
goals that maximize the epistemic uncertainty of the Q-function of the policy.
This simple technique samples goals that are neither too hard nor too easy for
the agent to solve, hence enabling continual improvement. We evaluate our
method across 13 multi-goal robotic tasks and 5 navigation tasks, and
demonstrate performance gains over current state-of-the-art methods.
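The core mechanism described in the abstract can be sketched compactly: train an ensemble of Q-functions alongside the policy, score candidate goals by how much the ensemble disagrees (a proxy for epistemic uncertainty), and sample training goals in proportion to that disagreement. The sketch below is illustrative only, not the paper's implementation; the function names (`value_disagreement_scores`, `sample_goals`) and the ensemble/policy interfaces are assumptions.

```python
import numpy as np

def value_disagreement_scores(q_ensemble, policy, start_state, candidate_goals):
    """Score each candidate goal by the standard deviation of its Q-value
    across an ensemble of independently trained Q-functions.

    q_ensemble      -- list of callables q(state, action, goal) -> float
    policy          -- callable pi(state, goal) -> action
    start_state     -- state from which goals are attempted
    candidate_goals -- list of goal vectors
    """
    scores = []
    for goal in candidate_goals:
        action = policy(start_state, goal)
        q_values = [q(start_state, action, goal) for q in q_ensemble]
        scores.append(np.std(q_values))  # high disagreement ~ frontier goal
    return np.asarray(scores)

def sample_goals(q_ensemble, policy, start_state, candidate_goals, n_goals, rng=None):
    """Sample training goals with probability proportional to disagreement."""
    rng = np.random.default_rng() if rng is None else rng
    scores = value_disagreement_scores(q_ensemble, policy, start_state, candidate_goals)
    total = scores.sum()
    probs = scores / total if total > 0 else np.full(len(scores), 1.0 / len(scores))
    idx = rng.choice(len(candidate_goals), size=n_goals, p=probs)
    return [candidate_goals[i] for i in idx]

# Toy usage: three hypothetical "Q-functions" that agree on nearby goals and
# drift apart on distant ones, so sampling concentrates on the frontier of
# the agent's competence rather than on trivially easy or unreachable goals.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q_ensemble = [
        lambda s, a, g, k=k: -np.linalg.norm(g - s) * (1.0 + 0.3 * k * np.linalg.norm(g - s))
        for k in range(3)
    ]
    policy = lambda s, g: g - s  # placeholder "move toward the goal" action
    goals = [np.array([x, 0.0]) for x in np.linspace(0.0, 5.0, 20)]
    print(sample_goals(q_ensemble, policy, np.zeros(2), goals, n_goals=5, rng=rng))
```

In this toy setup the disagreement grows with goal distance, so sampled goals cluster at the edge of what the (fake) ensemble can evaluate consistently, which is the behavior the abstract attributes to the goal proposal module.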
Related papers
- Generating Adversarial Examples with Task Oriented Multi-Objective Optimization [21.220906842166425]
Adversarial training is one of the most efficient methods to improve a model's robustness.
We propose Task Oriented MOO to address this issue.
Our principle is to only maintain the goal-achieved tasks, while spending more effort on improving the goal-unachieved tasks.
arXiv Detail & Related papers (2023-04-26T01:30:02Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Deep Reinforcement Learning with Adaptive Hierarchical Reward for Multi-Phase Multi-Objective Dexterous Manipulation [11.638614321552616]
Varying priorities make it hard, or even impossible, for a robot to learn an optimal policy with a deep reinforcement learning (DRL) method.
We develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the DRL agent to learn manipulation tasks with multiple prioritized objectives.
The proposed method is validated in a multi-objective manipulation task with a JACO robot arm.
arXiv Detail & Related papers (2022-05-26T15:44:31Z)
- Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies [64.0476282000118]
Intrinsic motivations have been shown to provide a task-agnostic signal for properly allocating training time amongst goals.
While the majority of works in the field of intrinsically motivated open-ended learning focus on scenarios where goals are independent from each other, only few of them studied the autonomous acquisition of interdependent tasks.
In particular, we first deepen the analysis of a previous system, showing the importance of incorporating information about the relationships between tasks at a higher level of the architecture.
Then we introduce H-GRAIL, a new system that extends the previous one by adding a new learning layer to store the autonomously acquired sequences of tasks.
arXiv Detail & Related papers (2022-05-16T10:43:01Z)
- Automatic Goal Generation using Dynamical Distance Learning [5.797847756967884]
Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment.
In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging.
We propose a method for automatic goal generation using a dynamical distance function (DDF) in a self-supervised fashion.
arXiv Detail & Related papers (2021-11-07T16:23:56Z)
- Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning [15.33496710690063]
We propose a goal-aware cross-entropy (GACE) loss that can be utilized in a self-supervised way.
We then devise goal-discriminative attention networks (GDAN) which utilize the goal-relevant information to focus on the given instruction.
arXiv Detail & Related papers (2021-10-25T14:24:39Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate intermediate states.
Framed as expectation-maximization, the E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
arXiv Detail & Related papers (2021-05-27T17:51:34Z)
- Learning with AMIGo: Adversarially Motivated Intrinsic Goals [63.680207855344875]
AMIGo is a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals.
We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks.
arXiv Detail & Related papers (2020-06-22T10:22:08Z)
- Generating Automatic Curricula via Self-Supervised Active Domain Randomization [11.389072560141388]
We extend the self-play framework to jointly learn a goal and environment curriculum.
Our method generates a coupled goal-task curriculum, where agents learn through progressively more difficult tasks and environment variations.
Our results show that a curriculum of co-evolving the environment difficulty together with the difficulty of goals set in each environment provides practical benefits in the goal-directed tasks tested.
arXiv Detail & Related papers (2020-02-18T22:45:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.