Stein Variational Goal Generation for adaptive Exploration in Multi-Goal
Reinforcement Learning
- URL: http://arxiv.org/abs/2206.06719v2
- Date: Tue, 2 May 2023 13:44:45 GMT
- Title: Stein Variational Goal Generation for adaptive Exploration in Multi-Goal
Reinforcement Learning
- Authors: Nicolas Castanet, Sylvain Lamprier, Olivier Sigaud
- Abstract summary: In multi-goal Reinforcement Learning, an agent can share experience between related training tasks, resulting in better generalization at test time.
In this work we propose Stein Variational Goal Generation (SVGG), which samples goals of intermediate difficulty for the agent.
The distribution of goals is modeled with particles that are attracted to areas of appropriate difficulty using Stein Variational Gradient Descent.
- Score: 18.62133925594957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In multi-goal Reinforcement Learning, an agent can share experience between
related training tasks, resulting in better generalization for new tasks at
test time. However, when the goal space has discontinuities and the reward is
sparse, a majority of goals are difficult to reach. In this context, a
curriculum over goals helps agents learn by adapting training tasks to their
current capabilities. In this work we propose Stein Variational Goal Generation
(SVGG), which samples goals of intermediate difficulty for the agent, by
leveraging a learned predictive model of its goal reaching capabilities. The
distribution of goals is modeled with particles that are attracted to areas of
appropriate difficulty using Stein Variational Gradient Descent. We show that
SVGG outperforms state-of-the-art multi-goal Reinforcement Learning methods in
terms of success coverage in hard exploration problems, and demonstrate that it
is endowed with a useful recovery property when the environment changes.
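To make the particle-based goal sampling concrete, below is a minimal NumPy sketch of SVGD applied to goal particles. It assumes a toy goal-reaching success model and a Beta-like target density that peaks where predicted success is around 0.5; the paper's actual capability model, kernel bandwidth, and target form may differ.

```python
import numpy as np

def svgd_step(particles, score_fn, step=0.05, h=0.5):
    """One Stein Variational Gradient Descent update on a set of goal particles."""
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]      # diffs[i, j] = x_i - x_j
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / h)                # RBF kernel k(x_i, x_j)
    scores = score_fn(particles)                                # grad log p at each particle
    attract = K @ scores                                        # pulls particles toward high target density
    repulse = (2.0 / h) * np.sum(K[..., None] * diffs, axis=1)  # kernel gradient keeps particles spread out
    return particles + step * (attract + repulse) / n

def make_score_fn(success_prob, alpha=1.0, beta=1.0, eps=1e-4):
    """Finite-difference score of an unnormalized target density that peaks where
    the predicted success probability is intermediate (neither trivial nor hopeless)."""
    def log_p(g):
        s = np.clip(success_prob(g), 1e-6, 1.0 - 1e-6)
        return alpha * np.log(s) + beta * np.log(1.0 - s)
    def score(particles):
        grads = np.zeros_like(particles)
        for k in range(particles.shape[1]):
            shift = np.zeros(particles.shape[1]); shift[k] = eps
            grads[:, k] = (log_p(particles + shift) - log_p(particles - shift)) / (2.0 * eps)
        return grads
    return score

# Toy capability model (an assumption): the agent currently reaches goals within radius ~1.5.
success = lambda g: 1.0 / (1.0 + np.exp(-4.0 * (1.5 - np.linalg.norm(g, axis=-1))))

rng = np.random.default_rng(0)
goals = rng.normal(size=(64, 2))
score = make_score_fn(success)
for _ in range(300):
    goals = svgd_step(goals, score)
# `goals` now concentrates near the frontier where predicted success is about 0.5.
```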
Related papers
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches over varying task weights.
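A rough sketch of this fork-then-merge loop is given below; it is not the authors' implementation, and `train_step` (an update with a weighted auxiliary loss) and `validate` (target-task validation performance) are hypothetical callables.

```python
import copy
import numpy as np

def fork_merge_round(params, train_step, validate, task_weights=(0.0, 0.5, 1.0),
                     merge_grid=np.linspace(0.0, 1.0, 11), steps=100):
    """One schematic ForkMerge-style round: fork, train branches, search merge weights."""
    branches = []
    for w in task_weights:                      # fork the current parameters into branches
        p = copy.deepcopy(params)
        for _ in range(steps):                  # each branch trains with its own auxiliary weight
            p = train_step(p, aux_weight=w)
        branches.append(p)
    best = branches[0]                          # greedily merge branches by parameter interpolation
    for cand in branches[1:]:
        scored = [(validate({k: (1 - a) * best[k] + a * cand[k] for k in best}), a)
                  for a in merge_grid]
        _, a_star = max(scored)                 # keep the interpolation with the best target validation
        best = {k: (1 - a_star) * best[k] + a_star * cand[k] for k in best}
    return best
```
Here `params` is a dict of NumPy arrays; in practice the branching and merge search run periodically during training.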
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
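Such a discretizing bottleneck is commonly implemented with vector quantization; the sketch below snaps chunks of a continuous goal embedding to their nearest codebook entries. This is only an illustrative stand-in, not the paper's exact factorial architecture.

```python
import numpy as np

def discretize_goal(goal_embedding, codebooks):
    """Split a goal embedding into equal chunks and quantize each chunk to its
    nearest code (a hedged sketch of a factorial discretizing bottleneck)."""
    chunks = np.split(goal_embedding, len(codebooks))   # requires an even split
    codes, quantized = [], []
    for chunk, book in zip(chunks, codebooks):          # book: (num_codes, chunk_dim)
        idx = int(np.argmin(np.sum((book - chunk) ** 2, axis=1)))
        codes.append(idx)
        quantized.append(book[idx])
    return codes, np.concatenate(quantized)
```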
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods.
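The temporal ensemble can be read as an exponential moving average of the meta-learner's parameters that serves as a slowly improving target; a minimal sketch (the momentum coefficient `tau` is an assumption) is:

```python
def update_momentum_target(target_params, online_params, tau=0.995):
    # Exponential moving average ("temporal ensemble") of the meta-learner's parameters,
    # used as a self-improving target model (hedged sketch, not the authors' exact update).
    return {k: tau * target_params[k] + (1.0 - tau) * online_params[k] for k in target_params}
```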
arXiv Detail & Related papers (2022-10-11T06:45:15Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Automatic Goal Generation using Dynamical Distance Learning [5.797847756967884]
Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment.
In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging.
We propose a method for automatic goal generation using a dynamical distance function (DDF) in a self-supervised fashion.
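One self-supervised way to obtain such a distance is to label any pair of states visited at steps i <= j of the same rollout with j - i and regress on these labels; goals can then be proposed at a desired predicted distance. A minimal, illustrative sketch of the label construction (names are assumptions):

```python
def ddf_training_pairs(trajectory, max_horizon=None):
    """Self-supervised regression targets for a dynamical distance function:
    d(s_i, s_j) is labeled with the number of environment steps j - i."""
    pairs = []
    for i, s_i in enumerate(trajectory):
        later = trajectory[i:] if max_horizon is None else trajectory[i:i + max_horizon]
        for dt, s_j in enumerate(later):
            pairs.append((s_i, s_j, dt))
    return pairs
```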
arXiv Detail & Related papers (2021-11-07T16:23:56Z)
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
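In the dual (Kantorovich-Rubinstein) form, the Wasserstein-1 distance is estimated with a potential function kept approximately 1-Lipschitz, and that potential shapes the supplemental reward. A hedged sketch of the two quantities involved (the exact Lipschitz regularizer and sign conventions in the paper may differ):

```python
import numpy as np

def w1_dual_estimate(f_on_goal_samples, f_on_visited_states):
    # Kantorovich-Rubinstein estimate of W1 between the goal distribution and the policy's
    # state-visitation distribution; the potential f is trained to maximize this value
    # subject to an (approximate) 1-Lipschitz constraint, e.g. a gradient penalty.
    return np.mean(f_on_goal_samples) - np.mean(f_on_visited_states)

def supplemental_reward(f, s, s_next):
    # Potential-based intrinsic reward derived from the learned dual potential
    # (hedged sketch; AIM's exact shaping may differ).
    return f(s_next) - f(s)
```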
arXiv Detail & Related papers (2021-05-27T17:51:34Z)
- GRIMGEP: Learning Progress for Robust Goal Sampling in Visual Deep Reinforcement Learning [21.661530291654692]
We propose a framework that allows agents to autonomously identify and ignore noisy distracting regions.
Our framework can be combined with any state-of-the-art novelty-seeking goal exploration approach.
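A common way to identify regions worth ignoring is to score each region of goal space by absolute learning progress, which stays near zero for regions whose outcomes fluctuate but never improve; a minimal sketch (the window size is an assumption):

```python
import numpy as np

def absolute_learning_progress(outcomes, window=50):
    """outcomes: chronological success indicators (0/1) for goals sampled in one region.
    Compares recent competence with older competence; noisy-but-flat regions score ~0
    and can be down-weighted or ignored when sampling goals (illustrative sketch)."""
    if len(outcomes) < 2 * window:
        return 0.0
    recent = float(np.mean(outcomes[-window:]))
    older = float(np.mean(outcomes[-2 * window:-window]))
    return abs(recent - older)
```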
arXiv Detail & Related papers (2020-08-10T19:50:06Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
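A value-disagreement curriculum can be sketched as sampling goals in proportion to the spread of an ensemble of value estimates; the softmax temperature and ensemble interface below are assumptions, not the paper's exact procedure.

```python
import numpy as np

def sample_goals_by_disagreement(candidate_goals, value_ensemble, k=16, temperature=1.0, rng=None):
    """candidate_goals: (n, goal_dim) array; value_ensemble: callables mapping goals to values.
    Scores each goal by the ensemble's standard deviation and samples k goals accordingly."""
    rng = rng if rng is not None else np.random.default_rng()
    values = np.stack([v(candidate_goals) for v in value_ensemble])   # (ensemble_size, n)
    disagreement = values.std(axis=0)                                 # epistemic spread per goal
    probs = np.exp(disagreement / temperature)
    probs /= probs.sum()
    idx = rng.choice(len(candidate_goals), size=k, replace=False, p=probs)
    return candidate_goals[idx]
```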
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- Generating Automatic Curricula via Self-Supervised Active Domain Randomization [11.389072560141388]
We extend the self-play framework to jointly learn a goal and environment curriculum.
Our method generates a coupled goal-task curriculum, where agents learn through progressively more difficult tasks and environment variations.
Our results show that a curriculum of co-evolving the environment difficulty together with the difficulty of goals set in each environment provides practical benefits in the goal-directed tasks tested.
arXiv Detail & Related papers (2020-02-18T22:45:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.