Wasserstein Distance Maximizing Intrinsic Control
- URL: http://arxiv.org/abs/2110.15331v1
- Date: Thu, 28 Oct 2021 17:46:07 GMT
- Title: Wasserstein Distance Maximizing Intrinsic Control
- Authors: Ishan Durugkar, Steven Hansen, Stephen Spencer, Volodymyr Mnih
- Abstract summary: This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal.
It shows that such an objective leads to a policy that covers more distance in the MDP than diversity based objectives.
- Score: 14.963071654271756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper deals with the problem of learning a skill-conditioned policy that
acts meaningfully in the absence of a reward signal. Mutual information based
objectives have shown some success in learning skills that reach a diverse set
of states in this setting. These objectives include a KL-divergence term, which
is maximized by visiting distinct states even if those states are not far apart
in the MDP. This paper presents an approach that rewards the agent for learning
skills that maximize the Wasserstein distance of their state visitation from
the start state of the skill. It shows that such an objective leads to a policy
that covers more distance in the MDP than diversity based objectives, and
validates the results on a variety of Atari environments.
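The abstract describes the objective only at a high level. Below is a minimal sketch of one way such a reward could be computed, assuming a Kantorovich-Rubinstein dual estimate of the Wasserstein-1 distance with a skill-conditioned potential network kept roughly 1-Lipschitz by weight clipping; the names (Potential, intrinsic_reward, clip_lipschitz) and hyperparameters are illustrative assumptions, not the authors' implementation.
```python
# Illustrative sketch (not the authors' code): intrinsic reward from the
# Kantorovich-Rubinstein dual of the Wasserstein-1 distance between a skill's
# visited states and its start state. f must be (approximately) 1-Lipschitz.
import torch
import torch.nn as nn

class Potential(nn.Module):
    """Skill-conditioned potential f(s, z) used in the W1 dual."""
    def __init__(self, state_dim, skill_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + skill_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, skill):
        return self.net(torch.cat([state, skill], dim=-1)).squeeze(-1)

def intrinsic_reward(f, state, start_state, skill):
    """Reward for being far, in the dual sense, from the skill's start state."""
    with torch.no_grad():
        return f(state, skill) - f(start_state, skill)

def potential_loss(f, visited_states, start_states, skills):
    """Negated W1 dual objective E[f(s,z)] - E[f(s0,z)], minimized by SGD."""
    return -(f(visited_states, skills).mean() - f(start_states, skills).mean())

def clip_lipschitz(f, c=0.1):
    """Crude Lipschitz constraint via weight clipping; a gradient penalty is an alternative."""
    with torch.no_grad():
        for p in f.parameters():
            p.clamp_(-c, c)
```
In a full training loop one would alternate updates of the potential on batches of (start state, visited state, skill) tuples with policy updates on the resulting intrinsic reward; since the paper evaluates on Atari, the potential would presumably operate on learned state embeddings rather than raw pixels.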
Related papers
- Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
arXiv Detail & Related papers (2023-04-05T16:45:11Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper aims to learn diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Local Explanations for Reinforcement Learning [14.87922813917482]
We propose a novel perspective to understanding RL policies based on identifying important states from automatically learned meta-states.
We show that our algorithm for finding meta-states converges and that the objective selecting important states from each meta-state is submodular, leading to efficient, high-quality greedy selection.
arXiv Detail & Related papers (2022-02-08T02:02:09Z)
- Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching [98.25207998996066]
We build on the mutual information framework for skill discovery and introduce UPSIDE to address the coverage-directedness trade-off.
We illustrate in several navigation and control environments how the skills learned by UPSIDE solve sparse-reward downstream tasks better than existing baselines.
arXiv Detail & Related papers (2021-10-27T14:22:19Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
arXiv Detail & Related papers (2021-05-27T17:51:34Z)
- Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning [9.014110264448371]
We propose a novel unsupervised learning approach named goal-conditioned policy with intrinsic motivation (GPIM).
GPIM jointly learns both an abstract-level policy and a goal-conditioned policy.
Experiments on various robotic tasks demonstrate the effectiveness and efficiency of our proposed GPIM method.
arXiv Detail & Related papers (2021-04-11T16:26:10Z)
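A useful way to see the contrast drawn above between Wasserstein and diversity-based (mutual-information/KL) objectives is that the Wasserstein-1 distance is metric-aware. The toy snippet below is purely illustrative and not taken from any of the listed papers; it uses SciPy's one-dimensional Wasserstein distance to show that a skill visiting states 100 steps from the start scores far higher under W1 than one visiting states 1 step away, even though a discriminator-style diversity reward can already distinguish both skills from the start state equally well.
```python
# Toy illustration (not from the papers above): W1 grows with how far the
# visited states are from the start state, while "distinctness" does not.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
start_states = np.zeros(1000)                              # the skill always starts at state 0
skill_near = rng.normal(loc=1.0, scale=0.1, size=1000)     # visits states about 1 step away
skill_far = rng.normal(loc=100.0, scale=0.1, size=1000)    # visits states about 100 steps away

# Both skills occupy states disjoint from the start state, so a KL / discriminator
# based diversity objective already scores them (near) maximally and cannot prefer
# the skill that actually travels farther in the MDP.
print(wasserstein_distance(start_states, skill_near))      # ~= 1.0
print(wasserstein_distance(start_states, skill_far))       # ~= 100.0
```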
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the information presented and is not responsible for any consequences arising from its use.