CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery
- URL: http://arxiv.org/abs/2202.00161v1
- Date: Tue, 1 Feb 2022 00:36:29 GMT
- Title: CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery
- Authors: Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind
Rajeswaran, Pieter Abbeel
- Abstract summary: We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery.
CIC explicitly incentivizes diverse behaviors by maximizing state entropy.
We find that CIC substantially improves over prior unsupervised skill discovery methods.
- Score: 88.97076030698433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Contrastive Intrinsic Control (CIC), an algorithm for
unsupervised skill discovery that maximizes the mutual information between
skills and state transitions. In contrast to most prior approaches, CIC uses a
decomposition of the mutual information that explicitly incentivizes diverse
behaviors by maximizing state entropy. We derive a novel lower bound estimate
for the mutual information which combines a particle estimator for state
entropy to generate diverse behaviors and contrastive learning to distill these
behaviors into distinct skills. We evaluate our algorithm on the Unsupervised
Reinforcement Learning Benchmark, which consists of a long reward-free
pre-training phase followed by a short adaptation phase to downstream tasks
with extrinsic rewards. We find that CIC substantially improves over prior
unsupervised skill discovery methods and outperforms the next leading overall
exploration algorithm in terms of downstream task performance.
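The two ingredients named in the abstract, a particle estimator for state entropy and a contrastive objective, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `knn_entropy_reward` and `info_nce_loss`, the choice of `k`, the `log(1 + distance)` form, and the temperature are all assumptions made for illustration, and the embeddings are taken as given rather than learned.

```python
import numpy as np


def knn_entropy_reward(embeddings, k=3):
    """Particle-style entropy proxy (hypothetical helper).

    Returns, for each row, log(1 + mean distance to its k nearest
    neighbors). Larger values indicate more spread-out states.
    """
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)      # pairwise distances, shape (n, n)
    np.fill_diagonal(dists, np.inf)             # exclude each point's distance to itself
    nearest = np.sort(dists, axis=1)[:, :k]     # k smallest distances per row
    return np.log1p(nearest.mean(axis=1))


def info_nce_loss(queries, keys, temperature=0.5):
    """InfoNCE-style contrastive loss (hypothetical helper).

    Row i of `queries` is treated as the positive match for row i of
    `keys`; all other rows in the batch act as negatives.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


rng = np.random.default_rng(0)
tight = rng.normal(size=(16, 4)) * 0.01   # nearly identical states
wide = rng.normal(size=(16, 4))           # diverse states
print(knn_entropy_reward(tight).mean(), knn_entropy_reward(wide).mean())

pairs = np.eye(4)                          # toy embeddings: one per "skill"
print(info_nce_loss(pairs, pairs), info_nce_loss(pairs, np.roll(pairs, 1, axis=0)))
```

In this toy setup the diverse batch receives a higher entropy reward than the tight one, and correctly matched pairs yield a lower contrastive loss than mismatched ones, which is the qualitative behavior the abstract describes.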
Related papers
- On the Convergence of Continual Learning with Adaptive Methods [4.351356718501137]
We propose an adaptive sequential method for nonconvex continual learning (NCCL).
We demonstrate that the proposed method improves the performance of existing continual learning methods on several image classification tasks.
arXiv Detail & Related papers (2024-04-08T14:28:27Z)
- ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Actor Prioritized Experience Replay [0.0]
Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error.
We introduce a novel experience replay sampling framework for actor-critic methods, which also regards issues with stability and recent findings behind the poor empirical performance of PER.
An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms the competing approaches.
arXiv Detail & Related papers (2022-09-01T15:27:46Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features (Hansen et al.) with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
- Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a stochastic hard thresholding algorithm for AUC maximization in imbalanced classification.
We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-04T16:49:29Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)