Unsupervised Skill Discovery through Skill Regions Differentiation
- URL: http://arxiv.org/abs/2506.14420v1
- Date: Tue, 17 Jun 2025 11:30:04 GMT
- Title: Unsupervised Skill Discovery through Skill Regions Differentiation
- Authors: Ting Xiao, Jiakun Zheng, Rushuai Yang, Kang Xu, Qiaosheng Zhang, Peng Liu, Chenjia Bai
- Abstract summary: Unsupervised Reinforcement Learning (RL) aims to discover diverse behaviors that can accelerate the learning of downstream tasks. We propose a novel skill discovery objective that maximizes the deviation of the state density of one skill from the explored regions of other skills. We also formulate an intrinsic reward based on the learned autoencoder that resembles count-based exploration in a compact latent space.
- Score: 6.088346462603191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised Reinforcement Learning (RL) aims to discover diverse behaviors that can accelerate the learning of downstream tasks. Previous methods typically focus on entropy-based exploration or empowerment-driven skill learning. However, entropy-based exploration struggles in large-scale state spaces (e.g., images), and empowerment-based methods with Mutual Information (MI) estimations have limitations in state exploration. To address these challenges, we propose a novel skill discovery objective that maximizes the deviation of the state density of one skill from the explored regions of other skills, encouraging inter-skill state diversity similar to the initial MI objective. For state-density estimation, we construct a novel conditional autoencoder with soft modularization for different skill policies in high-dimensional space. Meanwhile, to incentivize intra-skill exploration, we formulate an intrinsic reward based on the learned autoencoder that resembles count-based exploration in a compact latent space. Through extensive experiments in challenging state and image-based tasks, we find our method learns meaningful skills and achieves superior performance in various downstream tasks.
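The abstract compares its intrinsic reward to count-based exploration in a compact latent space. As a rough illustration of that general idea only, the sketch below implements the classic count-based bonus r = 1 / sqrt(N) over discretized latent codes, tracked separately per skill. It is not the paper's reward: the grid discretization, the `bin_width` parameter, and the names `latent_code` and `CountBonus` are all hypothetical, and the latent vectors are assumed to come from some already-trained encoder.

```python
import math
from collections import Counter


def latent_code(z, bin_width=0.5):
    """Map a latent vector to a hashable grid cell (hypothetical discretization)."""
    return tuple(math.floor(x / bin_width) for x in z)


class CountBonus:
    """Classic count-based bonus r = 1 / sqrt(N(skill, cell)).

    Assumption: this is a generic stand-in, not the reward defined in the paper.
    """

    def __init__(self, bin_width=0.5):
        self.bin_width = bin_width
        self.counts = Counter()

    def reward(self, skill, z):
        # Count visits per (skill, cell) so each skill is rewarded for
        # reaching latent cells that it has rarely visited itself.
        key = (skill, latent_code(z, self.bin_width))
        self.counts[key] += 1
        return 1.0 / math.sqrt(self.counts[key])


bonus = CountBonus()
first = bonus.reward(skill=0, z=[0.1, 0.2])        # first visit to cell (0, 0)
second = bonus.reward(skill=0, z=[0.2, 0.3])       # same cell, second visit
other_skill = bonus.reward(skill=1, z=[0.1, 0.2])  # same cell, different skill
```

The bonus decays as a skill revisits the same latent cell, while a different skill entering that cell starts from a fresh count.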
Related papers
- Efficient Skill Discovery via Regret-Aware Optimization [37.27136009415794]
We frame skill discovery as a min-max game of skill generation and policy learning. We propose a regret-aware method on top of temporal representation learning. Our method achieves a 15% zero-shot improvement in high-dimensional environments.
arXiv Detail & Related papers (2025-06-26T06:45:59Z)
- SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions [48.003320766433966]
This work introduces Skill Discovery from Local Dependencies (SkiLD).
SkiLD develops a novel skill learning objective that explicitly encourages the mastering of skills that induce different interactions within an environment.
We evaluate SkiLD in several domains with challenging, long-horizon sparse reward tasks including a realistic simulated household robot domain.
arXiv Detail & Related papers (2024-10-24T04:01:59Z)
- Constrained Ensemble Exploration for Unsupervised Skill Discovery [43.00837365639085]
Unsupervised Reinforcement Learning (RL) provides a promising paradigm for learning useful behaviors via reward-free pre-training.
We propose a novel unsupervised RL framework via an ensemble of skills, where each skill performs partition exploration based on the state prototypes.
We find our method learns well-explored ensemble skills and achieves superior performance in various downstream tasks compared to previous methods.
arXiv Detail & Related papers (2024-05-25T03:07:56Z)
- ComSD: Balancing Behavioral Quality and Diversity in Unsupervised Skill Discovery [12.277005054008017]
We propose Contrastive dynamic Skill Discovery (ComSD). ComSD generates diverse and exploratory unsupervised skills through a novel intrinsic incentive, named contrastive dynamic reward. It can also discover distinguishable and far-reaching exploration skills in the challenging tree-like 2D maze.
arXiv Detail & Related papers (2023-09-29T12:53:41Z)
- Behavior Contrastive Learning for Unsupervised Skill Discovery [75.6190748711826]
We propose a novel unsupervised skill discovery method through contrastive learning among behaviors.
Under mild assumptions, our objective maximizes the MI between different behaviors based on the same skill.
Our method implicitly increases the state entropy to obtain better state coverage.
arXiv Detail & Related papers (2023-05-08T06:02:11Z)
- Learning Options via Compression [62.55893046218824]
We propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills.
Our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood.
arXiv Detail & Related papers (2022-12-08T22:34:59Z)
- Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills [155.11646755470582]
'Explore, Discover and Learn' (EDL) is an alternative approach to information-theoretic skill discovery.
We show that EDL offers significant advantages, such as overcoming the coverage problem, reducing the dependence of learned skills on the initial state, and allowing the user to define a prior over which behaviors should be learned.
arXiv Detail & Related papers (2020-02-10T10:49:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.