Human-Aligned Skill Discovery: Balancing Behaviour Exploration and Alignment
- URL: http://arxiv.org/abs/2501.17431v1
- Date: Wed, 29 Jan 2025 06:14:27 GMT
- Title: Human-Aligned Skill Discovery: Balancing Behaviour Exploration and Alignment
- Authors: Maxence Hussonnois, Thommen George Karimpanal, Santu Rana
- Abstract summary: We propose Human-aligned Skill Discovery (HaSD) to discover safer, more aligned skills.
HaSD simultaneously optimises skill diversity and alignment with human values.
We demonstrate its effectiveness in both 2D navigation and SafetyGymnasium environments.
- Abstract: Unsupervised skill discovery in Reinforcement Learning aims to mimic humans' ability to autonomously discover diverse behaviors. However, existing methods are often unconstrained, making it difficult to find useful skills, especially in complex environments, where discovered skills are frequently unsafe or impractical. We address this issue by proposing Human-aligned Skill Discovery (HaSD), a framework that incorporates human feedback to discover safer, more aligned skills. HaSD simultaneously optimises skill diversity and alignment with human values. This approach ensures that alignment is maintained throughout the skill discovery process, eliminating the inefficiencies associated with exploring unaligned skills. We demonstrate its effectiveness in both 2D navigation and SafetyGymnasium environments, showing that HaSD discovers diverse, human-aligned skills that are safe and useful for downstream tasks. Finally, we extend HaSD by learning a range of configurable skills with varying diversity-alignment trade-offs that could be useful in practical scenarios.
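The core idea of jointly optimising diversity and alignment can be sketched as a combined reward: a DIAYN-style skill-discriminability term plus a weighted alignment term from a preference model. This is an illustrative sketch only, not HaSD's implementation; the discriminator, the alignment model, and the weight `LAMBDA` are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SKILLS = 4
STATE_DIM = 2
LAMBDA = 0.5  # hypothetical trade-off weight between diversity and alignment

# Stand-in for a learned skill discriminator q(z | s): a fixed linear model.
W = rng.standard_normal((N_SKILLS, STATE_DIM))

def skill_log_probs(state):
    """Log-softmax over skills given a state, i.e. log q(z | s)."""
    logits = W @ state
    return logits - np.log(np.sum(np.exp(logits)))

def alignment_score(state):
    """Stand-in for a preference model trained from human feedback.
    Here: prefer states close to the origin (purely illustrative)."""
    return -np.linalg.norm(state)

def combined_reward(state, skill):
    """Diversity term (discriminability of the active skill) plus a
    weighted alignment term, optimised jointly rather than sequentially."""
    return skill_log_probs(state)[skill] + LAMBDA * alignment_score(state)

state = np.array([0.5, -1.0])
r = combined_reward(state, skill=2)
```

Because both terms enter a single reward, the policy never spends experience on skills that are diverse but unaligned, which is the inefficiency the abstract points to.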
Related papers
- SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions [48.003320766433966]
This work introduces Skill Discovery from Local Dependencies (SkiLD).
SkiLD develops a novel skill learning objective that explicitly encourages mastering skills that induce different interactions within an environment.
We evaluate SkiLD in several domains with challenging, long-horizon sparse reward tasks, including a realistic simulated household robot domain.
arXiv Detail & Related papers (2024-10-24T04:01:59Z)
- Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning [39.991887534269445]
Disentangled Unsupervised Skill Discovery (DUSDi) is a method for learning disentangled skills that can be efficiently reused to solve downstream tasks.
DUSDi decomposes skills into disentangled components, where each skill component only affects one factor of the state space.
DUSDi successfully learns disentangled skills and significantly outperforms previous skill discovery methods when the learned skills are applied to downstream tasks.
arXiv Detail & Related papers (2024-10-15T04:13:20Z)
- Language Guided Skill Discovery [56.84356022198222]
We introduce Language Guided Skill Discovery (LGSD) to maximize semantic diversity between skills.
LGSD takes user prompts as input and outputs a set of semantically distinctive skills.
We demonstrate that LGSD enables legged robots to visit different user-intended areas on a plane by simply changing the prompt.
arXiv Detail & Related papers (2024-06-07T04:25:38Z)
- Behavior Contrastive Learning for Unsupervised Skill Discovery [75.6190748711826]
We propose a novel unsupervised skill discovery method through contrastive learning among behaviors.
Under mild assumptions, our objective maximizes the MI between different behaviors based on the same skill.
Our method implicitly increases the state entropy to obtain better state coverage.
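The contrastive MI objective this entry describes can be illustrated with a standard InfoNCE estimator: behaviours generated by the same skill form positive pairs, behaviours from other skills serve as negatives, and maximising the estimator tightens a lower bound on their mutual information. This is a generic sketch, not the paper's exact objective; the embeddings and temperature below are illustrative.

```python
import numpy as np

def infonce_lower_bound(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE estimate: cosine similarity of the positive pair
    relative to negatives; maximising it tightens a lower bound on the
    mutual information between the paired behaviours."""
    def sim(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) * temperature)
    logits = np.array([sim(anchor, positive)] + [sim(anchor, n) for n in negatives])
    # Log-probability of picking the positive under a softmax over candidates.
    return logits[0] - np.log(np.sum(np.exp(logits)))

# Behaviour embeddings (illustrative): the positive matches the anchor,
# the negatives are orthogonal, so the bound approaches its maximum of 0.
anchor = np.array([1.0, 0.0])
bound = infonce_lower_bound(anchor, np.array([1.0, 0.0]),
                            [np.array([0.0, 1.0]), np.array([0.0, -1.0])])
```

With K candidates the bound lives in (-log K, 0]; indistinguishable behaviours give exactly -log K, while easily discriminated skills push it toward 0.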
arXiv Detail & Related papers (2023-05-08T06:02:11Z)
- Controlled Diversity with Preference: Towards Learning a Diverse Set of Desired Skills [15.187171070594935]
We propose Controlled Diversity with Preference (CDP), a collaborative human-guided mechanism for an agent to learn a set of skills that is diverse as well as desirable.
The key principle is to restrict the discovery of skills to those regions that are deemed to be desirable as per a preference model trained using human preference labels on trajectory pairs.
We evaluate our approach on 2D navigation and Mujoco environments and demonstrate the ability to discover diverse, yet desirable skills.
arXiv Detail & Related papers (2023-03-07T03:37:47Z)
- Controllability-Aware Unsupervised Skill Discovery [94.19932297743439]
We introduce a novel unsupervised skill discovery method, Controllability-aware Skill Discovery (CSD), which actively seeks complex, hard-to-control skills without supervision.
The key component of CSD is a controllability-aware distance function, which assigns larger values to state transitions that are harder to achieve with the current skills.
Our experimental results in six robotic manipulation and locomotion environments demonstrate that CSD can discover diverse complex skills with no supervision.
arXiv Detail & Related papers (2023-02-10T08:03:09Z)
- Lipschitz-constrained Unsupervised Skill Discovery [91.51219447057817]
Lipschitz-constrained Skill Discovery (LSD) encourages the agent to discover more diverse, dynamic, and far-reaching skills.
LSD outperforms previous approaches in terms of skill diversity, state space coverage, and performance on seven downstream tasks.
arXiv Detail & Related papers (2022-02-02T08:29:04Z)
- Discovering Generalizable Skills via Automated Generation of Diverse Tasks [82.16392072211337]
We propose a method to discover generalizable skills via automated generation of a diverse set of tasks.
As opposed to prior work on unsupervised discovery of skills, our method pairs each skill with a unique task produced by a trainable task generator.
A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective.
The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks.
arXiv Detail & Related papers (2021-06-26T03:41:51Z)
- Relative Variational Intrinsic Control [11.328970848714919]
Relative Variational Intrinsic Control (RVIC) incentivizes learning skills that are distinguishable in how they change the agent's relationship to its environment.
We show how RVIC skills are more useful than skills discovered by existing methods when used in hierarchical reinforcement learning.
arXiv Detail & Related papers (2020-12-14T18:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.