Controlled Diversity with Preference : Towards Learning a Diverse Set of Desired Skills
- URL: http://arxiv.org/abs/2303.04592v1
- Date: Tue, 7 Mar 2023 03:37:47 GMT
- Title: Controlled Diversity with Preference : Towards Learning a Diverse Set of Desired Skills
- Authors: Maxence Hussonnois, Thommen George Karimpanal and Santu Rana
- Abstract summary: We propose Controlled Diversity with Preference (CDP), a collaborative human-guided mechanism for an agent to learn a set of skills that is diverse as well as desirable.
The key principle is to restrict the discovery of skills to those regions that are deemed to be desirable as per a preference model trained using human preference labels on trajectory pairs.
We evaluate our approach on 2D navigation and Mujoco environments and demonstrate the ability to discover diverse, yet desirable skills.
- Score: 15.187171070594935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomously learning diverse behaviors without an extrinsic reward signal
has been a problem of interest in reinforcement learning. However, the nature
of learning in such mechanisms is unconstrained, often resulting in the
accumulation of several unusable, unsafe or misaligned skills. In order to
avoid such issues and ensure the discovery of safe and human-aligned skills, it
is necessary to incorporate humans into the unsupervised training process,
which remains a largely unexplored research area. In this work, we propose
Controlled Diversity with Preference (CDP), a novel, collaborative human-guided
mechanism for an agent to learn a set of skills that is diverse as well as
desirable. The key principle is to restrict the discovery of skills to those
regions that are deemed to be desirable as per a preference model trained using
human preference labels on trajectory pairs. We evaluate our approach on 2D
navigation and Mujoco environments and demonstrate the ability to discover
diverse, yet desirable skills.
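The abstract does not say what form the preference model takes or how discovery is "restricted" to desirable regions. The sketch below assumes a common choice from preference-based RL: a Bradley-Terry model in which a trajectory's score is the sum of per-state scores, fitted with a logistic loss on labelled trajectory pairs. That model is then used to gate a skill-discovery reward. The linear features, the 2D states, the `threshold`/`penalty` values, the synthetic labels and every function name are hypothetical illustrations, not CDP's actual implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

State = Tuple[float, float]        # assumed 2D navigation state (x, y)
Trajectory = List[State]


def features(s: State) -> List[float]:
    # Hypothetical features: raw coordinates plus a bias term.
    return [s[0], s[1], 1.0]


def state_score(w: Sequence[float], s: State) -> float:
    # Linear per-state desirability score r(s) = w . phi(s).
    return sum(wi * fi for wi, fi in zip(w, features(s)))


def summed_features(traj: Trajectory) -> List[float]:
    # Sum of phi(s) over the trajectory, so R(traj) = w . summed_features(traj).
    return [sum(col) for col in zip(*(features(s) for s in traj))]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_preference_model(pairs, labels, lr=0.05, epochs=200) -> List[float]:
    """Fit w so that P(a preferred over b) = sigmoid(R(a) - R(b)) (Bradley-Terry).

    pairs: list of (traj_a, traj_b); labels: 1.0 if a was preferred, else 0.0.
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (a, b), y in zip(pairs, labels):
            fa, fb = summed_features(a), summed_features(b)
            diff = [xa - xb for xa, xb in zip(fa, fb)]
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            # Gradient of the cross-entropy loss with respect to w is (p - y) * diff.
            w = [wi - lr * (p - y) * di for wi, di in zip(w, diff)]
    return w


def gated_intrinsic_reward(w, s, intrinsic, threshold=0.0, penalty=-1.0) -> float:
    # Keep the diversity reward only in states the model deems desirable;
    # threshold and penalty are hypothetical knobs, not values from the paper.
    return intrinsic if state_score(w, s) > threshold else penalty


def make_demo_data(n_pairs=40, length=5, seed=0):
    # Synthetic "human" labels: trajectories in the right half-plane (x > 0) are preferred.
    rng = random.Random(seed)

    def traj(sign: float) -> Trajectory:
        return [(sign * rng.uniform(0.1, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(length)]

    pairs, labels = [], []
    for _ in range(n_pairs):
        good, bad = traj(1.0), traj(-1.0)
        if rng.random() < 0.5:
            pairs.append((good, bad))
            labels.append(1.0)
        else:
            pairs.append((bad, good))
            labels.append(0.0)
    return pairs, labels


if __name__ == "__main__":
    pairs, labels = make_demo_data()
    w = train_preference_model(pairs, labels)
    print(gated_intrinsic_reward(w, (0.5, 0.0), intrinsic=0.3))   # 0.3 (desirable)
    print(gated_intrinsic_reward(w, (-0.5, 0.0), intrinsic=0.3))  # -1.0 (gated out)
```

The labels are attached to whole trajectory pairs, which matches the abstract. Decomposing the trajectory score into per-state scores is what lets individual states be marked as desirable or not when the skill-discovery reward is gated.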
Related papers
- SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions [48.003320766433966]
This work introduces Skill Discovery from Local Dependencies (SkiLD).
SkiLD develops a novel skill learning objective that explicitly encourages the mastering of skills that induce different interactions within an environment.
We evaluate SkiLD in several domains with challenging, long-horizon sparse reward tasks including a realistic simulated household robot domain.
Date: 2024-10-24T04:01:59Z
- SLIM: Skill Learning with Multiple Critics [8.645929825516818]
Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment.
Latent variable models, based on mutual information, have been successful in this task but still struggle in the context of robotic manipulation.
We introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation.
Date: 2024-02-01T18:07:33Z
- Balancing Both Behavioral Quality and Diversity in Unsupervised Skill Discovery [12.277005054008017]
We propose Contrastive multi-objective Skill Discovery (ComSD), which discovers exploratory and diverse behaviors through a novel intrinsic incentive called the contrastive multi-objective reward.
Date: 2023-09-29T12:53:41Z
- Behavior Contrastive Learning for Unsupervised Skill Discovery [75.6190748711826]
We propose a novel unsupervised skill discovery method through contrastive learning among behaviors.
Under mild assumptions, our objective maximizes the mutual information (MI) between different behaviors generated by the same skill.
Our method implicitly increases the state entropy to obtain better state coverage.
Date: 2023-05-08T06:02:11Z
- Controllability-Aware Unsupervised Skill Discovery [94.19932297743439]
We introduce a novel unsupervised skill discovery method, Controllability-aware Skill Discovery (CSD), which actively seeks complex, hard-to-control skills without supervision.
The key component of CSD is a controllability-aware distance function, which assigns larger values to state transitions that are harder to achieve with the current skills.
Our experimental results in six robotic manipulation and locomotion environments demonstrate that CSD can discover diverse complex skills with no supervision.
Date: 2023-02-10T08:03:09Z
- Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions [19.626042478612572]
We propose a cooperative adversarial method for obtaining versatile policies with controllable skill sets from unlabeled datasets.
We show that by utilizing unsupervised skill discovery in the generative imitation learning framework, novel and useful skills emerge with successful task fulfillment.
Finally, the obtained versatile policies are tested on an agile quadruped robot called Solo 8 and present faithful replications of diverse skills encoded in the demonstrations.
Date: 2022-09-16T12:49:04Z
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
Date: 2022-04-07T14:07:51Z
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
Date: 2022-01-27T19:51:09Z
- Discovering Generalizable Skills via Automated Generation of Diverse Tasks [82.16392072211337]
We propose a method to discover generalizable skills via automated generation of a diverse set of tasks.
As opposed to prior work on unsupervised discovery of skills, our method pairs each skill with a unique task produced by a trainable task generator.
A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective.
The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks.
Date: 2021-06-26T03:41:51Z
- Relative Variational Intrinsic Control [11.328970848714919]
Relative Variational Intrinsic Control (RVIC) incentivizes learning skills that are distinguishable in how they change the agent's relationship to its environment.
We show how RVIC skills are more useful than skills discovered by existing methods when used in hierarchical reinforcement learning.
Date: 2020-12-14T18:59:23Z
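Several of the related methods build on mutual-information (MI) skill discovery. As a point of reference only, the sketch below computes one widely used variational MI reward, the DIAYN-style r(s, z) = log q(z|s) - log p(z). It assumes a uniform prior over K skills and a discriminator that outputs one logit per skill. It is not the objective of any specific paper listed here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def log_softmax_at(logits: Sequence[float], index: int) -> float:
    """Numerically stable log of softmax(logits)[index]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_sum_exp


def mi_skill_reward(disc_logits: Sequence[float], skill: int) -> float:
    """DIAYN-style intrinsic reward log q(z|s) - log p(z).

    disc_logits: hypothetical discriminator outputs for state s, one logit per skill.
    Assumes a uniform skill prior p(z) = 1/K, so -log p(z) = log K.
    """
    num_skills = len(disc_logits)
    return log_softmax_at(disc_logits, skill) + math.log(num_skills)


if __name__ == "__main__":
    # A discriminator that cannot tell skills apart yields zero reward ...
    print(mi_skill_reward([0.0, 0.0, 0.0, 0.0], skill=0))
    # ... while a confident, correct discriminator approaches log K = log 4.
    print(mi_skill_reward([10.0, -10.0, -10.0, -10.0], skill=0))
```

The reward is highest in states from which the discriminator can recover the active skill, which is what pushes different skills toward distinguishable regions of the state space.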