Offline Diversity Maximization Under Imitation Constraints
- URL: http://arxiv.org/abs/2307.11373v3
- Date: Fri, 21 Jun 2024 16:59:57 GMT
- Title: Offline Diversity Maximization Under Imitation Constraints
- Authors: Marin Vlastelica, Jin Cheng, Georg Martius, Pavel Kolev
- Abstract summary: We propose a principled offline algorithm for unsupervised skill discovery.
Our main analytical contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery.
We demonstrate the effectiveness of our method on the standard offline benchmark D4RL.
- Score: 23.761620064055897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been significant recent progress in the area of unsupervised skill discovery, utilizing various information-theoretic objectives as measures of diversity. Despite these advances, challenges remain: current methods require significant online interaction, fail to leverage vast amounts of available task-agnostic data and typically lack a quantitative measure of skill utility. We address these challenges by proposing a principled offline algorithm for unsupervised skill discovery that, in addition to maximizing diversity, ensures that each learned skill imitates state-only expert demonstrations to a certain degree. Our main analytical contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery to maximize a mutual information objective subject to KL-divergence state occupancy constraints. Furthermore, we demonstrate the effectiveness of our method on the standard offline benchmark D4RL and on a custom offline dataset collected from a 12-DoF quadruped robot for which the policies trained in simulation transfer well to the real robotic system.
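As a rough formalization, assumed from the abstract alone (the paper's exact notation and constraint form are not reproduced here), the problem reads as maximizing skill diversity, measured by the mutual information between states and latent skills, while bounding each skill's state-occupancy divergence from the expert:

```latex
% Plausible reading of the objective (notation assumed): maximize the
% mutual information between states S and latent skills Z, subject to
% a per-skill KL bound between the skill's state occupancy d^{\pi_z}
% and the expert's state occupancy d^E.
\max_{\pi}\; I(S; Z)
\quad \text{s.t.} \quad
D_{\mathrm{KL}}\!\left( d^{\pi_z} \,\middle\|\, d^{E} \right) \le \varepsilon
\quad \text{for all skills } z
```

Presumably, the role of Fenchel duality mentioned in the abstract is to turn this constrained occupancy-matching problem into a form that can be optimized from offline data.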
Related papers
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- Robust Policy Learning via Offline Skill Diffusion [6.876580618014666]
We present a novel offline skill learning framework, DuSkill.
DuSkill employs a guided diffusion model to generate versatile skills that extend beyond the limited skills contained in the datasets.
We show that DuSkill outperforms other skill-based imitation learning and RL algorithms for several long-horizon tasks.
arXiv Detail & Related papers (2024-03-01T02:00:44Z)
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation is diversified, and the agent can use curiosity to guide itself toward collecting higher-quality data.
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.
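A minimal sketch of what a curiosity bonus with an adaptive temporal distance might look like, based only on the summary above; the class, the adaptation rule, and all names are illustrative assumptions, not the authors' code:

```python
# Illustrative sketch only, not the authors' code: a curiosity bonus
# based on the prediction error of a k-step reachability model, with
# the temporal distance k adapted online.
import numpy as np

class CuriosityBonus:
    def __init__(self, predictor, k_init=5, k_min=1, k_max=50):
        # predictor: callable (state_feature, k) -> predicted feature k steps ahead
        self.predictor = predictor
        self.k = k_init
        self.k_min, self.k_max = k_min, k_max

    def reward(self, state_feat, future_feat):
        # Intrinsic reward: how badly the model predicts the feature
        # actually observed k steps into the future.
        pred = self.predictor(state_feat, self.k)
        return float(np.linalg.norm(pred - future_feat))

    def adapt(self, mean_error, low=0.1, high=1.0):
        # Hypothetical adaptive rule: look further ahead once the current
        # horizon is well predicted, pull back when it is too hard.
        if mean_error < low:
            self.k = min(self.k + 1, self.k_max)
        elif mean_error > high:
            self.k = max(self.k - 1, self.k_min)
```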
arXiv Detail & Related papers (2023-12-19T14:26:23Z)
- A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning [25.123237633748193]
Offline-to-online reinforcement learning can be challenging due to constrained exploratory behavior and state-action distribution shift.
We propose a Simple Unified uNcertainty-Guided (SUNG) framework, which addresses both challenges with a single tool: uncertainty.
SUNG achieves state-of-the-art online finetuning performance when combined with different offline RL methods.
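As a hedged illustration of the kind of uncertainty signal such a framework might rely on (the summary does not specify SUNG's estimator; disagreement across an ensemble of Q-networks is a common choice, and everything below is an assumption):

```python
# Hedged sketch, not the SUNG implementation: uncertainty taken as the
# standard deviation across an ensemble of Q-networks. All names and
# the selection rule are illustrative.
import torch

def ensemble_uncertainty(q_ensemble, state, actions):
    """Mean Q-value and epistemic uncertainty (std) per candidate action."""
    qs = torch.stack([q(state, actions) for q in q_ensemble])  # (E, A)
    return qs.mean(dim=0), qs.std(dim=0)

def pick_action(q_ensemble, state, candidate_actions, kappa=1.0):
    # Optimistic rule for online finetuning: prefer actions whose value
    # plus an uncertainty bonus is highest; a pessimistic variant
    # (mean minus kappa * std) would instead guard against the
    # state-action distribution shift mentioned above.
    mean_q, std_q = ensemble_uncertainty(q_ensemble, state, candidate_actions)
    return candidate_actions[torch.argmax(mean_q + kappa * std_q)]
```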
arXiv Detail & Related papers (2023-06-13T05:22:26Z)
- Self-QA: Unsupervised Knowledge Guided Language Model Alignment [17.436587487811387]
We introduce Self-QA, which replaces the traditional practice of human-written instruction seeds with a vast amount of unsupervised knowledge.
The effectiveness of our proposed method is demonstrated through experiments conducted on unsupervised corpora from various domains.
arXiv Detail & Related papers (2023-05-19T18:26:26Z)
- Unsupervised Self-Driving Attention Prediction via Uncertainty Mining and Knowledge Embedding [51.8579160500354]
We propose an unsupervised way to predict self-driving attention by uncertainty modeling and driving knowledge integration.
Results show performance on par with, or better than, fully supervised state-of-the-art approaches.
arXiv Detail & Related papers (2023-03-17T00:28:33Z)
- Learning Options via Compression [62.55893046218824]
We propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills.
Our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood.
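Read literally, the stated objective suggests a minimum-description-length trade-off of roughly the following form; the notation, the expectation over trajectories, and the weight beta are assumptions rather than the paper's definitions:

```latex
% Assumed MDL-style reading: trajectory log-likelihood under the skill
% decomposition z_{1:K}, penalized by the description length \ell(z_{1:K})
% of the skill sequence, traded off by \beta.
\max_{\theta}\;
\mathbb{E}_{\tau}\!\left[ \log p_{\theta}\!\left( \tau \mid z_{1:K} \right) \right]
\;-\; \beta\, \mathbb{E}\!\left[ \ell\!\left( z_{1:K} \right) \right]
```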
arXiv Detail & Related papers (2022-12-08T22:34:59Z)
- Domain-aware Self-supervised Pre-training for Label-Efficient Meme Analysis [29.888546964947537]
We introduce two self-supervised pre-training methods for meme analysis.
First, we employ off-the-shelf multi-modal hate-speech data during pre-training.
Second, we perform self-supervised learning by incorporating multiple specialized pretext tasks.
arXiv Detail & Related papers (2022-09-29T10:00:29Z)
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose several policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z)
- Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions [19.626042478612572]
We propose a cooperative adversarial method for obtaining versatile policies with controllable skill sets from unlabeled datasets.
We show that incorporating unsupervised skill discovery into the generative imitation learning framework makes novel and useful skills emerge while the task is successfully fulfilled.
Finally, the obtained versatile policies are tested on an agile quadruped robot called Solo 8 and faithfully reproduce the diverse skills encoded in the demonstrations.
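A minimal sketch of one way a skill-conditioned adversarial imitation reward could be combined with a skill-diversity term, assuming a GAIL-style discriminator and a skill inference network; all module names and the mixing rule are hypothetical, not the paper's implementation:

```python
# Illustrative sketch, not the paper's code: a GAIL-style imitation
# discriminator paired with a skill inference network q(z|s), so an
# imitation reward and a skill-diversity reward can be mixed per
# latent skill.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillGAILReward(nn.Module):
    def __init__(self, obs_dim, n_skills, hidden=256):
        super().__init__()
        # D(s): scores states as demonstration-like vs. policy-generated.
        self.disc = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))
        # q(z|s): infers the active skill from the state (diversity term).
        self.skill_inf = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_skills))

    def reward(self, state, skill_id, alpha=0.5):
        # Imitation term: log D(s), written as -softplus(-logits).
        imit = -F.softplus(-self.disc(state)).squeeze(-1)
        # Diversity term: log-probability that q(z|s) recovers the skill.
        logq = F.log_softmax(self.skill_inf(state), dim=-1)[..., skill_id]
        return alpha * imit + (1 - alpha) * logq  # assumed mixing rule
```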
arXiv Detail & Related papers (2022-09-16T12:49:04Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.