Offline Diversity Maximization Under Imitation Constraints
- URL: http://arxiv.org/abs/2307.11373v3
- Date: Fri, 21 Jun 2024 16:59:57 GMT
- Title: Offline Diversity Maximization Under Imitation Constraints
- Authors: Marin Vlastelica, Jin Cheng, Georg Martius, Pavel Kolev
- Abstract summary: We propose a principled offline algorithm for unsupervised skill discovery.
Our main analytical contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery.
We demonstrate the effectiveness of our method on the standard offline benchmark D4RL.
- Score: 23.761620064055897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been significant recent progress in the area of unsupervised skill discovery, utilizing various information-theoretic objectives as measures of diversity. Despite these advances, challenges remain: current methods require significant online interaction, fail to leverage vast amounts of available task-agnostic data and typically lack a quantitative measure of skill utility. We address these challenges by proposing a principled offline algorithm for unsupervised skill discovery that, in addition to maximizing diversity, ensures that each learned skill imitates state-only expert demonstrations to a certain degree. Our main analytical contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery to maximize a mutual information objective subject to KL-divergence state occupancy constraints. Furthermore, we demonstrate the effectiveness of our method on the standard offline benchmark D4RL and on a custom offline dataset collected from a 12-DoF quadruped robot for which the policies trained in simulation transfer well to the real robotic system.
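As a rough formalization, assumed from the abstract alone (the paper's exact notation and constraint form are not reproduced here), the problem reads as maximizing skill diversity, measured by the mutual information between states and latent skills, while bounding each skill's state-occupancy divergence from the expert:

```latex
% Plausible reading of the objective (notation assumed): maximize the
% mutual information between states S and latent skills Z, subject to
% a per-skill KL bound between the skill's state occupancy d^{\pi_z}
% and the expert's state occupancy d^E.
\max_{\pi}\; I(S; Z)
\quad \text{s.t.} \quad
D_{\mathrm{KL}}\!\left( d^{\pi_z} \,\middle\|\, d^{E} \right) \le \varepsilon
\quad \text{for all skills } z
```

Presumably, the role of Fenchel duality mentioned in the abstract is to turn this constrained occupancy-matching problem into a form that can be optimized from offline data.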
Related papers
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- Robust Policy Learning via Offline Skill Diffusion [6.876580618014666]
We present a novel offline skill learning framework, DuSkill.
DuSkill employs a guided diffusion model to generate versatile skills that extend beyond the limited skills contained in the datasets.
We show that DuSkill outperforms other skill-based imitation learning and RL algorithms for several long-horizon tasks.
arXiv Detail & Related papers (2024-03-01T02:00:44Z)
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation is diversified, and the agent can use curiosity to guide itself toward collecting higher-quality data.
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.
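A minimal sketch of what a curiosity bonus with an adaptive temporal distance might look like, based only on the summary above; the class, the adaptation rule, and all names are illustrative assumptions, not the authors' code:

```python
# Illustrative sketch only, not the authors' code: a curiosity bonus
# based on the prediction error of a k-step reachability model, with
# the temporal distance k adapted online.
import numpy as np

class CuriosityBonus:
    def __init__(self, predictor, k_init=5, k_min=1, k_max=50):
        # predictor: callable (state_feature, k) -> predicted feature k steps ahead
        self.predictor = predictor
        self.k = k_init
        self.k_min, self.k_max = k_min, k_max

    def reward(self, state_feat, future_feat):
        # Intrinsic reward: how badly the model predicts the feature
        # actually observed k steps into the future.
        pred = self.predictor(state_feat, self.k)
        return float(np.linalg.norm(pred - future_feat))

    def adapt(self, mean_error, low=0.1, high=1.0):
        # Hypothetical adaptive rule: look further ahead once the current
        # horizon is well predicted, pull back when it is too hard.
        if mean_error < low:
            self.k = min(self.k + 1, self.k_max)
        elif mean_error > high:
            self.k = max(self.k - 1, self.k_min)
```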
arXiv Detail & Related papers (2023-12-19T14:26:23Z)
- A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning [25.123237633748193]
Offline-to-online reinforcement learning can be challenging due to constrained exploratory behavior and state-action distribution shift.
We propose a Simple Unified uNcertainty-Guided (SUNG) framework, which addresses both challenges with a single tool: uncertainty.
SUNG achieves state-of-the-art online finetuning performance when combined with different offline RL methods.
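As a hedged illustration of the kind of uncertainty signal such a framework might rely on (the summary does not specify SUNG's estimator; disagreement across an ensemble of Q-networks is a common choice, and everything below is an assumption):

```python
# Hedged sketch, not the SUNG implementation: uncertainty taken as the
# standard deviation across an ensemble of Q-networks. All names and
# the selection rule are illustrative.
import torch

def ensemble_uncertainty(q_ensemble, state, actions):
    """Mean Q-value and epistemic uncertainty (std) per candidate action."""
    qs = torch.stack([q(state, actions) for q in q_ensemble])  # (E, A)
    return qs.mean(dim=0), qs.std(dim=0)

def pick_action(q_ensemble, state, candidate_actions, kappa=1.0):
    # Optimistic rule for online finetuning: prefer actions whose value
    # plus an uncertainty bonus is highest; a pessimistic variant
    # (mean minus kappa * std) would instead guard against the
    # state-action distribution shift mentioned above.
    mean_q, std_q = ensemble_uncertainty(q_ensemble, state, candidate_actions)
    return candidate_actions[torch.argmax(mean_q + kappa * std_q)]
```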
arXiv Detail & Related papers (2023-06-13T05:22:26Z)
- Self-QA: Unsupervised Knowledge Guided Language Model Alignment [17.436587487811387]
We introduce Self-QA, which replaces the traditional practice of human-written instruction seeds with a vast amount of unsupervised knowledge.
The effectiveness of our proposed method is demonstrated through experiments conducted on unsupervised corpora from various domains.
arXiv Detail & Related papers (2023-05-19T18:26:26Z)
- Unsupervised Self-Driving Attention Prediction via Uncertainty Mining and Knowledge Embedding [51.8579160500354]
We propose an unsupervised way to predict self-driving attention by uncertainty modeling and driving knowledge integration.
Results show performance on par with, or better than, fully supervised state-of-the-art approaches.
arXiv Detail & Related papers (2023-03-17T00:28:33Z)
- Learning Options via Compression [62.55893046218824]
We propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills.
Our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood.
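Read literally, the stated objective suggests a minimum-description-length trade-off of roughly the following form; the notation, the expectation over trajectories, and the weight beta are assumptions rather than the paper's definitions:

```latex
% Assumed MDL-style reading: trajectory log-likelihood under the skill
% decomposition z_{1:K}, penalized by the description length \ell(z_{1:K})
% of the skill sequence, traded off by \beta.
\max_{\theta}\;
\mathbb{E}_{\tau}\!\left[ \log p_{\theta}\!\left( \tau \mid z_{1:K} \right) \right]
\;-\; \beta\, \mathbb{E}\!\left[ \ell\!\left( z_{1:K} \right) \right]
```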
arXiv Detail & Related papers (2022-12-08T22:34:59Z)
- Domain-aware Self-supervised Pre-training for Label-Efficient Meme Analysis [29.888546964947537]
We introduce two self-supervised pre-training methods for meme analysis.
First, we employ off-the-shelf multi-modal hate-speech data during pre-training.
Second, we perform self-supervised learning by incorporating multiple specialized pretext tasks.
arXiv Detail & Related papers (2022-09-29T10:00:29Z)
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose several policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z)
- Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions [19.626042478612572]
We propose a cooperative adversarial method for obtaining versatile policies with controllable skill sets from unlabeled datasets.
We show that incorporating unsupervised skill discovery into the generative imitation learning framework makes novel and useful skills emerge while the task is successfully fulfilled.
Finally, the obtained versatile policies are tested on an agile quadruped robot called Solo 8 and faithfully reproduce the diverse skills encoded in the demonstrations.
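A minimal sketch of one way a skill-conditioned adversarial imitation reward could be combined with a skill-diversity term, assuming a GAIL-style discriminator and a skill inference network; all module names and the mixing rule are hypothetical, not the paper's implementation:

```python
# Illustrative sketch, not the paper's code: a GAIL-style imitation
# discriminator paired with a skill inference network q(z|s), so an
# imitation reward and a skill-diversity reward can be mixed per
# latent skill.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillGAILReward(nn.Module):
    def __init__(self, obs_dim, n_skills, hidden=256):
        super().__init__()
        # D(s): scores states as demonstration-like vs. policy-generated.
        self.disc = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))
        # q(z|s): infers the active skill from the state (diversity term).
        self.skill_inf = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_skills))

    def reward(self, state, skill_id, alpha=0.5):
        # Imitation term: log D(s), written as -softplus(-logits).
        imit = -F.softplus(-self.disc(state)).squeeze(-1)
        # Diversity term: log-probability that q(z|s) recovers the skill.
        logq = F.log_softmax(self.skill_inf(state), dim=-1)[..., skill_id]
        return alpha * imit + (1 - alpha) * logq  # assumed mixing rule
```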
arXiv Detail & Related papers (2022-09-16T12:49:04Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.