Sample-efficient Unsupervised Policy Cloning from Ensemble Self-supervised Labeled Videos
- URL: http://arxiv.org/abs/2412.10778v2
- Date: Tue, 08 Apr 2025 08:54:33 GMT
- Title: Sample-efficient Unsupervised Policy Cloning from Ensemble Self-supervised Labeled Videos
- Authors: Xin Liu, Yaran Chen, Haoran Li,
- Abstract summary: Unsupervised Policy from Ensemble Self-supervised labeled Videos (UPESV) is a novel framework to efficiently learn policies from action-free videos without rewards or any other supervision. UPESV trains a video labeling model to infer the expert actions in expert videos through several combined self-supervised tasks. After a sample-efficient, unsupervised, and iterative training process, UPESV obtains an advanced policy based on a robust video labeling model.
- Score: 7.827978803804189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current advanced policy learning methodologies have demonstrated the ability to develop expert-level strategies when provided with enough information. However, their requirements, including task-specific rewards, action-labeled expert trajectories, and huge numbers of environmental interactions, can be expensive or even unavailable in many scenarios. In contrast, humans can efficiently acquire skills through a few trials and errors by imitating easily accessible internet videos, in the absence of any other supervision. In this paper, we aim to let machines replicate this efficient watching-and-learning process through Unsupervised Policy from Ensemble Self-supervised labeled Videos (UPESV), a novel framework that efficiently learns policies from action-free videos without rewards or any other expert supervision. UPESV trains a video labeling model to infer the expert actions in expert videos through several organically combined self-supervised tasks. Each task plays its part, and together they enable the model to make full use of both action-free videos and reward-free interactions for robust dynamics understanding and advanced action prediction. Simultaneously, UPESV clones a policy from the labeled expert videos, in turn collecting environmental interactions for the self-supervised tasks. After a sample-efficient, unsupervised, and iterative training process, UPESV obtains an advanced policy based on a robust video labeling model. Extensive experiments in sixteen challenging procedurally generated environments demonstrate that the proposed UPESV achieves state-of-the-art interaction-limited policy learning performance (outperforming five current advanced baselines on 12/16 tasks) without exposure to any supervision other than videos.
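The abstract describes an iterative loop: a video labeling model is trained with self-supervised tasks on the agent's own reward-free interactions, that model pseudo-labels the action-free expert videos, and a policy is cloned from the pseudo-labels before collecting fresh interactions. Below is a minimal sketch of that loop, assuming a discrete action space and toy tensors in place of a real environment; the class names and the single inverse-dynamics pretext task are illustrative assumptions, not the paper's actual implementation or full ensemble of tasks.

```python
# Minimal sketch of the iterative watching-and-learning loop described in the
# abstract. Assumptions (not from the paper's code): a discrete action space,
# random tensors standing in for video frames and environment interactions, and
# a single inverse-dynamics pretext task in place of the paper's ensemble of
# self-supervised tasks. Names like VideoLabeler and Policy are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS = 64, 15  # assumed sizes for illustration

class VideoLabeler(nn.Module):
    """Predicts the action taken between two consecutive frames (inverse dynamics)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * OBS_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, N_ACTIONS))
    def forward(self, obs, next_obs):
        return self.net(torch.cat([obs, next_obs], dim=-1))  # action logits

class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, N_ACTIONS))
    def forward(self, obs):
        return self.net(obs)

labeler, policy = VideoLabeler(), Policy()
opt_l = torch.optim.Adam(labeler.parameters(), lr=3e-4)
opt_p = torch.optim.Adam(policy.parameters(), lr=3e-4)

expert_frames = torch.randn(256, OBS_DIM)  # stand-in for an action-free expert video

for iteration in range(5):  # sample-efficient, unsupervised, iterative training
    # 1) Self-supervised pretext task: on reward-free interactions collected by the
    #    agent itself, the actions are known, so the labeler can be supervised by them.
    obs, next_obs = torch.randn(128, OBS_DIM), torch.randn(128, OBS_DIM)
    taken_actions = torch.randint(0, N_ACTIONS, (128,))
    loss_l = F.cross_entropy(labeler(obs, next_obs), taken_actions)
    opt_l.zero_grad(); loss_l.backward(); opt_l.step()

    # 2) Pseudo-label the expert video and clone a policy from the labels.
    with torch.no_grad():
        pseudo_actions = labeler(expert_frames[:-1], expert_frames[1:]).argmax(-1)
    loss_p = F.cross_entropy(policy(expert_frames[:-1]), pseudo_actions)
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()
    # 3) The cloned policy would then collect fresh reward-free interactions for step 1.
```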
Related papers
- Subtask-Aware Visual Reward Learning from Segmented Demonstrations [97.80917991633248]
This paper introduces REDS, a novel framework for reward learning from segmented demonstrations.
We train a dense reward function conditioned on video segments and their corresponding subtasks to ensure alignment with ground-truth reward signals.
Our experiments show that REDS significantly outperforms baseline methods on complex robotic manipulation tasks in Meta-World.
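As a rough illustration of the dense, subtask-conditioned reward described above (not REDS's actual architecture), a reward head can score a video-segment embedding together with an embedding of its subtask; all dimensions and module names below are assumptions for the sketch.

```python
# Rough illustration (not REDS's actual architecture) of a dense reward model
# conditioned on a video-segment embedding and a subtask embedding.
import torch
import torch.nn as nn

class SubtaskConditionedReward(nn.Module):
    def __init__(self, seg_dim: int = 512, subtask_dim: int = 64):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(seg_dim + subtask_dim, 256), nn.ReLU(),
                                  nn.Linear(256, 1))
    def forward(self, segment_emb, subtask_emb):
        # One scalar reward per segment, conditioned on the segment's subtask.
        return self.head(torch.cat([segment_emb, subtask_emb], dim=-1)).squeeze(-1)

reward_model = SubtaskConditionedReward()
rewards = reward_model(torch.randn(8, 512), torch.randn(8, 64))  # (8,) dense rewards
```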
arXiv Detail & Related papers (2025-02-28T01:25:37Z) - PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement [16.768912344111946]
We present PROGRESSOR, a framework that learns a task-agnostic reward function from videos.
We show that PROGRESSOR enables robots to learn complex behaviors without any external supervision.
arXiv Detail & Related papers (2024-11-26T04:17:51Z) - Pre-trained Visual Dynamics Representations for Efficient Policy Learning [33.62440075940917]
We propose Pre-trained Visual Dynamics Representations (PVDR) to bridge the domain gap between videos and downstream tasks for efficient policy learning.
The pre-trained visual dynamics representations capture the visual dynamics prior knowledge in the videos.
This abstract prior knowledge can be readily adapted to downstream tasks and aligned with executable actions through online adaptation.
arXiv Detail & Related papers (2024-11-05T15:18:02Z) - Dreamitate: Real-World Visuomotor Policy Learning via Video Generation [49.03287909942888]
We propose a visuomotor policy learning framework that fine-tunes a video diffusion model on human demonstrations of a given task.
We generate an example execution of the task conditioned on images of a novel scene, and use this synthesized execution directly to control the robot.
arXiv Detail & Related papers (2024-06-24T17:59:45Z) - Multi-Agent Generative Adversarial Interactive Self-Imitation Learning for AUV Formation Control and Obstacle Avoidance [10.834762022842353]
This paper builds on the MAGAIL algorithm by proposing multi-agent generative adversarial interactive self-imitation learning (MAGAISIL).
Our experimental results in a multi-AUV formation control and obstacle avoidance task show that AUVs trained via MAGAISIL can surpass the provided sub-optimal expert demonstrations.
arXiv Detail & Related papers (2024-01-21T03:01:00Z) - DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent) [73.10899129264375]
This paper explores DoraemonGPT, a comprehensive and conceptually elegant system driven by LLMs to understand dynamic scenes.
Given a video with a question/task, DoraemonGPT begins by converting the input video into a symbolic memory that stores task-related attributes.
We extensively evaluate DoraemonGPT's effectiveness on three benchmarks and several in-the-wild scenarios.
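As a toy illustration of the symbolic memory mentioned above, task-related attributes extracted from video intervals can be stored as queryable rows; the schema and query helper below are invented for illustration and are not DoraemonGPT's API.

```python
# Toy sketch of a "symbolic memory" built from a video: task-related attributes per
# time interval are stored as queryable rows. Schema and helper are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MemoryRow:
    t_start: float   # seconds into the video
    t_end: float
    entity: str      # e.g. "person_1"
    attribute: str   # e.g. "action", "location"
    value: str

@dataclass
class SymbolicMemory:
    rows: List[MemoryRow] = field(default_factory=list)

    def query(self, entity: Optional[str] = None,
              attribute: Optional[str] = None) -> List[MemoryRow]:
        return [r for r in self.rows
                if (entity is None or r.entity == entity)
                and (attribute is None or r.attribute == attribute)]

memory = SymbolicMemory([MemoryRow(0.0, 2.5, "person_1", "action", "picks up the red cup")])
print(memory.query(attribute="action"))
```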
arXiv Detail & Related papers (2024-01-16T14:33:09Z) - Learning to Act from Actionless Videos through Dense Correspondences [87.1243107115642]
We present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments.
Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals.
We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks.
arXiv Detail & Related papers (2023-10-12T17:59:23Z) - RoboCLIP: One Demonstration is Enough to Learn Robot Policies [72.24495908759967]
RoboCLIP is an online imitation learning method that uses a single demonstration, in the form of a video or a textual description of the task, to generate rewards.
RoboCLIP can also utilize out-of-domain demonstrations, like videos of humans solving the task for reward generation, circumventing the need to have the same demonstration and deployment domains.
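A simplified sketch of the similarity-as-reward idea in this summary: the agent's episode video is embedded and scored against the embedding of a single demonstration video, giving a once-per-episode reward. RoboCLIP itself relies on a pretrained video-and-language model; the random stand-in encoder and every detail below are assumptions for illustration.

```python
# Simplified sketch of similarity-as-reward: embed the agent's episode video and a
# single demonstration video, then use their cosine similarity as an episode reward.
# The encoder here is a random stand-in for a pretrained video-and-language model.
import torch
import torch.nn.functional as F

def embed(clip: torch.Tensor) -> torch.Tensor:
    """Stand-in encoder mapping a (T, C, H, W) clip to a unit-norm embedding."""
    return F.normalize(clip.mean(dim=(0, 2, 3)), dim=-1)

demo_clip = torch.randn(32, 3, 64, 64)     # the single demonstration video
episode_clip = torch.randn(40, 3, 64, 64)  # the agent's most recent episode

reward = torch.dot(embed(demo_clip), embed(episode_clip)).item()
print(f"episode reward: {reward:.3f}")
```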
arXiv Detail & Related papers (2023-10-11T21:10:21Z) - Domain-aware Self-supervised Pre-training for Label-Efficient Meme Analysis [29.888546964947537]
We introduce two self-supervised pre-training methods for meme analysis.
First, we employ off-the-shelf multi-modal hate-speech data during pre-training.
Second, we perform self-supervised learning by incorporating multiple specialized pretext tasks.
arXiv Detail & Related papers (2022-09-29T10:00:29Z) - Action-Conditioned Contrastive Policy Pretraining [39.13710045468429]
Deep visuomotor policy learning achieves promising results in control tasks such as robotic manipulation and autonomous driving.
However, it requires a huge number of online interactions with the training environment, which limits its real-world application.
In this work, we aim to pretrain policy representations for driving tasks using hours-long uncurated YouTube videos.
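A hedged sketch of an action-conditioned contrastive objective in the spirit of the title above: frames sharing the same (pseudo-)action label are treated as positives in an InfoNCE-style loss. The pseudo-labeling step and the exact definition of positive pairs are assumptions for illustration.

```python
# Hedged sketch of an action-conditioned contrastive objective: frame embeddings that
# share a (pseudo-)action label are treated as positives in an InfoNCE-style loss.
import torch
import torch.nn.functional as F

def action_conditioned_infonce(z: torch.Tensor, actions: torch.Tensor,
                               tau: float = 0.1) -> torch.Tensor:
    """z: (N, D) frame embeddings; actions: (N,) discrete pseudo-action labels."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / tau                                   # (N, N) cosine similarities
    same_action = actions.unsqueeze(0) == actions.unsqueeze(1)
    not_self = ~torch.eye(len(z), dtype=torch.bool)
    # Log-probability of each pair against all non-self pairs in its row.
    log_prob = sim - torch.logsumexp(sim.masked_fill(~not_self, -1e9), dim=1, keepdim=True)
    positives = same_action & not_self
    return -log_prob[positives].mean()

loss = action_conditioned_infonce(torch.randn(16, 128), torch.randint(0, 4, (16,)))
```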
arXiv Detail & Related papers (2022-04-05T17:58:22Z) - Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real-robot continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess.
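A simplified sketch of jointly learning a policy and per-demonstrator expertise, as the summary describes; the parameterization below (a sigmoid weight per demonstrator scaling a behavior-cloning loss, plus a penalty that discourages weight collapse) is an assumption rather than the paper's formulation.

```python
# Simplified sketch of jointly learning a policy and per-demonstrator expertise
# weights: each demonstrator gets a learnable weight that scales its behavior-cloning
# loss, with a small penalty that keeps weights from collapsing to zero.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_DEMONSTRATORS, OBS_DIM, N_ACTIONS = 5, 32, 6

policy = nn.Linear(OBS_DIM, N_ACTIONS)
expertise_logits = nn.Parameter(torch.zeros(N_DEMONSTRATORS))  # one score per demonstrator
opt = torch.optim.Adam(list(policy.parameters()) + [expertise_logits], lr=1e-3)

obs = torch.randn(64, OBS_DIM)                      # toy demonstration data
actions = torch.randint(0, N_ACTIONS, (64,))
demo_id = torch.randint(0, N_DEMONSTRATORS, (64,))  # which demonstrator produced each sample

for _ in range(100):
    per_sample_bc = F.cross_entropy(policy(obs), actions, reduction="none")
    weights = torch.sigmoid(expertise_logits)[demo_id]  # estimated expertise in (0, 1)
    loss = (weights * per_sample_bc).mean() - 0.1 * torch.log(weights).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```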
arXiv Detail & Related papers (2022-02-02T21:23:19Z) - Unsupervised Discovery of Actions in Instructional Videos [86.77350242461803]
We present an unsupervised approach to learn atomic actions of structured human tasks from a variety of instructional videos.
We propose a sequential autoregressive model for temporal segmentation of videos, which learns to represent and discover the sequential relationship between different atomic actions of the task.
Our approach outperforms the state-of-the-art unsupervised methods with large margins.
arXiv Detail & Related papers (2021-06-28T14:05:01Z) - MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [103.7609761511652]
We show how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously.
New tasks can be continuously instantiated from previously learned tasks.
We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots.
arXiv Detail & Related papers (2021-04-16T16:38:02Z) - WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos [124.72839555467944]
We propose a weakly supervised framework that can be trained using only video-class labels.
We show that our method largely outperforms weakly-supervised baselines.
When strongly supervised, our method obtains the state-of-the-art results in the tasks of both online per-frame action recognition and online detection of action start.
arXiv Detail & Related papers (2020-06-05T23:08:41Z)