Sample-efficient Unsupervised Policy Cloning from Ensemble Self-supervised Labeled Videos
- URL: http://arxiv.org/abs/2412.10778v1
- Date: Sat, 14 Dec 2024 10:12:22 GMT
- Title: Sample-efficient Unsupervised Policy Cloning from Ensemble Self-supervised Labeled Videos
- Authors: Xin Liu, Yaran Chen
- Abstract summary: Current advanced policy learning methodologies have demonstrated the ability to develop expert-level strategies when provided with enough information.
Humans can efficiently acquire skills within a few trials by imitating easily accessible internet videos, in the absence of any other supervision.
In this paper, we let machines replicate this efficient watching-and-learning process through Unsupervised Policy from Ensemble Self-supervised labeled Videos (UPESV).
- Score: 4.6949816706255065
- License:
- Abstract: Current advanced policy learning methodologies have demonstrated the ability to develop expert-level strategies when provided with enough information. However, their requirements, including task-specific rewards, expert-labeled trajectories, and extensive environmental interactions, can be expensive or even unavailable in many scenarios. In contrast, humans can efficiently acquire skills within a few trials by imitating easily accessible internet videos, in the absence of any other supervision. In this paper, we let machines replicate this efficient watching-and-learning process through Unsupervised Policy from Ensemble Self-supervised labeled Videos (UPESV), a novel framework that efficiently learns policies from videos without any other expert supervision. UPESV trains a video labeling model to infer the expert actions in expert videos through several organically combined self-supervised tasks. Each task performs its own duties, and together they enable the model to make full use of both expert videos and reward-free interactions for advanced dynamics understanding and robust prediction. Simultaneously, UPESV clones a policy from the labeled expert videos, in turn collecting environmental interactions for the self-supervised tasks. After a sample-efficient and unsupervised (i.e., reward-free) training process, an advanced video-imitated policy is obtained. Extensive experiments in sixteen challenging procedurally generated environments demonstrate that the proposed UPESV achieves state-of-the-art few-shot policy learning (outperforming five current advanced baselines on 12/16 tasks) without any supervision other than videos. A detailed analysis is also provided, verifying the necessity of each self-supervised task employed in UPESV.
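The loop the abstract describes, inferring actions for action-free expert videos with a self-supervised inverse-dynamics model and then behavior-cloning from the resulting labels, can be illustrated with a deliberately tiny tabular sketch. The chain environment, the lookup-table inverse model, and the two-action set below are all illustrative assumptions, not the paper's neural networks or its ensemble of self-supervised tasks:

```python
import random

random.seed(0)

# Toy 1-D chain environment: states 0..9, two actions (left/right).
ACTIONS = {0: -1, 1: +1}  # hypothetical discrete action set

def step(state, action):
    return max(0, min(9, state + ACTIONS[action]))

# 1) Reward-free interaction: collect (s, a, s') transitions; no rewards used.
transitions = []
for _ in range(500):
    s = random.randrange(10)
    a = random.randrange(2)
    transitions.append((s, a, step(s, a)))

# 2) Self-supervised labeling model: a lookup-table inverse-dynamics model
#    mapping (s, s') -> a, standing in for the paper's ensemble of tasks.
inverse = {(s, s2): a for s, a, s2 in transitions}

# 3) An "expert video": a state-only trajectory with no action labels.
video = [0, 1, 2, 3, 4, 5]

# 4) Label the video with inferred actions, then behavior-clone a policy.
labeled = [(s, inverse[(s, s2)]) for s, s2 in zip(video, video[1:])]
policy = {s: a for s, a in labeled}
print(policy)  # every labeled state maps to the "move right" action
```

In UPESV the two stages feed each other: the cloned policy gathers fresh reward-free interactions, which in turn sharpen the labeling model; the one-shot tables above skip that loop for brevity.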
Related papers
- PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement [16.768912344111946]
We present PROGRESSOR, a framework that learns a task-agnostic reward function from videos.
We show that PROGRESSOR enables robots to learn complex behaviors without any external supervision.
arXiv Detail & Related papers (2024-11-26T04:17:51Z) - Improving Generalization in Visual Reasoning via Self-Ensemble [0.0]
We propose self-ensemble, a novel method that improves the generalization and visual reasoning of the model without updating any parameters.
Our key insight is that an LVLM can ensemble with itself, without the need for any other LVLMs, which helps unlock its internal capabilities.
arXiv Detail & Related papers (2024-10-28T10:04:40Z) - ExpertAF: Expert Actionable Feedback from Video [81.46431188306397]
Current methods for skill assessment from video only provide scores or compare demonstrations.
We introduce a novel method to generate actionable feedback from video of a person doing a physical activity.
Our method is able to reason across multi-modal input combinations to output full-spectrum, actionable coaching.
arXiv Detail & Related papers (2024-08-01T16:13:07Z) - Multi-Agent Generative Adversarial Interactive Self-Imitation Learning for AUV Formation Control and Obstacle Avoidance [10.834762022842353]
This paper builds upon the MAGAIL algorithm by proposing multi-agent generative adversarial interactive self-imitation learning (MAGAISIL).
Our experimental results in a multi-AUV formation control and obstacle avoidance task show that AUVs trained via MAGAISIL can surpass the provided sub-optimal expert demonstrations.
arXiv Detail & Related papers (2024-01-21T03:01:00Z) - Learning to Act from Actionless Videos through Dense Correspondences [87.1243107115642]
We present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments.
Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals.
We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks.
arXiv Detail & Related papers (2023-10-12T17:59:23Z) - RoboCLIP: One Demonstration is Enough to Learn Robot Policies [72.24495908759967]
RoboCLIP is an online imitation learning method that uses a single demonstration, in the form of a video or a textual description of the task, to generate rewards.
RoboCLIP can also utilize out-of-domain demonstrations, like videos of humans solving the task for reward generation, circumventing the need to have the same demonstration and deployment domains.
arXiv Detail & Related papers (2023-10-11T21:10:21Z) - Domain-aware Self-supervised Pre-training for Label-Efficient Meme Analysis [29.888546964947537]
We introduce two self-supervised pre-training methods for meme analysis.
First, we employ off-the-shelf multi-modal hate-speech data during pre-training.
Second, we perform self-supervised learning by incorporating multiple specialized pretext tasks.
arXiv Detail & Related papers (2022-09-29T10:00:29Z) - Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real robotic continuous control tasks from Robomimic and on discrete environments such as MiniGrid and chess.
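The joint model over a policy and demonstrator expertise can be caricatured as an EM-style alternation: score each demonstrator by agreement with the current policy, then refit the policy from expertise-weighted votes. The bandit-like task, the agreement-based expertise score, and both demonstrators below are hypothetical stand-ins for the paper's learned joint model:

```python
import random

random.seed(1)

# Hypothetical task: the optimal action is always 1; two demonstrators of
# unequal skill each provide 200 action labels.
def demo(p_correct, n=200):
    return [1 if random.random() < p_correct else 0 for _ in range(n)]

demos = {"expert": demo(0.95), "novice": demo(0.55)}

# EM-style alternation: (a) score each demonstrator's expertise as agreement
# with the current policy, (b) refit the policy from expertise-weighted votes.
weights = {name: 1.0 for name in demos}
policy_action = 0  # deliberately wrong initial guess
for _ in range(5):
    for name, actions in demos.items():
        weights[name] = sum(a == policy_action for a in actions) / len(actions)
    votes = {0: 0.0, 1: 0.0}
    for name, actions in demos.items():
        for a in actions:
            votes[a] += weights[name]
    policy_action = max(votes, key=votes.get)
```

Even from a wrong initial policy, the weighted votes recover the optimal action, and the expert ends up with a higher expertise weight than the novice, which is the qualitative effect the paper exploits.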
arXiv Detail & Related papers (2022-02-02T21:23:19Z) - Unsupervised Discovery of Actions in Instructional Videos [86.77350242461803]
We present an unsupervised approach to learn atomic actions of structured human tasks from a variety of instructional videos.
We propose a sequential autoregressive model for temporal segmentation of videos, which learns to represent and discover the sequential relationship between different atomic actions of the task.
Our approach outperforms the state-of-the-art unsupervised methods with large margins.
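As a rough intuition for temporal segmentation into atomic actions, the sketch below cuts a frame-feature sequence wherever consecutive features jump past a threshold. This change-point heuristic, the scalar features, and the threshold are all assumptions for illustration; the paper instead learns a sequential autoregressive model over the segments:

```python
# Frame features for an 8-frame clip; each plateau is one atomic action.
frames = [0.0, 0.1, 0.05, 2.0, 2.1, 2.05, 4.0, 4.1]

def segment(features, threshold=1.0):
    """Cut the sequence wherever consecutive features jump past threshold."""
    segments, start = [], 0
    for i in range(1, len(features)):
        if abs(features[i] - features[i - 1]) > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(features)))
    return segments

print(segment(frames))  # one (start, end) pair per plateau
```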
arXiv Detail & Related papers (2021-06-28T14:05:01Z) - MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [103.7609761511652]
We show how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously.
New tasks can be continuously instantiated from previously learned tasks.
We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots.
arXiv Detail & Related papers (2021-04-16T16:38:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.