Related papers: Learning Novel Skills from Language-Generated Demonstrations

Learning Novel Skills from Language-Generated Demonstrations

URL: http://arxiv.org/abs/2412.09286v2
Date: Wed, 21 May 2025 03:15:25 GMT
Title: Learning Novel Skills from Language-Generated Demonstrations
Authors: Ao-Qun Jin, Tian-Yu Xiang, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Yue Cao, Sheng-Bin Duan, Fu-Chao Xie, Zeng-Guang Hou,
Abstract summary: DemoGen is a skill-learning framework that enables robots to acquire novel skills from natural language instructions.<n>It generates demonstration videos of novel skills, which enabling robots to learn new skills effectively.<n>Using the generated demonstrations, various skill learning algorithms achieve an accomplishment rate three times the original on novel tasks.
Score: 15.495784871963451
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Robots are increasingly deployed across diverse domains to tackle tasks requiring novel skills. However, current robot learning algorithms for acquiring novel skills often rely on demonstration datasets or environment interactions, resulting in high labor costs and potential safety risks. To address these challenges, this study proposes DemoGen, a skill-learning framework that enables robots to acquire novel skills from natural language instructions. DemoGen leverages the vision-language model and the video diffusion model to generate demonstration videos of novel skills, which enabling robots to learn new skills effectively. Experimental evaluations in the MetaWorld simulation environments demonstrate the pipeline's capability to generate high-fidelity and reliable demonstrations. Using the generated demonstrations, various skill learning algorithms achieve an accomplishment rate three times the original on novel tasks. These results highlight a novel approach to robot learning, offering a foundation for the intuitive and intelligent acquisition of novel robotic skills. (Project website: https://aoqunjin.github.io/LNSLGD/)

Related papers

Neuro-Symbolic Imitation Learning: Discovering Symbolic Abstractions for Skill Learning [15.26375359103084]
This paper proposes a neuro-symbolic imitation learning framework. It learns a symbolic representation that abstracts the low-level state-action space. The learned representation decomposes a task into easier subtasks and allows the system to leverage symbolic planning.
arXiv Detail & Related papers (2025-03-27T11:50:29Z)
$π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
Continual Skill and Task Learning via Dialogue [3.3511259017219297]
Continual and interactive robot learning is a challenging problem as the robot is present with human users. We present a framework for robots to query and learn visuo-motor robot skills and task relevant information via natural language dialog interactions with human users.
arXiv Detail & Related papers (2024-09-05T01:51:54Z)
VITAL: Visual Teleoperation to Enhance Robot Learning through Human-in-the-Loop Corrections [10.49712834719005]
We propose a low-cost visual teleoperation system for bimanual manipulation tasks, called VITAL. Our approach leverages affordable hardware and visual processing techniques to collect demonstrations. We enhance the generalizability and robustness of the learned policies by utilizing both real and simulated environments.
arXiv Detail & Related papers (2024-07-30T23:29:47Z)
DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model [72.66465487508556]
DiffGen is a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model. It can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation. Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.
arXiv Detail & Related papers (2024-05-12T15:38:17Z)
LOTUS: Continual Imitation Learning for Robot Manipulation Through Unsupervised Skill Discovery [29.774700960178624]
We introduce LOTUS, a continual imitation learning algorithm that empowers a physical robot to continuously and efficiently learn to solve new manipulation tasks. Continual skill discovery updates existing skills to avoid forgetting previous tasks and adds new skills to solve novel tasks. Our comprehensive experiments show that LOTUS outperforms state-of-the-art baselines by over 11% in success rate.
arXiv Detail & Related papers (2023-11-03T17:38:35Z)
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer them to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z)
XSkill: Cross Embodiment Skill Discovery [41.624343257852146]
XSkill is an imitation learning framework that discovers a cross-embodiment representation called skill prototypes purely from unlabeled human and robot manipulation videos. Our experiments in simulation and real-world environments show that the discovered skill prototypes facilitate skill transfer and composition for unseen tasks.
arXiv Detail & Related papers (2023-07-19T12:51:28Z)
RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations. This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z)
Surfer: Progressive Reasoning with World Models for Robotic Manipulation [51.26109827779267]
We introduce a novel and simple robot manipulation framework, called Surfer. Surfer treats robot manipulation as a state transfer of the visual scene, and decouples it into two parts: action and scene. It is based on the world model, treats robot manipulation as a state transfer of the visual scene, and decouples it into two parts: action and scene.
arXiv Detail & Related papers (2023-06-20T07:06:04Z)
What Matters in Language Conditioned Robotic Imitation Learning [26.92329260907805]
We study the most critical challenges in learning language conditioned policies from offline free-form imitation datasets. We present a novel approach that significantly outperforms the state of the art on the challenging language conditioned long-horizon robot manipulation CALVIN benchmark.
arXiv Detail & Related papers (2022-04-13T08:45:32Z)
Summarizing a virtual robot's past actions in natural language [0.3553493344868413]
We show how a popular dataset that matches robot actions with natural language descriptions designed for an instruction following task can be repurposed to serve as a training ground for robot action summarization work. We propose and test several methods of learning to generate such summaries, starting from either egocentric video frames of the robot taking actions or intermediate text representations of the actions used by an automatic planner.
arXiv Detail & Related papers (2022-03-13T15:00:46Z)
Continual Learning from Demonstration of Robotics Skills [5.573543601558405]
Methods for teaching motion skills to robots focus on training for a single skill at a time. We propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers.
arXiv Detail & Related papers (2022-02-14T16:26:52Z)
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning [108.41464483878683]
We study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks. We develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions. When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44%.
arXiv Detail & Related papers (2022-02-04T07:30:48Z)
Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery. We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations. Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z)
CRIL: Continual Robot Imitation Learning via Generative and Prediction Model [8.896427780114703]
We study how to realize continual imitation learning ability that empowers robots to continually learn new tasks one by one. We propose a novel trajectory generation model that employs both a generative adversarial network and a dynamics prediction model. Our experiments on both simulation and real world manipulation tasks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2021-06-17T12:15:57Z)
What Can I Do Here? Learning New Skills by Imagining Visual Affordances [128.65223577406587]
We show how generative models of possible outcomes can allow a robot to learn visual representations of affordances. In effect, prior data is used to learn what kinds of outcomes may be possible, such that when the robot encounters an unfamiliar setting, it can sample potential outcomes from its model. We show that visuomotor affordance learning (VAL) can be used to train goal-conditioned policies that operate on raw image inputs.
arXiv Detail & Related papers (2021-06-01T17:58:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.