Learning Novel Skills from Language-Generated Demonstrations
        - URL: http://arxiv.org/abs/2412.09286v2
 - Date: Wed, 21 May 2025 03:15:25 GMT
 - Title: Learning Novel Skills from Language-Generated Demonstrations
 - Authors: Ao-Qun Jin, Tian-Yu Xiang, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Yue Cao, Sheng-Bin Duan, Fu-Chao Xie, Zeng-Guang Hou
 - Abstract summary: DemoGen is a skill-learning framework that enables robots to acquire novel skills from natural language instructions. It generates demonstration videos of novel skills, enabling robots to learn new skills effectively. Using the generated demonstrations, various skill-learning algorithms achieve an accomplishment rate three times the original on novel tasks.
 - Score: 15.495784871963451
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   Robots are increasingly deployed across diverse domains to tackle tasks requiring novel skills. However, current robot learning algorithms for acquiring novel skills often rely on demonstration datasets or environment interactions, resulting in high labor costs and potential safety risks. To address these challenges, this study proposes DemoGen, a skill-learning framework that enables robots to acquire novel skills from natural language instructions. DemoGen leverages a vision-language model and a video diffusion model to generate demonstration videos of novel skills, enabling robots to learn new skills effectively. Experimental evaluations in the MetaWorld simulation environments demonstrate the pipeline's capability to generate high-fidelity and reliable demonstrations. Using the generated demonstrations, various skill-learning algorithms achieve an accomplishment rate three times the original on novel tasks. These results highlight a novel approach to robot learning, offering a foundation for the intuitive and intelligent acquisition of novel robotic skills. (Project website: https://aoqunjin.github.io/LNSLGD/)
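
The abstract describes the DemoGen pipeline only at a high level: a natural-language instruction is interpreted by a vision-language model, a video diffusion model turns the result into demonstration videos, and those videos feed an off-the-shelf skill-learning algorithm. The sketch below is a minimal, hypothetical outline of that flow, not the authors' implementation; every name is an illustrative placeholder, and the two generative models are passed in as opaque callables.

```python
# Hypothetical sketch of a DemoGen-style pipeline (illustrative placeholders only).
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Demonstration:
    """One generated demonstration: the instruction plus a sequence of video frames."""
    instruction: str
    frames: List[object]  # e.g. RGB frames as arrays


def generate_demonstrations(
    instruction: str,
    vlm_plan: Callable[[str], str],                  # vision-language model: instruction -> plan
    video_diffusion: Callable[[str], List[object]],  # video diffusion model: plan -> frames
    num_demos: int = 8,
) -> List[Demonstration]:
    """Turn a natural-language instruction into demonstration videos."""
    plan = vlm_plan(instruction)
    return [Demonstration(instruction, video_diffusion(plan)) for _ in range(num_demos)]


def learn_skill(
    demos: Sequence[Demonstration],
    train_policy: Callable[[Sequence[Demonstration]], object],
) -> object:
    """Hand the generated demonstrations to any downstream skill-learning algorithm."""
    return train_policy(demos)


# Example wiring with trivial stand-ins for the two generative models:
if __name__ == "__main__":
    demos = generate_demonstrations(
        "pick up the red block and place it in the bin",
        vlm_plan=lambda text: f"plan for: {text}",
        video_diffusion=lambda plan: ["frame_0", "frame_1"],
        num_demos=2,
    )
    policy = learn_skill(demos, train_policy=lambda d: {"num_demos": len(d)})
    print(policy)
```

Keeping the models behind plain callables mirrors the paper's claim that the generated demonstrations can be consumed by a variety of skill-learning algorithms without changing the generation step.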
 
       
      
        Related papers
        - Neuro-Symbolic Imitation Learning: Discovering Symbolic Abstractions for Skill Learning [15.26375359103084]
This paper proposes a neuro-symbolic imitation learning framework.
It learns a symbolic representation that abstracts the low-level state-action space.
The learned representation decomposes a task into easier subtasks and allows the system to leverage symbolic planning.
arXiv Detail & Related papers (2025-03-27T11:50:29Z)
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Continual Skill and Task Learning via Dialogue [3.3511259017219297]
Continual and interactive robot learning is a challenging problem, as the robot is co-present with human users.
We present a framework for robots to query and learn visuo-motor skills and task-relevant information via natural language dialog interactions with human users.
arXiv Detail & Related papers (2024-09-05T01:51:54Z)
- VITAL: Visual Teleoperation to Enhance Robot Learning through Human-in-the-Loop Corrections [10.49712834719005]
We propose a low-cost visual teleoperation system for bimanual manipulation tasks, called VITAL.
Our approach leverages affordable hardware and visual processing techniques to collect demonstrations.
We enhance the generalizability and robustness of the learned policies by utilizing both real and simulated environments.
arXiv Detail & Related papers (2024-07-30T23:29:47Z)
- DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model [72.66465487508556]
DiffGen is a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model.
It can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation (a toy sketch of this optimization pattern appears after the list below).
Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.
arXiv Detail & Related papers (2024-05-12T15:38:17Z)
- LOTUS: Continual Imitation Learning for Robot Manipulation Through Unsupervised Skill Discovery [29.774700960178624]
We introduce LOTUS, a continual imitation learning algorithm that empowers a physical robot to continuously and efficiently learn to solve new manipulation tasks.
Continual skill discovery updates existing skills to avoid forgetting previous tasks and adds new skills to solve novel tasks.
Our comprehensive experiments show that LOTUS outperforms state-of-the-art baselines by over 11% in success rate.
arXiv Detail & Related papers (2023-11-03T17:38:35Z)
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.
Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer it to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z)
- XSkill: Cross Embodiment Skill Discovery [41.624343257852146]
XSkill is an imitation learning framework that discovers a cross-embodiment representation called skill prototypes purely from unlabeled human and robot manipulation videos.
Our experiments in simulation and real-world environments show that the discovered skill prototypes facilitate skill transfer and composition for unseen tasks.
arXiv Detail & Related papers (2023-07-19T12:51:28Z)
- RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z)
- Surfer: Progressive Reasoning with World Models for Robotic Manipulation [51.26109827779267]
We introduce a novel and simple robot manipulation framework, called Surfer.
Based on a world model, Surfer treats robot manipulation as a state transfer of the visual scene and decouples it into two parts: action and scene.
arXiv Detail & Related papers (2023-06-20T07:06:04Z)
- What Matters in Language Conditioned Robotic Imitation Learning [26.92329260907805]
We study the most critical challenges in learning language conditioned policies from offline free-form imitation datasets.
We present a novel approach that significantly outperforms the state of the art on the challenging language conditioned long-horizon robot manipulation CALVIN benchmark.
arXiv Detail & Related papers (2022-04-13T08:45:32Z)
- Summarizing a virtual robot's past actions in natural language [0.3553493344868413]
We show how a popular dataset that pairs robot actions with natural language descriptions, originally designed for an instruction-following task, can be repurposed as a training ground for robot action summarization.
We propose and test several methods of learning to generate such summaries, starting from either egocentric video frames of the robot taking actions or intermediate text representations of the actions used by an automatic planner.
arXiv Detail & Related papers (2022-03-13T15:00:46Z)
- Continual Learning from Demonstration of Robotics Skills [5.573543601558405]
Methods for teaching motion skills to robots focus on training for a single skill at a time.
We propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers.
arXiv Detail & Related papers (2022-02-14T16:26:52Z)
- BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning [108.41464483878683]
We study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks.
We develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions.
When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44%.
arXiv Detail & Related papers (2022-02-04T07:30:48Z)
- Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery.
We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations.
Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z)
- CRIL: Continual Robot Imitation Learning via Generative and Prediction Model [8.896427780114703]
We study how to realize continual imitation learning ability that empowers robots to continually learn new tasks one by one.
We propose a novel trajectory generation model that employs both a generative adversarial network and a dynamics prediction model.
Our experiments on both simulation and real world manipulation tasks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2021-06-17T12:15:57Z)
- What Can I Do Here? Learning New Skills by Imagining Visual Affordances [128.65223577406587]
We show how generative models of possible outcomes can allow a robot to learn visual representations of affordances.
In effect, prior data is used to learn what kinds of outcomes may be possible, such that when the robot encounters an unfamiliar setting, it can sample potential outcomes from its model.
We show that visuomotor affordance learning (VAL) can be used to train goal-conditioned policies that operate on raw image inputs.
arXiv Detail & Related papers (2021-06-01T17:58:02Z)
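
The DiffGen entry above frames demonstration generation as an optimization problem: drive the embedding of the simulated observation toward the embedding of the language instruction by differentiating through physics and rendering. The toy sketch below shows only that optimization pattern under heavy simplification and is not DiffGen's code; the differentiable simulator, renderer, and encoder are collapsed into a single random linear map, and all names and dimensions are invented for illustration.

```python
# Toy stand-in for embedding-matching demonstration generation (not DiffGen's code).
import numpy as np

rng = np.random.default_rng(0)

D_ACT, D_EMB = 8, 16                        # arbitrary action / embedding sizes
W = rng.normal(size=(D_EMB, D_ACT))         # stand-in for differentiable sim + renderer + encoder
text_emb = W @ rng.normal(size=D_ACT)       # stand-in instruction embedding (kept reachable)

actions = np.zeros(D_ACT)                   # action parameters to optimize
lr = 1.0 / np.linalg.norm(W, ord=2) ** 2    # step size safe for this quadratic objective

for _ in range(300):
    obs_emb = W @ actions                   # "simulate, render, and embed" the observation
    grad = W.T @ (obs_emb - text_emb)       # gradient of 0.5 * ||obs_emb - text_emb||^2
    actions -= lr * grad                    # gradient step on the actions

print("final embedding distance:", np.linalg.norm(W @ actions - text_emb))
```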
        This list is automatically generated from the titles and abstracts of the papers on this site.
       
     