Look Before You Leap: Using Serialized State Machine for Language Conditioned Robotic Manipulation
- URL: http://arxiv.org/abs/2503.05114v1
- Date: Fri, 07 Mar 2025 03:19:25 GMT
- Title: Look Before You Leap: Using Serialized State Machine for Language Conditioned Robotic Manipulation
- Authors: Tong Mu, Yihao Liu, Mehran Armand
- Abstract summary: We propose a framework that uses a serialized Finite State Machine to generate demonstrations and improve the success rate in manipulation tasks requiring a long sequence of precise interactions. Experimental results show that our approach achieves a success rate of up to 98% in these tasks, compared to the controlled condition using existing approaches.
- Score: 6.649586181283724
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Imitation learning frameworks for robotic manipulation have drawn attention in the recent development of language-model-grounded robotics. However, the success of these frameworks largely depends on the coverage of the demonstration cases: when the demonstration set does not include examples of how to act in all possible situations, the action may fail and can result in cascading errors. To solve this problem, we propose a framework that uses a serialized Finite State Machine (FSM) to generate demonstrations and improve the success rate in manipulation tasks requiring a long sequence of precise interactions. To validate its effectiveness, we use environmentally evolving, long-horizon puzzles that require long sequences of actions. Experimental results show that our approach achieves a success rate of up to 98% on these tasks, whereas the controlled condition using existing approaches reached at most 60% and, on some tasks, failed almost completely.
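As a rough illustration of what serializing an FSM into a demonstration script can look like (the state names, transition table, scripted sub-policies, and environment interface below are invented placeholders, not the authors' implementation):

```python
# A hand-written FSM whose serialized traversal scripts a long-horizon
# demonstration one sub-goal at a time. All names here (states, skills,
# scripted_policies, env) are hypothetical placeholders.
from dataclasses import dataclass, field


@dataclass
class SerializedFSM:
    # state -> (scripted sub-policy name, next state)
    transitions: dict = field(default_factory=dict)
    start: str = "start"

    def serialize(self):
        """Unroll the FSM from the start state into a linear plan."""
        plan, state = [], self.start
        while state in self.transitions:
            skill, next_state = self.transitions[state]
            plan.append((state, skill))
            state = next_state
        return plan


def generate_demonstration(fsm, scripted_policies, env):
    """Run each scripted sub-policy in serialized order, logging
    (observation, action) pairs as one long-horizon demonstration."""
    demo, obs = [], env.reset()
    for _, skill in fsm.serialize():
        for action in scripted_policies[skill](obs):
            demo.append((obs, action))
            obs = env.step(action)
    return demo


# Toy "pick-then-insert" puzzle with three precise stages.
fsm = SerializedFSM(transitions={
    "start":   ("reach_piece", "grasped"),
    "grasped": ("align_piece", "aligned"),
    "aligned": ("insert_piece", "done"),
})
print(fsm.serialize())
```

The point of the sketch is only that unrolling the FSM produces the full ordered plan before anything is executed, so each intermediate situation in the long-horizon task is covered by a scripted demonstration segment.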
Related papers
- A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees [1.3481665321936716]
This paper presents a unified failure recovery framework that combines Vision-Language Models (VLMs), a reactive planner, and Behavior Trees (BTs) to enable real-time failure handling.
Our approach includes pre-execution verification, which checks for potential failures before execution, and reactive failure handling, which detects and corrects failures during execution.
We evaluate our framework through real-world experiments with an ABB YuMi robot on tasks like peg insertion, object sorting, and drawer placement.
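A minimal sketch of how pre-execution verification and reactive failure handling can be composed in a behavior tree; the node classes and the placeholder callbacks (precheck, insert_peg, recover) are assumptions for illustration, not the paper's code or its VLM interface:

```python
# A toy behavior tree: a Fallback node falls through to a recovery branch
# whenever the nominal Sequence (pre-execution check, then the skill)
# fails. precheck / insert_peg / recover are invented stand-ins for the
# VLM verifier, the skill executor, and the reactive planner.
SUCCESS, FAILURE = "success", "failure"


class Sequence:
    def __init__(self, children):
        self.children = children

    def tick(self):
        for child in self.children:
            if child.tick() == FAILURE:
                return FAILURE
        return SUCCESS


class Fallback:
    def __init__(self, children):
        self.children = children

    def tick(self):
        for child in self.children:
            if child.tick() == SUCCESS:
                return SUCCESS
        return FAILURE


class Leaf:
    def __init__(self, fn):
        self.fn = fn

    def tick(self):
        return SUCCESS if self.fn() else FAILURE


def precheck():    # pre-execution verification (e.g., a VLM scene check)
    return True

def insert_peg():  # nominal skill; pretend it fails at runtime
    return False

def recover():     # reactive failure handling / re-planning
    return True


tree = Fallback([
    Sequence([Leaf(precheck), Leaf(insert_peg)]),  # nominal branch
    Leaf(recover),                                 # recovery branch
])
print(tree.tick())  # -> "success", reached via the recovery branch
```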
arXiv Detail & Related papers (2025-03-19T13:40:56Z)
- Screw Geometry Meets Bandits: Incremental Acquisition of Demonstrations to Generate Manipulation Plans [9.600625243282618]
We study the problem of methodically obtaining a sufficient set of kinesthetic demonstrations, one at a time.
We present a novel approach to address these open problems, using a screw geometric representation to generate manipulation plans from demonstrations.
We present experimental results on two example manipulation tasks, namely, pouring and scooping, to illustrate our approach.
arXiv Detail & Related papers (2024-10-23T20:57:56Z)
- SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation [62.58480650443393]
SAM-E builds on Segment Anything (SAM), a vision foundation model, for generalizable scene understanding, and pairs it with sequence imitation for embodied manipulation.
We develop a novel multi-channel heatmap that enables the prediction of the action sequence in a single pass.
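For intuition only, a sketch of decoding a multi-channel heatmap into an action sequence in a single pass, reading channel t as the 2D target for step t; the tensor shapes are assumptions rather than SAM-E's actual output format:

```python
# Illustrative decoding of a multi-channel heatmap into an action
# sequence in one pass (channel t -> 2D target for step t).
import numpy as np

def decode_action_sequence(heatmaps):
    """heatmaps: array of shape (T, H, W); returns T (row, col) targets,
    one per future step, taken as the per-channel argmax."""
    T, H, W = heatmaps.shape
    flat_idx = heatmaps.reshape(T, -1).argmax(axis=1)
    return np.stack(np.unravel_index(flat_idx, (H, W)), axis=1)

seq = decode_action_sequence(np.random.rand(5, 64, 64))
print(seq.shape)  # (5, 2): five 2D action targets from one forward pass
```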
arXiv Detail & Related papers (2024-05-30T00:32:51Z)
- Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation [30.54275273155153]
Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following.
We introduce a Self-Corrected (SC)-MLLM, equipping our model not only to predict end-effector poses but also to autonomously recognize and correct failure actions.
The SC-MLLM agent significantly improves manipulation accuracy compared to the previous state-of-the-art robotic MLLM (ManipLLM).
arXiv Detail & Related papers (2024-05-27T17:58:48Z)
- Unsupervised Learning of Effective Actions in Robotics [0.9374652839580183]
Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions.
We propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes".
We evaluate our method on a simulated stair-climbing reinforcement learning task.
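The summary does not name the unsupervised method, so the sketch below uses k-means purely as a generic stand-in to show what discretizing a continuous motion space into "action prototypes" can look like:

```python
# Generic illustration (k-means, not necessarily the paper's algorithm):
# cluster sampled continuous actions and keep the cluster centers as a
# small discrete set of "action prototypes".
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
continuous_actions = rng.uniform(-1.0, 1.0, size=(5000, 4))  # e.g., joint velocities

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(continuous_actions)
action_prototypes = kmeans.cluster_centers_  # 8 discrete prototypes

def discretize(action):
    """Map a continuous action to the index of its nearest prototype."""
    return int(np.argmin(np.linalg.norm(action_prototypes - action, axis=1)))

print(discretize(rng.uniform(-1.0, 1.0, size=4)))
```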
arXiv Detail & Related papers (2024-04-03T13:28:52Z)
- Towards Bridging the Gap between High-Level Reasoning and Execution on Robots [2.6107298043931206]
When reasoning about actions, e.g., by means of task planning or agent programming with Golog, the robot's actions are typically modeled on an abstract level.
However, when executing such an action on a robot it can no longer be seen as a primitive.
In this thesis, we propose several approaches towards closing this gap.
arXiv Detail & Related papers (2023-12-30T12:26:12Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
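A generic observe-plan-act loop of the kind such interactive planners use, with a stubbed-in LLM call; query_llm, the toy environment, and the action strings are all invented for illustration and are not the paper's interface:

```python
# Toy interactive planning loop under partial observability: the planner
# (here a stub standing in for an LLM) chooses the next action from the
# observation history, and each new observation is fed back in.
def query_llm(history):
    """Stub for an LLM planner; a real system would call a hosted model."""
    return "look_inside_drawer" if "drawer_closed" in history[-1] else "grasp_target"


class ToyDrawerEnv:
    """Minimal stand-in environment: the target is hidden until the drawer is opened."""
    def __init__(self):
        self.open = False

    def observe(self):
        return "drawer_closed" if not self.open else "target_visible"

    def step(self, action):
        if action == "look_inside_drawer":
            self.open = True
        elif action == "grasp_target" and self.open:
            return "task_complete"
        return self.observe()


def interactive_plan(env, max_steps=10):
    history = [env.observe()]
    for _ in range(max_steps):
        action = query_llm(history)   # plan with partial information
        obs = env.step(action)        # act, then observe the outcome
        history.append(obs)           # feed the new observation back in
        if obs == "task_complete":
            break
    return history


print(interactive_plan(ToyDrawerEnv()))
```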
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- One ACT Play: Single Demonstration Behavior Cloning with Action Chunking Transformers [11.875194596371484]
Humans can learn to complete tasks, even complex ones, after only seeing one or two demonstrations.
Our work seeks to emulate this ability, using behavior cloning to learn a task given only a single human demonstration.
We develop a novel addition to the temporal ensembling method used by action chunking agents during inference.
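For context, a sketch of the standard temporal ensembling used by action chunking agents at inference time, which averages the overlapping chunk predictions for the current timestep with exponential weights; the abstract does not describe the paper's specific addition, so only the baseline scheme is shown, and the weighting constant is an assumption:

```python
# Sketch of ACT-style temporal ensembling, not the paper's modification:
# each past chunk that still covers the current timestep contributes its
# prediction, with older predictions weighted more heavily.
import numpy as np

def ensemble_current_action(chunk_buffer, t, m=0.1):
    """chunk_buffer: chronologically ordered list of (t_pred, chunk), where
    chunk[i] is the action predicted for timestep t_pred + i. Returns the
    exponentially weighted average of every prediction that covers t."""
    preds = [chunk[t - t_pred] for t_pred, chunk in chunk_buffer
             if 0 <= t - t_pred < len(chunk)]
    # The oldest covering prediction gets weight 1; newer ones are
    # discounted by exp(-m * i).
    weights = np.exp(-m * np.arange(len(preds)))
    return np.average(np.stack(preds), axis=0, weights=weights)

# Two overlapping chunks that both cover timestep t = 1:
buffer = [(0, np.array([[0.0], [0.2], [0.4]])),    # chunk predicted at t = 0
          (1, np.array([[0.25], [0.45], [0.6]]))]  # chunk predicted at t = 1
print(ensemble_current_action(buffer, t=1))        # weighted blend of 0.2 and 0.25
```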
arXiv Detail & Related papers (2023-09-18T21:50:26Z)
- Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve some challenging simulated tasks, such as humanoid locomotion and stand-up, with unprecedented sample efficiency.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
- Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery.
We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations.
Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
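A tiny numeric illustration of the mode-averaging failure described above (a toy construction, not the paper's model): two valid demonstrations detour around an obstacle on opposite sides, and their average passes straight through it.

```python
# Averaging two valid but distinct modes produces an invalid trajectory.
import numpy as np

xs = np.linspace(0.0, 1.0, 11)
demo_left  = np.stack([xs,  0.5 * np.sin(np.pi * xs)], axis=1)  # detours above
demo_right = np.stack([xs, -0.5 * np.sin(np.pi * xs)], axis=1)  # detours below
averaged   = 0.5 * (demo_left + demo_right)                     # y == 0 everywhere

obstacle = np.array([0.5, 0.0])  # sits on the straight line y == 0

def clearance(traj):
    """Minimum distance from the trajectory to the obstacle."""
    return np.min(np.linalg.norm(traj - obstacle, axis=1))

print(clearance(demo_left), clearance(demo_right))  # both > 0.4: each demo stays clear
print(clearance(averaged))                          # 0.0: the averaged path collides
```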
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)