Related papers: Multi-task real-robot data with gaze attention for dual-arm fine manipulation

Multi-task real-robot data with gaze attention for dual-arm fine manipulation

URL: http://arxiv.org/abs/2401.07603v3
Date: Tue, 19 Mar 2024 11:17:00 GMT
Title: Multi-task real-robot data with gaze attention for dual-arm fine manipulation
Authors: Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi,
Abstract summary: This paper introduces a dataset of diverse object manipulations that includes dual-arm tasks and/or tasks requiring fine manipulation. We have generated dataset with 224k episodes (150 hours, 1,104 language instructions) which includes dual-arm fine tasks such as bowl-moving, pencil-case opening or banana-peeling. This dataset includes visual attention signals as well as dual-action labels, a signal that separates actions into a robust reaching trajectory and precise interaction with objects, and language instructions to achieve robust and precise object manipulation.
Score: 4.717749411286867
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In the field of robotic manipulation, deep imitation learning is recognized as a promising approach for acquiring manipulation skills. Additionally, learning from diverse robot datasets is considered a viable method to achieve versatility and adaptability. In such research, by learning various tasks, robots achieved generality across multiple objects. However, such multi-task robot datasets have mainly focused on single-arm tasks that are relatively imprecise, not addressing the fine-grained object manipulation that robots are expected to perform in the real world. This paper introduces a dataset of diverse object manipulations that includes dual-arm tasks and/or tasks requiring fine manipulation. To this end, we have generated dataset with 224k episodes (150 hours, 1,104 language instructions) which includes dual-arm fine tasks such as bowl-moving, pencil-case opening or banana-peeling, and this data is publicly available. Additionally, this dataset includes visual attention signals as well as dual-action labels, a signal that separates actions into a robust reaching trajectory and precise interaction with objects, and language instructions to achieve robust and precise object manipulation. We applied the dataset to our Dual-Action and Attention (DAA), a model designed for fine-grained dual arm manipulation tasks and robust against covariate shifts. The model was tested with over 7k total trials in real robot manipulation tasks, demonstrating its capability in fine manipulation.

Related papers

MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos [43.836197294180316]
We present MAPLE, a novel method for dexterous robotic manipulation that exploits rich manipulation priors to enable efficient policy learning. Specifically, we predict hand-object contact points and detailed hand poses at the moment of hand-object contact and use the learned features to train policies for downstream manipulation tasks.
arXiv Detail & Related papers (2025-04-08T14:25:25Z)
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation [47.41571121843972]
We introduce RoboMIND, a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robotic-related information. Our dataset also includes 5k real-world failure demonstrations, each accompanied by detailed causes, enabling failure reflection and correction.
arXiv Detail & Related papers (2024-12-18T14:17:16Z)
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation [23.554917579133576]
We present Robotics Diffusion Transformer (RDT), a pioneering diffusion foundation model for bimanual manipulation. RDT builds on diffusion models to effectively represent multi-modality, with innovative designs of a scalable Transformer. We further introduce a Physically Interpretable Unified Action Space, which can unify the action representations of various robots.
arXiv Detail & Related papers (2024-10-10T12:33:46Z)
Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning [35.42091835421386]
Multimodal task specification is essential for enhanced robotic performance. We show that by leveraging unimodal instructions abundant in real data, we can effectively teach robots to learn multimodal task specifications.
arXiv Detail & Related papers (2024-10-02T13:23:02Z)
Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation [65.46610405509338]
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation. Our framework,Track2Act predicts tracks of how points in an image should move in future time-steps based on a goal. We show that this approach of combining scalably learned track prediction with a residual policy enables diverse generalizable robot manipulation.
arXiv Detail & Related papers (2024-05-02T17:56:55Z)
RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking [54.776890150458385]
We develop an efficient system for training universal agents capable of multi-task manipulation skills. We are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks. On average, RoboAgent outperforms prior methods by over 40% in unseen situations.
arXiv Detail & Related papers (2023-09-05T03:14:39Z)
RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations. This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z)
Goal-conditioned dual-action imitation learning for dexterous dual-arm robot manipulation [4.717749411286867]
Long-conditioned dexterous robot manipulation of deformable objects, such as banana peeling, is a problematic task.<n>This paper presents a goal-conditioned dual-action deep imitation learning (DIL) approach that can learn dexterous manipulation skills.
arXiv Detail & Related papers (2022-03-18T05:17:00Z)
Lifelong Robotic Reinforcement Learning by Retaining Experiences [61.79346922421323]
Many multi-task reinforcement learning efforts assume the robot can collect data from all tasks at all times. In this work, we study a practical sequential multi-task RL problem motivated by the practical constraints of physical robotic systems. We derive an approach that effectively leverages the data and policies learned for previous tasks to cumulatively grow the robot's skill-set.
arXiv Detail & Related papers (2021-09-19T18:00:51Z)
Transformer-based deep imitation learning for dual-arm robot manipulation [4.717749411286867]
In a dual-arm manipulation setup, the increased number of state dimensions caused by the additional robot manipulators causes distractions.<n>We address this issue using a self-attention mechanism that computes dependencies between elements in a sequential input and focuses on important elements.<n>A Transformer, a variant of self-attention architecture, is applied to deep imitation learning to solve dual-arm manipulation tasks in the real world.
arXiv Detail & Related papers (2021-08-01T07:42:39Z)
Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots. We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector. We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works. However, learning a model that captures the dynamics of complex skills represents a major challenge. We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.