RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)
- URL: http://arxiv.org/abs/2409.02920v3
- Date: Wed, 16 Apr 2025 17:31:39 GMT
- Title: RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)
- Authors: Yao Mu, Tianxing Chen, Shijia Peng, Zanxin Chen, Zeyu Gao, Yude Zou, Lunkai Lin, Zhiqiang Xie, Ping Luo,
- Abstract summary: RoboTwin is a generative digital twin framework that uses 3D generative foundation models and large language models to produce diverse expert datasets. Specifically, RoboTwin creates varied digital twins of objects from single 2D images, generating realistic and interactive scenarios. Our framework offers a comprehensive benchmark with both simulated and real-world data, enabling standardized evaluation and better alignment between simulated training and real-world performance.
- Score: 25.298789781487084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the rapidly advancing field of robotics, dual-arm coordination and complex object manipulation are essential capabilities for developing advanced autonomous systems. However, the scarcity of diverse, high-quality demonstration data and real-world-aligned evaluation benchmarks severely limits such development. To address this, we introduce RoboTwin, a generative digital twin framework that uses 3D generative foundation models and large language models to produce diverse expert datasets and provide a real-world-aligned evaluation platform for dual-arm robotic tasks. Specifically, RoboTwin creates varied digital twins of objects from single 2D images, generating realistic and interactive scenarios. It also introduces a spatial relation-aware code generation framework that combines object annotations with large language models to break down tasks, determine spatial constraints, and generate precise robotic movement code. Our framework offers a comprehensive benchmark with both simulated and real-world data, enabling standardized evaluation and better alignment between simulated training and real-world performance. We validated our approach using the open-source COBOT Magic Robot platform. Policies pre-trained on RoboTwin-generated data and fine-tuned with limited real-world samples improve the success rate by over 70% for single-arm tasks and by over 40% for dual-arm tasks compared to models trained solely on real-world data. This significant improvement demonstrates RoboTwin's potential to enhance the development and evaluation of dual-arm robotic manipulation systems. Project Page: https://robotwin-benchmark.github.io/early-version/.
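The code-generation pipeline described in the abstract can be pictured as a two-stage LLM loop: decompose the task against annotated digital twins and their spatial constraints, then emit movement code. The sketch below is illustrative only; the helper names (`query_llm`, `ObjectAnnotation`, the motion API) are assumptions, not the RoboTwin implementation.

```python
# Minimal sketch of a spatial relation-aware code-generation loop.
# All names (query_llm, ObjectAnnotation, the motion API) are illustrative
# assumptions, not the RoboTwin API.
from dataclasses import dataclass


@dataclass
class ObjectAnnotation:
    name: str            # e.g. "mug"
    pose: tuple          # (x, y, z, qx, qy, qz, qw) in the world frame
    grasp_points: list   # candidate grasp poses on the digital twin


def query_llm(prompt: str) -> str:
    """Placeholder for a large-language-model call (assumed interface)."""
    raise NotImplementedError


def generate_task_program(task: str, annotations: list[ObjectAnnotation]) -> str:
    """Decompose a dual-arm task and emit robot movement code.

    1. Describe each annotated object (pose, grasp points) to the LLM.
    2. Ask the LLM to break the task into sub-tasks and spatial constraints
       (which arm reaches which object, required relative poses).
    3. Ask the LLM to emit executable movement code against a motion API.
    """
    scene = "\n".join(
        f"{a.name}: pose={a.pose}, grasps={len(a.grasp_points)}" for a in annotations
    )
    plan = query_llm(
        f"Task: {task}\nScene:\n{scene}\n"
        "List sub-tasks and the spatial constraints between objects and arms."
    )
    code = query_llm(
        f"Plan:\n{plan}\n"
        "Write Python calls to a motion API (move_to(pose), grasp(), release()) "
        "that satisfy the plan for a dual-arm robot."
    )
    return code
```

Separating planning from code emission keeps the spatial constraints explicit and lets the same motion API serve both single-arm and dual-arm tasks.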
Related papers
- RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins [33.78621017138685]
RoboTwin is a generative digital twin framework that uses 3D generative foundation models and large language models to produce diverse expert datasets.
Specifically, RoboTwin creates varied digital twins of objects from single 2D images, generating realistic and interactive scenarios.
Our framework offers a comprehensive benchmark with both simulated and real-world data, enabling standardized evaluation and better alignment between simulated training and real-world performance.
arXiv Detail & Related papers (2025-04-17T16:14:24Z)
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots [133.23509142762356]
General-purpose robots need a versatile body and an intelligent mind.
Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy.
We introduce GR00T N1, an open foundation model for humanoid robots.
arXiv Detail & Related papers (2025-03-18T21:06:21Z)
- RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation [47.41571121843972]
We introduce RoboMIND, a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes.
RoboMIND is collected through human teleoperation and encompasses comprehensive robotic-related information.
Our dataset also includes 5k real-world failure demonstrations, each accompanied by detailed causes, enabling failure reflection and correction.
arXiv Detail & Related papers (2024-12-18T14:17:16Z)
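As a concrete picture of the dataset composition described in the RoboMIND entry, one demonstration record might carry the fields sketched below; the schema is hypothetical and only mirrors the quantities mentioned in the summary (tasks, object classes, and failure demonstrations with causes).

```python
# Hypothetical layout of one RoboMIND-style demonstration record; the field
# names are assumptions, not the released schema.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Demonstration:
    task_name: str                     # one of ~479 tasks
    object_class: str                  # one of ~96 object classes
    embodiment: str                    # robot platform used for teleoperation
    observations: list = field(default_factory=list)  # per-step camera/proprio data
    actions: list = field(default_factory=list)       # per-step commanded actions
    success: bool = True
    failure_cause: Optional[str] = None  # filled for the 5k failure demonstrations
```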
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate the model on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
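Flow matching, as used in the $π_0$ entry above, generates an action chunk by integrating a learned velocity field from Gaussian noise toward the data distribution, conditioned on the model's observation encoding. The sketch below shows the generic sampling loop with a stand-in network; it is not the $π_0$ architecture.

```python
# Generic flow-matching sampler for an action chunk: integrate a learned
# velocity field v(x, t, context) from t=0 (noise) to t=1 (actions).
# The network here is a stand-in, not the pi_0 model.
import torch


def sample_actions(velocity_net, context, horizon=16, action_dim=7, steps=10):
    x = torch.randn(horizon, action_dim)      # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)
        v = velocity_net(x, t, context)       # predicted velocity at time t
        x = x + dt * v                        # Euler step toward the data
    return x                                  # denoised action chunk
```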
- RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation [23.554917579133576]
We present Robotics Diffusion Transformer (RDT), a pioneering diffusion foundation model for bimanual manipulation.
RDT builds on diffusion models to effectively represent multi-modality, with innovative designs of a scalable Transformer.
We further introduce a Physically Interpretable Unified Action Space, which can unify the action representations of various robots.
arXiv Detail & Related papers (2024-10-10T12:33:46Z)
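One plausible reading of a physically interpretable unified action space, as described in the RDT entry above, is a fixed-width vector with named slots that each embodiment partially fills and masks. The slot layout below is an illustrative assumption, not the RDT specification.

```python
# Illustrative unified action vector: every embodiment writes into named slots
# of a fixed-width vector and leaves the rest masked. The slot layout is an
# assumption for illustration, not the RDT specification.
import numpy as np

SLOTS = {                      # (start, length) of each physically named slot
    "left_arm_joints": (0, 7),
    "left_gripper": (7, 1),
    "right_arm_joints": (8, 7),
    "right_gripper": (15, 1),
}
DIM = 16


def encode(robot_action: dict) -> tuple[np.ndarray, np.ndarray]:
    """Pack a robot-specific action dict into the unified vector plus a mask."""
    vec = np.zeros(DIM)
    mask = np.zeros(DIM, dtype=bool)
    for name, values in robot_action.items():
        start, length = SLOTS[name]
        vec[start:start + length] = values
        mask[start:start + length] = True    # mark which dims this robot uses
    return vec, mask


# A single-arm robot only fills the left-arm slots:
vec, mask = encode({"left_arm_joints": np.zeros(7), "left_gripper": [1.0]})
```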
- Semantically Controllable Augmentations for Generalizable Robot Learning [40.89398799604755]
Generalization to unseen real-world scenarios for robot manipulation requires exposure to diverse datasets during training.
We propose a generative augmentation framework that produces semantically controllable augmentations and rapidly multiplies robot datasets.
arXiv Detail & Related papers (2024-09-02T05:25:34Z)
- Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation [49.03165169369552]
By training a single policy across many different kinds of robots, a robot learning method can leverage much broader and more diverse datasets.
We propose CrossFormer, a scalable and flexible transformer-based policy that can consume data from any embodiment.
We demonstrate that the same network weights can control vastly different robots, including single and dual arm manipulation systems, wheeled robots, quadcopters, and quadrupeds.
arXiv Detail & Related papers (2024-08-21T17:57:51Z) - RoboScript: Code Generation for Free-Form Manipulation Tasks across Real
and Simulation [77.41969287400977]
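A single cross-embodiment policy of the kind described in the CrossFormer entry above typically flattens each platform's observations into one token sequence for a shared transformer and reads actions out of embodiment-specific heads. The sketch below illustrates that pattern with made-up module names and sizes; it is not the CrossFormer code.

```python
# Sketch: flatten heterogeneous per-embodiment observations into one token
# sequence for a shared transformer policy. Module names and sizes are
# illustrative, not the CrossFormer implementation.
import torch
import torch.nn as nn


class CrossEmbodimentPolicy(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.image_proj = nn.Linear(512, d_model)    # pre-extracted image features
        self.proprio_proj = nn.Linear(32, d_model)   # padded proprioception
        self.readout = nn.ParameterDict({            # learned readout token per embodiment
            "arm": nn.Parameter(torch.zeros(1, d_model)),
            "quadruped": nn.Parameter(torch.zeros(1, d_model)),
        })
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.heads = nn.ModuleDict({                 # embodiment-specific action heads
            "arm": nn.Linear(d_model, 7),
            "quadruped": nn.Linear(d_model, 12),
        })

    def forward(self, image_feats, proprio, embodiment: str):
        tokens = torch.cat([
            self.image_proj(image_feats),            # (B, T_img, d_model)
            self.proprio_proj(proprio),              # (B, T_prop, d_model)
            self.readout[embodiment].expand(image_feats.shape[0], 1, -1),
        ], dim=1)
        out = self.backbone(tokens)
        return self.heads[embodiment](out[:, -1])    # action from the readout token
```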
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a code generation benchmark for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction-following data-collection robots that can be aligned with human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z)
- Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
- ExAug: Robot-Conditioned Navigation Policies via Geometric Experience Augmentation [73.63212031963843]
We propose a novel framework, ExAug, to augment the experiences of different robot platforms from multiple datasets in diverse environments.
The trained policy is evaluated on two new robot platforms with three different cameras in indoor and outdoor environments with obstacles.
arXiv Detail & Related papers (2022-10-14T01:32:15Z)
- PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training [25.50131893785007]
This work introduces a paradigm for pre-training a general-purpose representation that can serve as a starting point for multiple tasks on a given robot.
We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion.
We show that finetuning small task-specific networks on top of the larger pretrained model results in significantly better performance compared to training a single model from scratch for all tasks simultaneously.
arXiv Detail & Related papers (2022-09-22T16:20:17Z)
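The finetuning recipe summarized in the PACT entry above, keeping the large pretrained representation frozen and training small task-specific networks on top, can be sketched generically as follows; the module names and sizes are placeholders, not the PACT code.

```python
# Generic sketch of finetuning small task heads on top of a frozen pretrained
# backbone, as described for PACT. Module names and sizes are placeholders.
import torch
import torch.nn as nn


class TaskHead(nn.Module):
    """Small task-specific network on top of the pretrained representation."""
    def __init__(self, d_model=256, out_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, 128), nn.ReLU(), nn.Linear(128, out_dim)
        )

    def forward(self, features):
        return self.net(features)


def build_finetuner(pretrained_backbone: nn.Module, out_dim: int):
    for p in pretrained_backbone.parameters():
        p.requires_grad = False                  # keep the large model frozen
    head = TaskHead(out_dim=out_dim)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)  # train only the head
    return head, optimizer
```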
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects [51.79035249464852]
We present a framework for learning multi-arm manipulation of articulated objects.
Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm.
arXiv Detail & Related papers (2021-11-07T02:31:09Z)
- Peer-Assisted Robotic Learning: A Data-Driven Collaborative Learning Approach for Cloud Robotic Systems [26.01178673629753]
Peer-Assisted Robotic Learning (PARL) is inspired by peer-assisted learning in cognitive psychology and pedagogy.
Data and models are shared by robots to the cloud after semantic computing and local training.
Finally, models fine-tuned on this larger shared dataset in the cloud are transferred back to the local robots.
arXiv Detail & Related papers (2020-10-16T10:52:54Z)
- robo-gym -- An Open Source Toolkit for Distributed Deep Reinforcement Learning on Real and Simulated Robots [0.5161531917413708]
We propose robo-gym, an open-source toolkit to increase the use of Deep Reinforcement Learning with real robots.
We demonstrate a unified setup for simulation and real environments which enables a seamless transfer from training in simulation to application on the robot.
We showcase the capabilities and the effectiveness of the framework with two real world applications featuring industrial robots.
arXiv Detail & Related papers (2020-07-06T13:51:33Z)
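The simulation-to-real workflow described in the robo-gym entry follows the classic OpenAI Gym interaction loop; the environment id in the sketch below is a placeholder rather than a verified robo-gym registration.

```python
# Classic Gym-style interaction loop of the kind robo-gym builds on; the
# environment id and any robo-gym-specific arguments are placeholders.
import gym

env = gym.make("SomeRobotEnvSim-v0")     # placeholder id, not a verified registration
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()   # replace with a trained RL policy
    obs, reward, done, info = env.step(action)
env.close()
```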