XRoboToolkit: A Cross-Platform Framework for Robot Teleoperation
- URL: http://arxiv.org/abs/2508.00097v1
- Date: Thu, 31 Jul 2025 18:45:13 GMT
- Title: XRoboToolkit: A Cross-Platform Framework for Robot Teleoperation
- Authors: Zhigen Zhao, Liuchuan Yu, Ke Jing, Ning Yang
- Abstract summary: XRoboToolkit is a cross-platform framework for extended reality based robot teleoperation built on the OpenXR standard. The system features low-latency stereoscopic visual feedback, optimization-based inverse kinematics, and support for diverse tracking modalities. We demonstrate the framework's effectiveness through precision manipulation tasks and validate data quality by training VLA models that exhibit robust autonomous performance.
- Score: 1.0522824606408765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of Vision-Language-Action models has created an urgent need for large-scale, high-quality robot demonstration datasets. Although teleoperation is the predominant method for data collection, current approaches suffer from limited scalability, complex setup procedures, and suboptimal data quality. This paper presents XRoboToolkit, a cross-platform framework for extended reality based robot teleoperation built on the OpenXR standard. The system features low-latency stereoscopic visual feedback, optimization-based inverse kinematics, and support for diverse tracking modalities including head, controller, hand, and auxiliary motion trackers. XRoboToolkit's modular architecture enables seamless integration across robotic platforms and simulation environments, spanning precision manipulators, mobile robots, and dexterous hands. We demonstrate the framework's effectiveness through precision manipulation tasks and validate data quality by training VLA models that exhibit robust autonomous performance.
Related papers
- Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons [69.87766750714945]
General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations.
We introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-trajectory preference supervision.
Robometer is trained with a dual objective: a frame-level progress loss that anchors reward magnitude on expert data, and a trajectory-comparison preference loss that imposes global ordering constraints.
arXiv Detail & Related papers (2026-03-02T17:38:58Z)
- RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video [56.9581053843815]
We introduce RobotSeg, a foundation model for robot segmentation in image and video.
It addresses the lack of adaptation to articulated robots, reliance on manual prompts, and the need for per-frame training mask annotations.
It achieves state-of-the-art performance on both images and videos.
arXiv Detail & Related papers (2025-11-28T07:51:02Z)
- Pixel Motion Diffusion is What We Need for Robot Control [38.925028601732116]
DAWN is a unified diffusion-based framework for language-conditioned robotic manipulation.
It bridges high-level motion intent and low-level robot action via structured pixel motion representation.
DAWN achieves state-of-the-art results on the challenging CALVIN benchmark.
arXiv Detail & Related papers (2025-09-26T17:59:59Z)
- cVLA: Towards Efficient Camera-Space VLAs [26.781510474119845]
Vision-Language-Action (VLA) models offer a compelling framework for tackling complex robotic manipulation tasks.
We propose a novel VLA approach that leverages the competitive performance of Vision Language Models on 2D images.
Our model predicts trajectory waypoints, making it both more efficient to train and robot embodiment agnostic.
arXiv Detail & Related papers (2025-07-02T22:56:41Z)
- RoboPearls: Editable Video Simulation for Robot Manipulation [81.18434338506621]
RoboPearls is an editable video simulation framework for robotic manipulation.
Built on 3D Gaussian Splatting (3DGS), RoboPearls enables the construction of photo-realistic, view-consistent simulations.
We conduct extensive experiments on multiple datasets and scenes, including RLBench, COLOSSEUM, Ego4D, Open X-Embodiment, and a real-world robot.
arXiv Detail & Related papers (2025-06-28T05:03:31Z)
- ORV: 4D Occupancy-centric Robot Video Generation [33.360345403049685]
Acquiring real-world robotic simulation data through teleoperation is notoriously time-consuming and labor-intensive.
We propose ORV, an Occupancy-centric Robot Video generation framework, which utilizes 4D semantic occupancy sequences as a fine-grained representation.
By leveraging occupancy-based representations, ORV enables seamless translation of simulation data into photorealistic robot videos, while ensuring high temporal consistency and precise controllability.
arXiv Detail & Related papers (2025-06-03T17:00:32Z)
- RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins [33.78621017138685]
RoboTwin is a generative digital twin framework that uses 3D generative foundation models and large language models to produce diverse expert datasets.
Specifically, RoboTwin creates varied digital twins of objects from single 2D images, generating realistic and interactive scenarios.
Our framework offers a comprehensive benchmark with both simulated and real-world data, enabling standardized evaluation and better alignment between simulated training and real-world performance.
arXiv Detail & Related papers (2025-04-17T16:14:24Z)
- Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation [50.34179054785646]
We present Taccel, a high-performance simulation platform that integrates IPC and ABD to model robots, tactile sensors, and objects with both accuracy and unprecedented speed.
Taccel provides precise physics simulation and realistic tactile signals while supporting flexible robot-sensor configurations through user-friendly APIs.
These capabilities position Taccel as a powerful tool for scaling up tactile robotics research and development.
arXiv Detail & Related papers (2025-04-17T12:57:11Z)
- Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer.
By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z)
- SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation [82.61572106180705]
This paper presents a unified approach using vision-language models (VLMs) to improve keypoint prediction across various garment categories.
We created a large-scale synthetic dataset using advanced simulation techniques, allowing scalable training without extensive real-world data.
Experimental results indicate that the VLM-based method significantly enhances keypoint detection accuracy and task success rates.
arXiv Detail & Related papers (2024-09-26T17:26:16Z)
- RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version) [25.298789781487084]
RoboTwin is a generative digital twin framework that uses 3D generative foundation models and large language models to produce diverse expert datasets.
Specifically, RoboTwin creates varied digital twins of objects from single 2D images, generating realistic and interactive scenarios.
Our framework offers a comprehensive benchmark with both simulated and real-world data, enabling standardized evaluation and better alignment between simulated training and real-world performance.
arXiv Detail & Related papers (2024-09-04T17:59:52Z)
- LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [56.505551117094534]
We introduce LLaRA: Large Language and Robotics Assistant, a framework that formulates robot action policy as visuo-textual conversations.
First, we present an automated pipeline to generate conversation-style instruction tuning data for robots from existing behavior cloning datasets.
We show that a VLM finetuned with a limited amount of such datasets can produce meaningful action decisions for robotic control.
arXiv Detail & Related papers (2024-06-28T17:59:12Z)
- IRASim: A Fine-Grained World Model for Robot Manipulation [24.591694756757278]
We present IRASim, a novel world model capable of generating videos with fine-grained robot-object interaction details.
We train a diffusion transformer and introduce a novel frame-level action-conditioning module within each transformer block to explicitly model and strengthen the action-frame alignment.
arXiv Detail & Related papers (2024-06-20T17:50:16Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RoboScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a code generation benchmark for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)