Robot Control Stack: A Lean Ecosystem for Robot Learning at Scale
- URL: http://arxiv.org/abs/2509.14932v1
- Date: Thu, 18 Sep 2025 13:12:16 GMT
- Title: Robot Control Stack: A Lean Ecosystem for Robot Learning at Scale
- Authors: Tobias Jülg, Pierre Krack, Seongjin Bien, Yannik Blei, Khaled Gamal, Ken Nakahara, Johannes Hechtl, Roberto Calandra, Wolfram Burgard, Florian Walter,
- Abstract summary: Vision-Language-Action models (VLAs) replace specialized architectures and task-tailored components of expert policies with large-scale data collection and setup-specific fine-tuning.<n>Traditional robotics software frameworks become a bottleneck, while robot simulations offer only limited support for transitioning from and to real-world experiments.<n>We introduce Robot Control Stack (RCS), a lean ecosystem designed from the ground up to support research in robot learning with large-scale generalist policies.
- Score: 11.166320712764465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-Language-Action models (VLAs) mark a major shift in robot learning. They replace specialized architectures and task-tailored components of expert policies with large-scale data collection and setup-specific fine-tuning. In this machine learning-focused workflow that is centered around models and scalable training, traditional robotics software frameworks become a bottleneck, while robot simulations offer only limited support for transitioning from and to real-world experiments. In this work, we close this gap by introducing Robot Control Stack (RCS), a lean ecosystem designed from the ground up to support research in robot learning with large-scale generalist policies. At its core, RCS features a modular and easily extensible layered architecture with a unified interface for simulated and physical robots, facilitating sim-to-real transfer. Despite its minimal footprint and dependencies, it offers a complete feature set, enabling both real-world experiments and large-scale training in simulation. Our contribution is twofold: First, we introduce the architecture of RCS and explain its design principles. Second, we evaluate its usability and performance along the development cycle of VLA and RL policies. Our experiments also provide an extensive evaluation of Octo, OpenVLA, and Pi Zero on multiple robots and shed light on how simulation data can improve real-world policy performance. Our code, datasets, weights, and videos are available at: https://robotcontrolstack.github.io/
Related papers
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots [133.23509142762356]
General-purpose robots need a versatile body and an intelligent mind.<n>Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy.<n>We introduce GR00T N1, an open foundation model for humanoid robots.
arXiv Detail & Related papers (2025-03-18T21:06:21Z) - $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z) - Generalized Robot Learning Framework [10.03174544844559]
We present a low-cost robot learning framework that is both easily reproducible and transferable to various robots and environments.
We demonstrate that deployable imitation learning can be successfully applied even to industrial-grade robots.
arXiv Detail & Related papers (2024-09-18T15:34:31Z) - Octo: An Open-Source Generalist Robot Policy [88.14295917143188]
We introduce Octo, a large transformer-based policy trained on 800k trajectories from the Open X-Embodiment dataset.
It can be effectively finetuned to robot setups with new sensory inputs and action spaces within a few hours on standard consumer GPU.
We also perform detailed ablations of design decisions for the Octo model, from architecture to training data, to guide future research on building generalist robot models.
arXiv Detail & Related papers (2024-05-20T17:57:01Z) - RoboScript: Code Generation for Free-Form Manipulation Tasks across Real
and Simulation [77.41969287400977]
This paper presents textbfRobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for a code generation benchmark for robot manipulation tasks in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z) - RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z) - SAGCI-System: Towards Sample-Efficient, Generalizable, Compositional,
and Incremental Robot Learning [41.19148076789516]
We introduce a systematic learning framework called SAGCI-system towards achieving the above four requirements.
Our system first takes the raw point clouds gathered by the camera mounted on the robot's wrist as the inputs and produces initial modeling of the surrounding environment represented as a URDF.
The robot then utilizes the interactive perception to interact with the environments to online verify and modify the URDF.
arXiv Detail & Related papers (2021-11-29T16:53:49Z) - Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
arXiv Detail & Related papers (2020-10-11T01:40:03Z) - robo-gym -- An Open Source Toolkit for Distributed Deep Reinforcement
Learning on Real and Simulated Robots [0.5161531917413708]
We propose an open source toolkit: robo-gym to increase the use of Deep Reinforcement Learning with real robots.
We demonstrate a unified setup for simulation and real environments which enables a seamless transfer from training in simulation to application on the robot.
We showcase the capabilities and the effectiveness of the framework with two real world applications featuring industrial robots.
arXiv Detail & Related papers (2020-07-06T13:51:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.