CoinRobot: Generalized End-to-end Robotic Learning for Physical Intelligence
- URL: http://arxiv.org/abs/2503.05316v1
- Date: Fri, 07 Mar 2025 10:50:58 GMT
- Title: CoinRobot: Generalized End-to-end Robotic Learning for Physical Intelligence
- Authors: Yu Zhao, Huxian Liu, Xiang Chen, Jiankai Sun, Jiahuan Yan, Luhui Hu,
- Abstract summary: Our framework supports cross-platform adaptability, enabling seamless deployment across industrial-grade robots, collaborative arms, and novel embodiments without task-specific modifications. We validate our framework through extensive experiments on seven manipulation tasks. Notably, diffusion-based models trained in our framework demonstrated superior performance and generalizability compared to the LeRobot framework.
- Score: 12.629888401901418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Physical intelligence holds immense promise for advancing embodied intelligence, enabling robots to acquire complex behaviors from demonstrations. However, achieving generalization and transfer across diverse robotic platforms and environments requires careful design of model architectures, training strategies, and data diversity. Meanwhile, existing systems often struggle with scalability, adaptability to heterogeneous hardware, and objective evaluation in real-world settings. We present a generalized end-to-end robotic learning framework designed to bridge this gap. Our framework introduces a unified architecture that supports cross-platform adaptability, enabling seamless deployment across industrial-grade robots, collaborative arms, and novel embodiments without task-specific modifications. By integrating multi-task learning with streamlined network designs, it achieves more robust performance than conventional approaches, while maintaining compatibility with varying sensor configurations and action spaces. We validate our framework through extensive experiments on seven manipulation tasks. Notably, diffusion-based models trained in our framework demonstrated superior performance and generalizability compared to the LeRobot framework, achieving performance improvements across diverse robotic platforms and environmental conditions.
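A rough illustration of the cross-platform idea described above, as a minimal sketch only: one shared policy trunk sits behind per-platform configurations that describe each robot's sensors and action space, so the same interface serves different embodiments. All names and shapes here (PlatformConfig, UnifiedPolicy, the camera keys, dimensions) are hypothetical assumptions, not the paper's code or the LeRobot API.

```python
# Illustrative sketch only: a unified policy interface over heterogeneous robots.
# All names (PlatformConfig, UnifiedPolicy, ...) are hypothetical, not from the paper.
from dataclasses import dataclass
from typing import Dict, Tuple

import numpy as np


@dataclass
class PlatformConfig:
    """Describes one embodiment: its cameras, proprioception, and action space."""
    name: str
    camera_shapes: Dict[str, Tuple[int, int, int]]  # e.g. {"wrist": (3, 224, 224)}
    proprio_dim: int                                 # joint positions, gripper state, ...
    action_dim: int                                  # platform-specific action size


class UnifiedPolicy:
    """One shared trunk; per-platform configs map observations and actions in and out.

    A diffusion-based head would iteratively denoise an action chunk from the
    latent; this sketch returns a zero action just to show the data flow.
    """

    def __init__(self, platforms: Dict[str, PlatformConfig], latent_dim: int = 256):
        self.platforms = platforms
        self.latent_dim = latent_dim

    def encode(self, platform: str, obs: Dict[str, np.ndarray]) -> np.ndarray:
        cfg = self.platforms[platform]
        # Flatten every camera view plus proprioception into one vector.
        parts = [obs[cam].ravel() for cam in cfg.camera_shapes] + [obs["proprio"]]
        flat = np.concatenate(parts)
        # Stand-in for a learned encoder: squeeze to a fixed-size shared latent.
        return np.resize(flat, self.latent_dim)

    def act(self, platform: str, obs: Dict[str, np.ndarray]) -> np.ndarray:
        latent = self.encode(platform, obs)
        assert latent.shape == (self.latent_dim,)
        # A real system would run the (e.g. diffusion) action head on `latent`.
        return np.zeros(self.platforms[platform].action_dim)


if __name__ == "__main__":
    policy = UnifiedPolicy({
        "ur5": PlatformConfig("ur5", {"wrist": (3, 224, 224)}, proprio_dim=7, action_dim=7),
        "franka": PlatformConfig("franka", {"front": (3, 224, 224), "wrist": (3, 224, 224)},
                                 proprio_dim=9, action_dim=8),
    })
    obs = {"wrist": np.zeros((3, 224, 224)), "proprio": np.zeros(7)}
    print(policy.act("ur5", obs).shape)  # (7,): action sized to the chosen platform
```

The point of the sketch is only the data flow: per-platform configurations absorb differences in sensor suites and action spaces, so the surrounding training and deployment code can stay the same across embodiments.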
Related papers
- Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
We introduce a novel framework for learning world models. By providing a scalable and robust framework, we pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z)
- Capability-Aware Shared Hypernetworks for Flexible Heterogeneous Multi-Robot Coordination [2.681242476043447]
We propose Capability-Aware Shared Hypernetworks (CASH) to enable a single architecture to dynamically adapt to each robot and the current context. CASH encodes shared decision-making strategies that can be adapted to each robot based on local observations and the robots' individual and collective capabilities. (A minimal illustrative hypernetwork sketch appears after this list.)
arXiv Detail & Related papers (2025-01-10T15:39:39Z)
- Generalized Robot Learning Framework [10.03174544844559]
We present a low-cost robot learning framework that is both easily reproducible and transferable to various robots and environments.
We demonstrate that deployable imitation learning can be successfully applied even to industrial-grade robots.
arXiv Detail & Related papers (2024-09-18T15:34:31Z)
- RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [102.1876259853457]
We propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX.
RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints.
To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning.
arXiv Detail & Related papers (2024-02-25T15:31:43Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a code generation benchmark for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- Modular Customizable ROS-Based Framework for Rapid Development of Social Robots [3.6622737533847936]
We present the Socially-interactive Robot Software platform (SROS), an open-source framework addressing this need through a modular layered architecture.
Specialized perceptual and interactive skills are implemented as ROS services for reusable deployment on any robot.
We experimentally validated core SROS technologies including computer vision, speech processing, and GPT2 autocomplete speech implemented as plug-and-play ROS services.
arXiv Detail & Related papers (2023-11-27T12:54:20Z)
- Online Learning and Planning in Cognitive Hierarchies [10.28577981317938]
We extend an existing formal framework to model complex integrated reasoning behaviours of robotic systems.
The new framework allows for more flexible modelling of the interactions between different reasoning components.
arXiv Detail & Related papers (2023-10-18T23:53:51Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
- Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have been demonstrated to be suitable tools for such a task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate adaptation to a new platform as a few-shot meta-learning problem where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching task and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
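To make the last entry's idea concrete, here is a minimal sketch of few-shot adaptation: start from parameters that capture structure shared across platforms, then take a few gradient steps on a handful of demonstrations from a new platform. This is a toy, MAML-style illustration with hypothetical names; the cited paper uses a Bayesian meta-learning formulation rather than this plain gradient recipe.

```python
# Toy sketch of few-shot adaptation of shared parameters to a new platform.
# Hypothetical example; the cited paper uses a Bayesian meta-learning formulation.
import numpy as np

rng = np.random.default_rng(0)

def predict(theta, x):
    return x @ theta                      # linear "policy" for illustration

def mse_grad(theta, x, y):
    return 2 * x.T @ (predict(theta, x) - y) / len(x)

# theta_shared: structure learned across many platforms (here: a fixed stand-in).
theta_shared = rng.normal(size=(4,))

# A handful of demonstrations collected on a *new* platform.
x_new = rng.normal(size=(5, 4))
true_theta = rng.normal(size=(4,))
y_new = x_new @ true_theta

# Few-shot adaptation: a few gradient steps from the shared initialization.
theta = theta_shared.copy()
for _ in range(20):
    theta -= 0.1 * mse_grad(theta, x_new, y_new)

print("error before:", np.mean((predict(theta_shared, x_new) - y_new) ** 2))
print("error after: ", np.mean((predict(theta, x_new) - y_new) ** 2))
```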
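And, as referenced in the Capability-Aware Shared Hypernetworks entry above, a minimal hypernetwork sketch: a small shared network consumes a robot's capability descriptor and emits the weights of that robot's policy head, so one architecture serves heterogeneous teammates. All names and dimensions below are illustrative assumptions, not the CASH implementation.

```python
# Illustrative hypernetwork sketch (not the CASH implementation): capability
# vector in, per-robot policy-head weights out.
import numpy as np

rng = np.random.default_rng(1)

OBS_DIM, ACT_DIM, CAP_DIM, HID = 8, 3, 4, 16

# Shared hypernetwork parameters (would be trained jointly across the team).
W_h = rng.normal(scale=0.1, size=(CAP_DIM, HID))
W_out = rng.normal(scale=0.1, size=(HID, OBS_DIM * ACT_DIM + ACT_DIM))

def policy_head(capability: np.ndarray):
    """Generate a robot-specific linear policy head from its capability vector."""
    h = np.tanh(capability @ W_h)
    params = h @ W_out
    W = params[: OBS_DIM * ACT_DIM].reshape(OBS_DIM, ACT_DIM)
    b = params[OBS_DIM * ACT_DIM :]
    return lambda obs: obs @ W + b

# Two heterogeneous robots share the hypernetwork but receive different heads.
fast_light_robot = policy_head(np.array([1.0, 0.2, 0.0, 1.0]))
slow_strong_robot = policy_head(np.array([0.1, 1.0, 1.0, 0.3]))

obs = rng.normal(size=(OBS_DIM,))
print(fast_light_robot(obs), slow_strong_robot(obs))  # different actions, same observation
```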