Body Transformer: Leveraging Robot Embodiment for Policy Learning
- URL: http://arxiv.org/abs/2408.06316v1
- Date: Mon, 12 Aug 2024 17:31:28 GMT
- Title: Body Transformer: Leveraging Robot Embodiment for Policy Learning
- Authors: Carmelo Sferrazza, Dun-Ming Huang, Fangchen Liu, Jongmin Lee, Pieter Abbeel
- Abstract summary: Body Transformer (BoT) is an architecture that leverages the robot embodiment by providing an inductive bias that guides the learning process.
We represent the robot body as a graph of sensors and actuators, and rely on masked attention to pool information throughout the architecture.
The resulting architecture outperforms the vanilla transformer, as well as the classical multilayer perceptron, in terms of task completion, scaling properties, and computational efficiency.
- Score: 51.531793239586165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the transformer architecture has become the de facto standard for machine learning algorithms applied to natural language processing and computer vision. Despite notable evidence of successful deployment of this architecture in the context of robot learning, we claim that vanilla transformers do not fully exploit the structure of the robot learning problem. Therefore, we propose Body Transformer (BoT), an architecture that leverages the robot embodiment by providing an inductive bias that guides the learning process. We represent the robot body as a graph of sensors and actuators, and rely on masked attention to pool information throughout the architecture. The resulting architecture outperforms the vanilla transformer, as well as the classical multilayer perceptron, in terms of task completion, scaling properties, and computational efficiency when representing either imitation or reinforcement learning policies. Additional material including the open-source code is available at https://sferrazza.cc/bot_site.
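The key mechanism here is masked attention derived from the body graph: each token (one per sensor or actuator node) attends only to itself and its neighbors, i.e. $\mathrm{softmax}(QK^\top/\sqrt{d} + M)V$ with $M_{ij} = 0$ when nodes $i$ and $j$ are connected (or $i = j$) and $-\infty$ otherwise. Below is a minimal sketch of this idea, not the authors' released code (see the project site above); the chain topology, dimensions, and helper names are illustrative assumptions.

```python
# Sketch of body-graph masked attention (illustrative, not the BoT release).
import torch
import torch.nn.functional as F

def build_body_mask(adjacency: torch.Tensor) -> torch.Tensor:
    """Turn an (N, N) 0/1 adjacency matrix into an additive attention mask:
    0.0 where attention is allowed (self and neighbors), -inf elsewhere."""
    allowed = adjacency.bool() | torch.eye(adjacency.shape[0], dtype=torch.bool)
    mask = torch.full_like(adjacency, float("-inf"))
    mask[allowed] = 0.0
    return mask

def masked_attention(x, mask, w_q, w_k, w_v):
    """Single-head scaled dot-product attention restricted by the body mask.
    x: (N, d) with one embedding per sensor/actuator node."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5 + mask  # -inf blocks non-neighbors
    return F.softmax(scores, dim=-1) @ v

# Toy 4-node chain: base - torso - arm - gripper (assumed topology).
adjacency = torch.tensor([[0., 1., 0., 0.],
                          [1., 0., 1., 0.],
                          [0., 1., 0., 1.],
                          [0., 0., 1., 0.]])
mask = build_body_mask(adjacency)
d = 8
x = torch.randn(4, d)  # one token per body node
w_q, w_k, w_v = (torch.randn(d, d) * d ** -0.5 for _ in range(3))
print(masked_attention(x, mask, w_q, w_k, w_v).shape)  # torch.Size([4, 8])
```

Stacking several such layers lets information travel one hop per layer along the graph, which is presumably how the architecture pools information across the whole body while keeping each layer's attention local.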
Related papers
- $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z) - The Ingredients for Robotic Diffusion Transformers [47.61690903645525]
We identify, study and improve key architectural design decisions for high-capacity diffusion transformer policies.
The resulting models can efficiently solve diverse tasks on multiple robot embodiments.
We find that our policies show improved scaling performance when trained on 10 hours of highly multi-modal, language-annotated ALOHA demonstration data.
arXiv Detail & Related papers (2024-10-14T02:02:54Z) - RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a code generation benchmark for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z) - RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation [33.10577695383743]
We propose a multi-embodiment, multi-task generalist agent for robotic manipulation called RoboCat.
The agent is trained on data spanning a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions.
With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100-1000 examples.
arXiv Detail & Related papers (2023-06-20T17:35:20Z) - RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z) - Instruction-driven history-aware policies for robotic manipulations [82.25511767738224]
We propose a unified transformer-based approach that takes into account multiple inputs.
In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations.
We evaluate our method on the challenging RLBench benchmark and on a real-world robot.
arXiv Detail & Related papers (2022-09-11T16:28:25Z) - What Matters in Language Conditioned Robotic Imitation Learning [26.92329260907805]
We study the most critical challenges in learning language-conditioned policies from offline free-form imitation datasets.
We present a novel approach that significantly outperforms the state of the art on the challenging language-conditioned, long-horizon CALVIN robot manipulation benchmark.
arXiv Detail & Related papers (2022-04-13T08:45:32Z) - MetaMorph: Learning Universal Controllers with Transformers [45.478223199658785]
In robotics, we primarily train a single robot for a single task.
However, modular robot systems now allow for the flexible combination of general-purpose building blocks into task-optimized morphologies.
We propose MetaMorph, a Transformer based approach to learn a universal controller over a modular robot design space.
arXiv Detail & Related papers (2022-03-22T17:58:31Z) - Transformer-based deep imitation learning for dual-arm robot manipulation [5.3022775496405865]
In a dual-arm manipulation setup, the increased number of state dimensions caused by the additional robot manipulator can distract the learned policy.
We address this issue using a self-attention mechanism that computes dependencies between elements in a sequential input and focuses on important elements.
A Transformer, an architecture built on self-attention, is applied to deep imitation learning to solve dual-arm manipulation tasks in the real world (a generic sketch of self-attention follows this entry).
arXiv Detail & Related papers (2021-08-01T07:42:39Z)
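As a generic illustration of the self-attention mechanism this last summary refers to, here is a single-head sketch under assumed shapes and names, not the paper's implementation:

```python
# Generic single-head self-attention over a sequence of state tokens
# (e.g. concatenated left- and right-arm joint features); illustrative only.
import torch
import torch.nn.functional as F

def self_attention(x: torch.Tensor, w_qkv: torch.Tensor) -> torch.Tensor:
    """x: (T, d) token sequence; w_qkv: (d, 3*d) packed Q/K/V projection."""
    q, k, v = (x @ w_qkv).chunk(3, dim=-1)
    # Each token weighs every other token; large weights mark the
    # "important elements" the summary above refers to.
    weights = F.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
    return weights @ v

tokens = torch.randn(12, 16)              # e.g. 6 joints per arm, 16-dim features
w_qkv = torch.randn(16, 48) * 16 ** -0.5  # assumed projection weights
print(self_attention(tokens, w_qkv).shape)  # torch.Size([12, 16])
```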
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.