MuTT: A Multimodal Trajectory Transformer for Robot Skills
- URL: http://arxiv.org/abs/2407.15660v2
- Date: Thu, 22 Aug 2024 09:12:19 GMT
- Title: MuTT: A Multimodal Trajectory Transformer for Robot Skills
- Authors: Claudius Kienle, Benjamin Alt, Onur Celik, Philipp Becker, Darko Katic, Rainer Jäkel, Gerhard Neumann
- Abstract summary: MuTT is a novel encoder-decoder transformer architecture designed to predict environment-aware executions of robot skills.
We pioneer the fusion of vision and trajectory, introducing a novel trajectory projection.
This approach facilitates the optimization of robot skill parameters for the current environment, without the need for real-world executions.
- Score: 14.84252843639553
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: High-level robot skills represent an increasingly popular paradigm in robot programming. However, configuring the skills' parameters for a specific task remains a manual and time-consuming endeavor. Existing approaches for learning or optimizing these parameters often require numerous real-world executions or do not work in dynamic environments. To address these challenges, we propose MuTT, a novel encoder-decoder transformer architecture designed to predict environment-aware executions of robot skills by integrating vision, trajectory, and robot skill parameters. Notably, we pioneer the fusion of vision and trajectory, introducing a novel trajectory projection. Furthermore, we illustrate MuTT's efficacy as a predictor when combined with a model-based robot skill optimizer. This approach facilitates the optimization of robot skill parameters for the current environment, without the need for real-world executions during optimization. Designed for compatibility with any representation of robot skills, MuTT demonstrates its versatility across three comprehensive experiments, showcasing superior performance across two different skill representations.
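To make the fusion concrete, here is a minimal, illustrative PyTorch sketch, not the authors' implementation: the token layout, layer sizes, and input shapes are all assumptions. It only shows the general pattern of projecting vision patches, trajectory points, and a skill-parameter vector into one token sequence for an encoder-decoder transformer that predicts an environment-aware trajectory.

```python
# Illustrative sketch only -- layer sizes, token layout, and shapes are
# assumptions, not the MuTT paper's implementation.
import torch
import torch.nn as nn

class MultimodalTrajectoryModel(nn.Module):
    def __init__(self, d_model=256, img_patch_dim=768, traj_dim=7, param_dim=16):
        super().__init__()
        # One linear projection per modality into a shared token space.
        self.img_proj = nn.Linear(img_patch_dim, d_model)   # vision patches
        self.traj_proj = nn.Linear(traj_dim, d_model)       # trajectory points
        self.param_proj = nn.Linear(param_dim, d_model)     # skill parameters
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8, num_encoder_layers=4,
            num_decoder_layers=4, batch_first=True)
        self.head = nn.Linear(d_model, traj_dim)            # predicted execution

    def forward(self, img_patches, planned_traj, skill_params, query_traj):
        # Fuse the three modalities into one encoder token sequence.
        tokens = torch.cat([
            self.img_proj(img_patches),
            self.traj_proj(planned_traj),
            self.param_proj(skill_params).unsqueeze(1),
        ], dim=1)
        out = self.transformer(tokens, self.traj_proj(query_traj))
        return self.head(out)

model = MultimodalTrajectoryModel()
pred = model(torch.randn(2, 196, 768),  # e.g. ViT patch embeddings
             torch.randn(2, 100, 7),    # planned trajectory
             torch.randn(2, 16),        # skill parameter vector
             torch.randn(2, 100, 7))    # decoder queries
print(pred.shape)                       # torch.Size([2, 100, 7])
```

With a predictor of this shape, a model-based optimizer can score candidate skill parameters against predicted executions instead of real ones, which is the role MuTT plays in the paper's optimization loop.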
Related papers
- Multi-Task Interactive Robot Fleet Learning with Visual World Models [25.001148860168477]
Sirius-Fleet is a multi-task interactive robot fleet learning framework.
It monitors robot performance during deployment and involves humans to correct the robot's actions when necessary.
As the robot autonomy improves, anomaly predictors automatically adapt their prediction criteria.
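The idea of anomaly predictors that adapt their criteria can be pictured with a small, entirely hypothetical sketch (class and method names are invented, not Sirius-Fleet's API): the intervention threshold loosens as the recent success rate rises.

```python
# Hypothetical sketch: an anomaly predictor that requires a stronger anomaly
# signal before calling a human as the robot's recent success rate improves.
from collections import deque

class AdaptiveAnomalyPredictor:
    def __init__(self, base_threshold=0.5, window=100):
        self.base_threshold = base_threshold
        self.outcomes = deque(maxlen=window)  # 1 = success, 0 = human correction

    def record(self, success: bool):
        self.outcomes.append(1 if success else 0)

    def threshold(self) -> float:
        # Higher recent success rate -> higher bar for interrupting the robot.
        if not self.outcomes:
            return self.base_threshold
        success_rate = sum(self.outcomes) / len(self.outcomes)
        return self.base_threshold + 0.4 * success_rate

    def needs_human(self, anomaly_score: float) -> bool:
        return anomaly_score > self.threshold()
```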
arXiv Detail & Related papers (2024-10-30T04:49:39Z)
- DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model [72.66465487508556]
DiffGen is a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model.
It can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation.
Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.
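A toy sketch of that objective, with trivial differentiable stand-ins for the physics simulator, renderer, and vision-language encoders (none of this is DiffGen's code): actions are optimized by gradient descent so the embedding of the rendered observation approaches the embedding of the instruction.

```python
# Toy sketch of DiffGen's core objective. The simulator, renderer, and
# encoders below are trivial differentiable stand-ins, not real components.
import torch
import torch.nn.functional as F

def sim_and_render(actions):   # stand-in for differentiable physics + rendering
    return torch.tanh(actions.cumsum(0))

def image_encoder(obs):        # stand-in for a vision-language image encoder
    return obs.mean(0)

text_emb = torch.randn(7)      # stand-in embedding of the language instruction
actions = torch.zeros(50, 7, requires_grad=True)
opt = torch.optim.Adam([actions], lr=1e-2)

for _ in range(200):
    obs = sim_and_render(actions)
    # Minimize cosine distance between observation and instruction embeddings.
    loss = 1 - F.cosine_similarity(image_encoder(obs), text_emb, dim=0)
    opt.zero_grad(); loss.backward(); opt.step()
```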
arXiv Detail & Related papers (2024-05-12T15:38:17Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks described in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
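A heavily simplified, hypothetical picture of such a pipeline (query_llm and the RobotAPI methods are invented stand-ins, not RobotScript's interfaces): a language model emits code against a small robot API, and the same generated code can be executed by different robot backends.

```python
# Hypothetical sketch of a code-generation manipulation pipeline.
class RobotAPI:
    def move_to(self, name): print(f"moving to {name}")
    def grasp(self):         print("closing gripper")
    def release(self):       print("opening gripper")

def query_llm(instruction: str) -> str:
    # Stand-in for a model call; returns code written against RobotAPI.
    return ("robot.move_to('red_block')\nrobot.grasp()\n"
            "robot.move_to('bin')\nrobot.release()")

robot = RobotAPI()  # could equally wrap a Franka or UR5 backend
code = query_llm("put the red block in the bin")
exec(code, {"robot": robot})
```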
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.
Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer it to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z)
- Using Knowledge Representation and Task Planning for Robot-agnostic Skills on the Example of Contact-Rich Wiping Tasks [44.99833362998488]
We show how a single robot skill that utilizes knowledge representation, task planning, and automatic selection of skill implementations can be executed in different contexts.
We demonstrate how the skill-based control platform enables this with contact-rich wiping tasks on different robot systems.
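One way to picture automatic selection of skill implementations, as a rough sketch with invented names rather than the paper's platform: an abstract "wipe" skill dispatches to whichever registered implementation matches the current robot's declared capabilities.

```python
# Illustrative sketch: one abstract skill, several implementations, selection
# by capability matching. All names are hypothetical.
SKILL_IMPLEMENTATIONS = {}

def register(skill, required_capabilities):
    def deco(fn):
        SKILL_IMPLEMENTATIONS.setdefault(skill, []).append(
            (set(required_capabilities), fn))
        return fn
    return deco

@register("wipe", {"force_control"})
def wipe_compliant(surface):
    return f"force-controlled wiping of {surface}"

@register("wipe", {"position_control"})
def wipe_position(surface):
    return f"position-controlled wiping of {surface}"

def execute(skill, robot_capabilities, *args):
    # Pick the first implementation whose requirements this robot satisfies.
    for required, fn in SKILL_IMPLEMENTATIONS[skill]:
        if required <= robot_capabilities:
            return fn(*args)
    raise RuntimeError(f"no implementation of {skill} for this robot")

print(execute("wipe", {"force_control", "vision"}, "table"))
```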
arXiv Detail & Related papers (2023-08-27T21:17:32Z)
- Human-Robot Skill Transfer with Enhanced Compliance via Dynamic Movement Primitives [1.7901837062462316]
We introduce a systematic method to extract the dynamic features from human demonstration to auto-tune the parameters in the Dynamic Movement Primitives framework.
Our method was implemented in an actual human-robot setup to extract human dynamic features, which were then used to regenerate robot trajectories following both LfD and RL.
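For context, the standard discrete dynamic movement primitive fits in a few lines; gains such as alpha and beta below are exactly the kind of parameters the paper auto-tunes from demonstration (this is the textbook formulation, not the paper's implementation).

```python
# Standard discrete DMP: a critically damped attractor toward goal g, plus a
# learned forcing term driven by a decaying phase variable x.
import numpy as np

def dmp_rollout(y0, g, weights, centers, widths, alpha=25.0, beta=6.25,
                alpha_x=1.0, dt=0.01, steps=1000):
    y, dy, x = y0, 0.0, 1.0
    path = []
    for _ in range(steps):
        psi = np.exp(-widths * (x - centers) ** 2)                 # RBF basis on phase
        f = (psi @ weights) / (psi.sum() + 1e-10) * x * (g - y0)   # forcing term
        ddy = alpha * (beta * (g - y) - dy) + f                    # transformation system
        dy += ddy * dt
        y += dy * dt
        x += -alpha_x * x * dt                                     # canonical system
        path.append(y)
    return np.array(path)

centers = np.linspace(0, 1, 20)
traj = dmp_rollout(y0=0.0, g=1.0, weights=np.zeros(20),
                   centers=centers, widths=np.full(20, 50.0))
print(traj[-1])  # converges toward the goal g = 1.0
```

Fitting `weights` to reproduce a demonstrated trajectory, and tuning the gains for compliance, is what "auto-tune the parameters in the DMP framework" refers to here.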
arXiv Detail & Related papers (2023-04-12T08:48:28Z)
- GLSO: Grammar-guided Latent Space Optimization for Sample-efficient Robot Design Automation [16.96128900256427]
We present Grammar-guided Latent Space Optimization (GLSO), a framework that transforms design automation into a low-dimensional continuous optimization problem by training a graph variational autoencoder (VAE) to learn a mapping between the graph-structured design space and a continuous latent space.
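Why a continuous latent space helps can be shown with a toy sketch (untrained stand-in networks, not GLSO's code; GLSO itself pairs the latent space with sample-efficient search rather than the plain gradient ascent shown here): once designs decode from vectors z, design search becomes ordinary continuous optimization.

```python
# Toy sketch: search over designs by optimizing their latent representation.
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))    # z -> design features
surrogate = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))  # design -> predicted score

z = torch.zeros(1, 8, requires_grad=True)  # a candidate design as a latent vector
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(100):
    score = surrogate(decoder(z))          # predicted performance of decoded design
    loss = -score.mean()                   # ascend on the predicted score
    opt.zero_grad(); loss.backward(); opt.step()
```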
arXiv Detail & Related papers (2022-09-23T17:48:24Z)
- A Capability and Skill Model for Heterogeneous Autonomous Robots [69.50862982117127]
Capability modeling is considered a promising approach to semantically model functions provided by different machines.
This contribution investigates how to apply and extend capability models from manufacturing to the field of autonomous robots.
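A capability model can be pictured as structured data; the sketch below uses invented names and plain dataclasses, whereas the paper works with semantic, ontology-based models.

```python
# Hypothetical sketch of a capability model as plain data.
from dataclasses import dataclass, field

@dataclass
class Capability:
    name: str
    inputs: dict       # e.g. {"object_pose": "Pose"}
    outputs: dict      # e.g. {"grasped": "bool"}
    constraints: dict = field(default_factory=dict)  # e.g. {"max_payload_kg": 3.0}

@dataclass
class Robot:
    name: str
    capabilities: list

def robots_providing(robots, capability_name):
    # Match a required function against the capabilities each robot offers.
    return [r.name for r in robots
            if any(c.name == capability_name for c in r.capabilities)]

arm = Robot("ur5", [Capability("pick", {"object_pose": "Pose"}, {"grasped": "bool"})])
amr = Robot("mir100", [Capability("transport", {"goal": "Pose"}, {"at_goal": "bool"})])
print(robots_providing([arm, amr], "pick"))  # ['ur5']
```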
arXiv Detail & Related papers (2022-09-22T10:13:55Z)
- REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer [57.045140028275036]
We consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology.
Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions are mismatched across robots.
We propose REvolveR, a novel method that uses continuous evolutionary models, implemented in a physics simulator, for robot-to-robot policy transfer.
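The core mechanism can be sketched as parameter interpolation (a simplified illustration under stated assumptions, not the authors' code): a coefficient alpha defines a continuum of intermediate robots, and the policy is fine-tuned a little at each step from source (alpha = 0) to target (alpha = 1).

```python
# Sketch: a continuum of intermediate robots between source and target.
import numpy as np

def interpolate_robot(params_source, params_target, alpha):
    # Each intermediate robot blends kinematics/morphology of the endpoints.
    return {k: (1 - alpha) * params_source[k] + alpha * params_target[k]
            for k in params_source}

source = {"link_length": np.array([0.3, 0.25]), "mass": np.array([2.0, 1.5])}
target = {"link_length": np.array([0.4, 0.35]), "mass": np.array([3.0, 2.2])}

policy = None  # pretrained on the source robot
for alpha in np.linspace(0.0, 1.0, 11):
    robot = interpolate_robot(source, target, alpha)
    # fine_tune(policy, make_sim_env(robot))  # a few RL steps per intermediate robot
    print(alpha, robot["link_length"])
```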
arXiv Detail & Related papers (2022-02-10T18:50:25Z)
- Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models [29.34375999491465]
A core challenge for an autonomous agent acting in the real world is to adapt its repertoire of skills to cope with its noisy perception and dynamics.
To scale learning of skills to long-horizon tasks, robots should be able to learn and later refine their skills in a structured manner.
We propose SAC-GMM, a novel hybrid approach that learns robot skills through a dynamical system and adapts the learned skills in their own trajectory distribution space.
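Conceptually, the hybrid can be sketched as follows (invented names, not the authors' code): a Gaussian-mixture dynamical system generates velocities, and the RL agent's actions are small shifts of the mixture parameters rather than raw motor commands.

```python
# Sketch: a GMM-style dynamical system as the skill, adapted in parameter space.
import numpy as np

class GMMSkill:
    def __init__(self, means, velocities):
        self.means = means            # Gaussian centers in state space
        self.velocities = velocities  # velocity attractor per component

    def step(self, x):
        # Soft-assign the state to components, then blend their velocities.
        w = np.exp(-np.linalg.norm(self.means - x, axis=1) ** 2)
        w /= w.sum()
        return w @ self.velocities

    def adapt(self, delta_means):
        # The RL agent's action space: bounded shifts of the Gaussian means.
        self.means = self.means + delta_means

skill = GMMSkill(means=np.array([[0.0, 0.0], [0.5, 0.5]]),
                 velocities=np.array([[0.1, 0.1], [0.05, -0.02]]))
x = np.zeros(2)
for _ in range(10):
    x = x + skill.step(x) * 0.1        # follow the dynamical system
    # skill.adapt(sac_agent.act(obs))  # refinement step; agent omitted here
print(x)
```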
arXiv Detail & Related papers (2021-11-25T15:36:11Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve strong performance but require large amounts of training data collected on the same robotic platform.
We formulate adaptation to a new platform as a few-shot meta-learning problem where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
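The shared-structure idea can be illustrated with a toy sketch: a dynamics network shared across platforms, conditioned on a small per-platform latent. The paper treats this latent probabilistically via Bayesian meta-learning; for simplicity the sketch below fits a point estimate on a handful of transitions (all names and sizes are assumptions).

```python
# Toy sketch: adaptation to a new robot = fitting a tiny platform latent,
# while the cross-platform dynamics model stays frozen.
import torch
import torch.nn as nn

# Shared dynamics model: (state, action, platform latent) -> next state.
shared = nn.Sequential(nn.Linear(4 + 2 + 3, 64), nn.ReLU(), nn.Linear(64, 4))
for p in shared.parameters():
    p.requires_grad_(False)  # freeze the shared structure

z = torch.zeros(3, requires_grad=True)  # per-platform latent to be inferred
opt = torch.optim.Adam([z], lr=0.05)
states, actions = torch.randn(20, 4), torch.randn(20, 2)  # few-shot transitions
next_states = torch.randn(20, 4)                          # (random toy data here)
for _ in range(100):
    pred = shared(torch.cat([states, actions, z.expand(20, 3)], dim=-1))
    loss = ((pred - next_states) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```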
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.