VertiFormer: A Data-Efficient Multi-Task Transformer for Off-Road Robot Mobility
- URL: http://arxiv.org/abs/2502.00543v1
- Date: Sat, 01 Feb 2025 20:21:00 GMT
- Title: VertiFormer: A Data-Efficient Multi-Task Transformer for Off-Road Robot Mobility
- Authors: Mohammad Nazeri, Anuj Pokhrel, Alexandyr Card, Aniket Datar, Garrett Warnell, Xuesu Xiao
- Abstract summary: VertiFormer is a novel data-efficient multi-task Transformer model trained with only one hour of data.
Our experiments offer insights into effectively utilizing Transformers for off-road robot mobility with limited data.
- Score: 49.512339092493384
- Abstract: Sophisticated learning architectures, e.g., Transformers, present a unique opportunity for robots to understand complex vehicle-terrain kinodynamic interactions for off-road mobility. While internet-scale data are available for Natural Language Processing (NLP) and Computer Vision (CV) tasks to train Transformers, real-world mobility data are difficult to acquire with physical robots navigating off-road terrain. Furthermore, training techniques specifically designed to process text and image data in NLP and CV may not apply to robot mobility. In this paper, we propose VertiFormer, a novel data-efficient multi-task Transformer model trained with only one hour of data to address such challenges of applying Transformer architectures for robot mobility on extremely rugged, vertically challenging, off-road terrain. Specifically, VertiFormer employs a new learnable masked modeling and next token prediction paradigm to predict the next pose, action, and terrain patch to enable a variety of off-road mobility tasks simultaneously, e.g., forward and inverse kinodynamics modeling. The non-autoregressive design mitigates computational bottlenecks and error propagation associated with autoregressive models. VertiFormer's unified modality representation also enhances learning of diverse temporal mappings and state representations, which, combined with multiple objective functions, further improves model generalization. Our experiments offer insights into effectively utilizing Transformers for off-road robot mobility with limited data and demonstrate our efficiently trained Transformer can facilitate multiple off-road mobility tasks onboard a physical mobile robot.
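The abstract describes the core mechanism but no code is given. Below is a minimal PyTorch sketch of the idea as described: poses, actions, and terrain patches are projected into one unified token space, a learnable mask token stands in for whichever modality a task leaves unknown, and a non-autoregressive encoder predicts the next pose, action, and terrain patch in a single forward pass. All module names, dimensions, and the readout choice are assumptions for illustration, not the authors' released implementation.

```python
# Sketch of a VertiFormer-style masked, non-autoregressive multi-task
# Transformer. Names, sizes, and tokenization are illustrative assumptions.
import torch
import torch.nn as nn

class VertiFormerSketch(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=4, horizon=8,
                 pose_dim=6, action_dim=2, patch_dim=16 * 16):
        super().__init__()
        # Project each modality into one unified token space (assumed design).
        self.pose_proj = nn.Linear(pose_dim, d_model)
        self.action_proj = nn.Linear(action_dim, d_model)
        self.patch_proj = nn.Linear(patch_dim, d_model)
        # Learnable mask token stands in for whichever modality is unknown.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos_emb = nn.Parameter(torch.zeros(1, 3 * horizon, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # One head per prediction target: next pose, action, terrain patch.
        self.pose_head = nn.Linear(d_model, pose_dim)
        self.action_head = nn.Linear(d_model, action_dim)
        self.patch_head = nn.Linear(d_model, patch_dim)

    def forward(self, poses, actions, patches, mask=()):
        # poses: (B, T, pose_dim), actions: (B, T, action_dim),
        # patches: (B, T, patch_dim), with T <= horizon. `mask` names the
        # modalities to hide, e.g. mask=("action",) for inverse kinodynamics.
        tokens = {
            "pose": self.pose_proj(poses),
            "action": self.action_proj(actions),
            "patch": self.patch_proj(patches),
        }
        B, T, _ = tokens["pose"].shape
        for m in mask:  # replace a hidden modality with the learnable mask token
            tokens[m] = self.mask_token.expand(B, T, -1)
        x = torch.cat([tokens["pose"], tokens["action"], tokens["patch"]], dim=1)
        x = self.encoder(x + self.pos_emb[:, : x.shape[1]])
        # Non-autoregressive: one forward pass yields all next-step predictions.
        last = x[:, T - 1]  # readout from the latest pose token (assumed choice)
        return (self.pose_head(last), self.action_head(last),
                self.patch_head(last))

# Inverse kinodynamics example: hide actions, predict the action to take.
# model = VertiFormerSketch()
# pose_hat, act_hat, patch_hat = model(poses, actions, patches, mask=("action",))
```

Training such a model would sum per-head losses (e.g., regression on pose and action, reconstruction on the terrain patch), which is one plausible reading of the abstract's "multiple objective functions."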
Related papers
- Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration [9.42179962375058]
We propose a transferable framework that reduces the data bottleneck by using a unified digital human model as a common prototype.
The model learns behavior primitives from human demonstrations through adversarial imitation, and complex robot structures are decomposed into functional components.
Our framework is validated on five humanoid robots with diverse configurations.
arXiv Detail & Related papers (2024-12-19T18:41:45Z)
- SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation [82.61572106180705]
This paper presents a unified approach using vision-language models (VLMs) to improve keypoint prediction across various garment categories.
We created a large-scale synthetic dataset using advanced simulation techniques, allowing scalable training without extensive real-world data.
Experimental results indicate that the VLM-based method significantly enhances keypoint detection accuracy and task success rates.
arXiv Detail & Related papers (2024-09-26T17:26:16Z)
- Guided Decoding for Robot On-line Motion Generation and Adaption [44.959409835754634]
We present a novel motion generation approach for high-degree-of-freedom robot arms in complex settings, one that can adapt online to obstacles or new via points.
We train a transformer architecture, based on a conditional variational autoencoder, on a large dataset of simulated trajectories used as demonstrations.
We show that our model successfully generates motion from different initial and target points and that it is capable of generating trajectories that navigate complex tasks across different robotic platforms.
arXiv Detail & Related papers (2024-03-22T14:32:27Z)
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs enables instruction-following data-collection robots that can align with human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
- PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training [25.50131893785007]
This work introduces a paradigm for pre-training a general purpose representation that can serve as a starting point for multiple tasks on a given robot.
We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion.
We show that finetuning small task-specific networks on top of the larger pretrained model results in significantly better performance compared to training a single model from scratch for all tasks simultaneously (a sketch of this finetuning pattern follows the list below).
arXiv Detail & Related papers (2022-09-22T16:20:17Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate policy adaptation as a few-shot meta-learning problem where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
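The PACT entry above describes pretraining a general-purpose representation and then finetuning small task-specific networks on top of it. As a rough illustration of that pattern only (the trunk, dimensions, and head below are placeholders, not PACT's actual code), a frozen-trunk finetuning setup might look like:

```python
# Sketch of the pretrain-then-finetune pattern from the PACT entry above:
# freeze a pretrained trunk and train only a small task-specific head.
# `pretrained_trunk` and all dimensions are placeholders, not PACT's real API.
import torch
import torch.nn as nn

def build_finetune_model(pretrained_trunk: nn.Module,
                         feat_dim: int = 256, out_dim: int = 7) -> nn.Module:
    for p in pretrained_trunk.parameters():
        p.requires_grad = False  # keep the large pretrained model frozen
    head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))  # small task-specific head
    return nn.Sequential(pretrained_trunk, head)

# Only the head's parameters receive gradient updates:
# model = build_finetune_model(trunk)
# optimizer = torch.optim.Adam(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```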