Skill Transformer: A Monolithic Policy for Mobile Manipulation
- URL: http://arxiv.org/abs/2308.09873v1
- Date: Sat, 19 Aug 2023 01:37:41 GMT
- Title: Skill Transformer: A Monolithic Policy for Mobile Manipulation
- Authors: Xiaoyu Huang, Dhruv Batra, Akshara Rai, Andrew Szot
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Skill Transformer, an approach for solving long-horizon robotic
tasks by combining conditional sequence modeling and skill modularity.
Conditioned on egocentric and proprioceptive observations of a robot, Skill
Transformer is trained end-to-end to predict both a high-level skill (e.g.,
navigation, picking, placing) and a whole-body low-level action (e.g., base
and arm motion), using a transformer architecture and demonstration
trajectories that solve the full task. It retains the composability and
modularity of the overall task through a skill predictor module while reasoning
about low-level actions and avoiding hand-off errors, common in modular
approaches. We test Skill Transformer on an embodied rearrangement benchmark
and find it performs robust task planning and low-level control in new
scenarios, achieving a 2.5x higher success rate than baselines in hard
rearrangement problems.
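The two-headed prediction described in the abstract can be illustrated with a minimal NumPy sketch: a single self-attention layer summarizes the observation tokens, a skill head classifies the current skill, and an action head produces a whole-body action conditioned on the predicted skill. All dimensions, weights, and function names here are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

# Hypothetical sizes: 6 observation tokens (egocentric + proprioceptive
# features, assumed already embedded), 16-dim embeddings, 3 skills, 10-dim
# whole-body action (base + arm).
T, d, n_skills, act_dim = 6, 16, 3, 10

# Random weights stand in for parameters learned from demonstration trajectories.
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
W_skill = 0.1 * rng.normal(size=(d, n_skills))          # skill predictor head
W_act = 0.1 * rng.normal(size=(d + n_skills, act_dim))  # low-level action head

def skill_transformer_step(obs_tokens):
    h = self_attention(obs_tokens, Wq, Wk, Wv)[-1]       # last-token summary
    skill_probs = softmax(h @ W_skill)
    skill = int(np.argmax(skill_probs))                  # high-level skill
    skill_onehot = np.eye(n_skills)[skill]
    # Whole-body action, conditioned on the predicted skill.
    action = np.tanh(np.concatenate([h, skill_onehot]) @ W_act)
    return skill, action

obs = rng.normal(size=(T, d))
skill, action = skill_transformer_step(obs)
print(skill, action.shape)
```

Training both heads end-to-end from full-task demonstrations is what lets the model keep skill modularity while avoiding hand-off errors between separately trained skill policies.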
Related papers
- MeMo: Meaningful, Modular Controllers via Noise Injection [25.541496793132183]
We show that when a new robot is built from the same parts, its control can be quickly learned by reusing the modular controllers.
We achieve this with a framework called MeMo which learns (Me)aningful, (Mo)dular controllers.
We benchmark our framework in locomotion and grasping environments on simple to complex robot morphology transfer.
arXiv Detail & Related papers (2024-05-24T18:39:20Z)
- Generalize by Touching: Tactile Ensemble Skill Transfer for Robotic Furniture Assembly [24.161856591498825]
Tactile Ensemble Skill Transfer (TEST) is a pioneering offline reinforcement learning (RL) approach that incorporates tactile feedback in the control loop.
TEST's core design is to learn a skill transition model for high-level planning, along with a set of adaptive intra-skill goal-reaching policies.
Results indicate that TEST can achieve a success rate of 90% and is over 4 times more efficient than the generalization policy.
arXiv Detail & Related papers (2024-04-26T20:27:10Z)
- Yell At Your Robot: Improving On-the-Fly from Language Corrections [84.09578841663195]
We show that high-level policies can be readily supervised with human feedback in the form of language corrections.
This framework enables robots not only to rapidly adapt to real-time language feedback, but also incorporate this feedback into an iterative training scheme.
arXiv Detail & Related papers (2024-03-19T17:08:24Z)
- Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world.
Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks.
PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z)
- Bi-Manual Block Assembly via Sim-to-Real Reinforcement Learning [24.223788665601678]
Two xArm6 robots solve the U-shape assembly task with a success rate above 90% in simulation and 50% on real hardware without any additional real-world fine-tuning.
Our results represent a significant step forward for bi-arm capability on real hardware, and we hope our system can inspire future research on deep RL and sim-to-real transfer of bi-manual policies.
arXiv Detail & Related papers (2023-03-27T01:25:24Z)
- MetaMorph: Learning Universal Controllers with Transformers [45.478223199658785]
In robotics, we primarily train a single robot for a single task.
Modular robot systems now allow for the flexible combination of general-purpose building blocks into task-optimized morphologies.
We propose MetaMorph, a Transformer based approach to learn a universal controller over a modular robot design space.
arXiv Detail & Related papers (2022-03-22T17:58:31Z)
- Transformer-based deep imitation learning for dual-arm robot manipulation [5.3022775496405865]
In a dual-arm manipulation setup, the increased number of state dimensions caused by the additional robot manipulators causes distractions.
We address this issue using a self-attention mechanism that computes dependencies between elements in a sequential input and focuses on important elements.
A Transformer, a variant of self-attention architecture, is applied to deep imitation learning to solve dual-arm manipulation tasks in the real world.
arXiv Detail & Related papers (2021-08-01T07:42:39Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
- ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z)
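The ReLMoGen decomposition, where a learned policy proposes subgoals and a motion generator executes the motion to reach them, can be sketched as a two-level loop. This is a toy illustration, not the paper's system: the "policy" is a hand-written heuristic and the "motion generator" is simple linear interpolation, both hypothetical stand-ins.

```python
import numpy as np

def subgoal_policy(state, goal):
    """Stand-in for the learned subgoal policy: propose a point partway to
    the goal. (A real policy would be a trained network.)"""
    return state + 0.5 * (goal - state)

def motion_generator(state, subgoal, step=0.2, tol=1e-3, max_iters=100):
    """Toy motion generator: step linearly toward the subgoal until reached."""
    path = [state.copy()]
    while np.linalg.norm(subgoal - state) > tol and len(path) < max_iters:
        direction = subgoal - state
        dist = np.linalg.norm(direction)
        state = state + min(step, dist) * direction / dist
        path.append(state.copy())
    return path

state, goal = np.zeros(2), np.array([1.0, 1.0])
for _ in range(10):                     # outer loop: policy proposes subgoals
    subgoal = subgoal_policy(state, goal)
    state = motion_generator(state, subgoal)[-1]   # inner loop: execute motion
print(np.linalg.norm(goal - state))
```

Because the policy only commits to subgoals, the low-level motion generator can be swapped out at test time without retraining, which is the transferability property the abstract highlights.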
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.