Skill Transformer: A Monolithic Policy for Mobile Manipulation
- URL: http://arxiv.org/abs/2308.09873v1
- Date: Sat, 19 Aug 2023 01:37:41 GMT
- Title: Skill Transformer: A Monolithic Policy for Mobile Manipulation
- Authors: Xiaoyu Huang, Dhruv Batra, Akshara Rai, Andrew Szot
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Skill Transformer, an approach for solving long-horizon robotic
tasks by combining conditional sequence modeling and skill modularity.
Conditioned on egocentric and proprioceptive observations of a robot, Skill
Transformer is trained end-to-end to predict both a high-level skill (e.g.,
navigation, picking, placing) and a whole-body low-level action (e.g., base
and arm motion), using a transformer architecture and demonstration
trajectories that solve the full task. It retains the composability and
modularity of the overall task through a skill predictor module while reasoning
about low-level actions and avoiding hand-off errors, common in modular
approaches. We test Skill Transformer on an embodied rearrangement benchmark
and find it performs robust task planning and low-level control in new
scenarios, achieving a 2.5x higher success rate than baselines in hard
rearrangement problems.
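The two-headed prediction described in the abstract can be illustrated with a minimal NumPy sketch: a single self-attention layer summarizes the observation tokens, a skill head classifies the current skill, and an action head produces a whole-body action conditioned on the predicted skill. All dimensions, weights, and function names here are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

# Hypothetical sizes: 6 observation tokens (egocentric + proprioceptive
# features, assumed already embedded), 16-dim embeddings, 3 skills, 10-dim
# whole-body action (base + arm).
T, d, n_skills, act_dim = 6, 16, 3, 10

# Random weights stand in for parameters learned from demonstration trajectories.
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
W_skill = 0.1 * rng.normal(size=(d, n_skills))          # skill predictor head
W_act = 0.1 * rng.normal(size=(d + n_skills, act_dim))  # low-level action head

def skill_transformer_step(obs_tokens):
    h = self_attention(obs_tokens, Wq, Wk, Wv)[-1]       # last-token summary
    skill_probs = softmax(h @ W_skill)
    skill = int(np.argmax(skill_probs))                  # high-level skill
    skill_onehot = np.eye(n_skills)[skill]
    # Whole-body action, conditioned on the predicted skill.
    action = np.tanh(np.concatenate([h, skill_onehot]) @ W_act)
    return skill, action

obs = rng.normal(size=(T, d))
skill, action = skill_transformer_step(obs)
print(skill, action.shape)
```

Training both heads end-to-end from full-task demonstrations is what lets the model keep skill modularity while avoiding hand-off errors between separately trained skill policies.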
Related papers
- MeMo: Meaningful, Modular Controllers via Noise Injection [25.541496793132183]
We show that when a new robot is built from the same parts, its control can be quickly learned by reusing the modular controllers.
We achieve this with a framework called MeMo which learns (Me)aningful, (Mo)dular controllers.
We benchmark our framework in locomotion and grasping environments on simple to complex robot morphology transfer.
arXiv Detail & Related papers (2024-05-24T18:39:20Z)
- Generalize by Touching: Tactile Ensemble Skill Transfer for Robotic Furniture Assembly [24.161856591498825]
Tactile Ensemble Skill Transfer (TEST) is a pioneering offline reinforcement learning (RL) approach that incorporates tactile feedback in the control loop.
TEST's core design is to learn a skill transition model for high-level planning, along with a set of adaptive intra-skill goal-reaching policies.
Results indicate that TEST can achieve a success rate of 90% and is over 4 times more efficient than the generalization policy.
arXiv Detail & Related papers (2024-04-26T20:27:10Z)
- Yell At Your Robot: Improving On-the-Fly from Language Corrections [84.09578841663195]
We show that high-level policies can be readily supervised with human feedback in the form of language corrections.
This framework enables robots not only to rapidly adapt to real-time language feedback, but also incorporate this feedback into an iterative training scheme.
arXiv Detail & Related papers (2024-03-19T17:08:24Z)
- Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world.
Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks.
PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z)
- Bi-Manual Block Assembly via Sim-to-Real Reinforcement Learning [24.223788665601678]
Two xArm6 robots solve the U-shape assembly task with a success rate above 90% in simulation and 50% on real hardware without any additional real-world fine-tuning.
Our results represent a significant step forward for bi-arm capability on real hardware, and we hope our system can inspire future research on deep RL and sim-to-real transfer of bi-manual policies.
arXiv Detail & Related papers (2023-03-27T01:25:24Z)
- MetaMorph: Learning Universal Controllers with Transformers [45.478223199658785]
In robotics, we primarily train a single robot for a single task.
Modular robot systems now allow for the flexible combination of general-purpose building blocks into task-optimized morphologies.
We propose MetaMorph, a Transformer based approach to learn a universal controller over a modular robot design space.
arXiv Detail & Related papers (2022-03-22T17:58:31Z)
- Transformer-based deep imitation learning for dual-arm robot manipulation [5.3022775496405865]
In a dual-arm manipulation setup, the increased number of state dimensions caused by the additional robot manipulators causes distractions.
We address this issue using a self-attention mechanism that computes dependencies between elements in a sequential input and focuses on important elements.
A Transformer, a variant of self-attention architecture, is applied to deep imitation learning to solve dual-arm manipulation tasks in the real world.
arXiv Detail & Related papers (2021-08-01T07:42:39Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
- ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z)
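The ReLMoGen decomposition, where a learned policy proposes subgoals and a motion generator executes the motion to reach them, can be sketched as a two-level loop. This is a toy illustration, not the paper's system: the "policy" is a hand-written heuristic and the "motion generator" is simple linear interpolation, both hypothetical stand-ins.

```python
import numpy as np

def subgoal_policy(state, goal):
    """Stand-in for the learned subgoal policy: propose a point partway to
    the goal. (A real policy would be a trained network.)"""
    return state + 0.5 * (goal - state)

def motion_generator(state, subgoal, step=0.2, tol=1e-3, max_iters=100):
    """Toy motion generator: step linearly toward the subgoal until reached."""
    path = [state.copy()]
    while np.linalg.norm(subgoal - state) > tol and len(path) < max_iters:
        direction = subgoal - state
        dist = np.linalg.norm(direction)
        state = state + min(step, dist) * direction / dist
        path.append(state.copy())
    return path

state, goal = np.zeros(2), np.array([1.0, 1.0])
for _ in range(10):                     # outer loop: policy proposes subgoals
    subgoal = subgoal_policy(state, goal)
    state = motion_generator(state, subgoal)[-1]   # inner loop: execute motion
print(np.linalg.norm(goal - state))
```

Because the policy only commits to subgoals, the low-level motion generator can be swapped out at test time without retraining, which is the transferability property the abstract highlights.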
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.