Grounding Robot Policies with Visuomotor Language Guidance
- URL: http://arxiv.org/abs/2410.06473v2
- Date: Thu, 10 Oct 2024 04:03:16 GMT
- Title: Grounding Robot Policies with Visuomotor Language Guidance
- Authors: Arthur Bucker, Pablo Ortega-Kral, Jonathan Francis, Jean Oh,
- Abstract summary: We propose an agent-based framework for grounding robot policies to the current context.
The proposed framework is composed of a set of conversational agents designed for specific roles.
We demonstrate that our approach can effectively guide manipulation policies to achieve significantly higher success rates.
- Score: 15.774237279917594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in the fields of natural language processing and computer vision have shown great potential in understanding the underlying dynamics of the world from large-scale internet data. However, translating this knowledge into robotic systems remains an open challenge, given the scarcity of human-robot interactions and the lack of large-scale datasets of real-world robotic data. Previous robot learning approaches such as behavior cloning and reinforcement learning have shown great capabilities in learning robotic skills from human demonstrations or from scratch in specific environments. However, these approaches often require task-specific demonstrations or designing complex simulation environments, which limits the development of generalizable and robust policies for new settings. Aiming to address these limitations, we propose an agent-based framework for grounding robot policies to the current context, considering the constraints of a current robot and its environment using visuomotor-grounded language guidance. The proposed framework is composed of a set of conversational agents designed for specific roles -- namely, high-level advisor, visual grounding, monitoring, and robotic agents. Given a base policy, the agents collectively generate guidance at run time to shift the action distribution of the base policy towards more desirable future states. We demonstrate that our approach can effectively guide manipulation policies to achieve significantly higher success rates both in simulation and in real-world experiments without the need for additional human demonstrations or extensive exploration. Project videos at https://sites.google.com/view/motorcortex/home.
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z) - Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
Key limitation of learned robot control policies is their inability to generalize outside their training data.
Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve their robustness and generalization ability.
We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
arXiv Detail & Related papers (2024-07-11T17:31:01Z) - Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z) - A Survey on Robotics with Foundation Models: toward Embodied AI [30.999414445286757]
Recent advances in computer vision, natural language processing, and multi-modality learning have shown that the foundation models have superhuman capabilities for specific tasks.
This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control.
arXiv Detail & Related papers (2024-02-04T07:55:01Z) - RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.
Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer them to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned Policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills.
We learn our policy to generate appropriate actions given current scene observations and a video of the target task.
We validate our approach on a set of challenging multi-task robot manipulation environments and outperform state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z) - Dual-Arm Adversarial Robot Learning [0.6091702876917281]
We propose dual-arm settings as platforms for robot learning.
We will discuss the potential benefits of this setup as well as the challenges and research directions that can be pursued.
arXiv Detail & Related papers (2021-10-15T12:51:57Z) - Low Dimensional State Representation Learning with Robotics Priors in
Continuous Action Spaces [8.692025477306212]
Reinforcement learning algorithms have proven to be capable of solving complicated robotics tasks in an end-to-end fashion.
We propose a framework combining the learning of a low-dimensional state representation, from high-dimensional observations coming from the robot's raw sensory readings, with the learning of the optimal policy.
arXiv Detail & Related papers (2021-07-04T15:42:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.