HomeRobot: Open-Vocabulary Mobile Manipulation
- URL: http://arxiv.org/abs/2306.11565v2
- Date: Wed, 10 Jan 2024 14:20:30 GMT
- Title: HomeRobot: Open-Vocabulary Mobile Manipulation
- Authors: Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang,
Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander William
Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh
Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton
- Abstract summary: Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location.
HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch.
- Score: 107.05702777141178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: HomeRobot (noun): An affordable compliant robot that navigates homes and
manipulates a wide range of objects in order to complete everyday tasks.
Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object
in any unseen environment, and placing it in a commanded location. This is a
foundational challenge for robots to be useful assistants in human
environments, because it involves tackling sub-problems from across robotics:
perception, language understanding, navigation, and manipulation are all
essential to OVMM. In addition, integration of the solutions to these
sub-problems poses its own substantial challenges. To drive research in this
area, we introduce the HomeRobot OVMM benchmark, where an agent navigates
household environments to grasp novel objects and place them on target
receptacles. HomeRobot has two components: a simulation component, which uses a
large and diverse curated object set in new, high-quality multi-room home
environments; and a real-world component, providing a software stack for the
low-cost Hello Robot Stretch to encourage replication of real-world experiments
across labs. We implement both reinforcement learning and heuristic
(model-based) baselines and show evidence of sim-to-real transfer. Our
baselines achieve a 20% success rate in the real world; our experiments
identify ways in which future research can improve performance. See videos on our
website: https://ovmm.github.io/.
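To make the task structure concrete, the sketch below pictures an OVMM-style episode as a plain agent-environment loop: navigate to a named novel object, pick it, and place it on the commanded receptacle. Everything in the sketch (the DummyOVMMEnv class, the action names, the observation fields) is a hypothetical illustration under assumed names, not the actual HomeRobot API.

```python
# Minimal sketch of an OVMM-style episode loop. Everything here (DummyOVMMEnv,
# the action names, the observation fields) is hypothetical and for illustration
# only; it is NOT the actual HomeRobot interface. It just pictures the task:
# navigate to a novel object, pick it, and place it on a commanded receptacle.
import random

ACTIONS = ["MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT", "PICK", "PLACE", "STOP"]


class DummyOVMMEnv:
    """Toy stand-in for a simulated episode: success = PICK followed by PLACE."""

    def reset(self):
        self.picked = False
        self.placed = False
        # An open-vocabulary goal, e.g. "move the mug to the coffee table".
        return {"instruction": "move the mug to the coffee table"}

    def step(self, action):
        if action == "PICK":
            self.picked = True
        elif action == "PLACE" and self.picked:
            self.placed = True
        done = self.placed or action == "STOP"
        obs = {"instruction": "move the mug to the coffee table"}
        return obs, done, {"success": self.placed}


def random_policy(obs):
    # A real baseline (reinforcement-learned or heuristic/model-based) would
    # condition on RGB-D observations and the language goal; this one samples.
    return random.choice(ACTIONS)


env = DummyOVMMEnv()
obs = env.reset()
for _ in range(100):
    obs, done, info = env.step(random_policy(obs))
    if done:
        break
print("episode success:", info["success"])
```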
Related papers
- GRUtopia: Dream General Robots in a City at Scale [65.08318324604116]
This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots.
GRScenes includes 100k interactive, finely annotated scenes, which can be freely combined into city-scale environments.
GRResidents is a Large Language Model (LLM) driven Non-Player Character (NPC) system that is responsible for social interaction.
arXiv Detail & Related papers (2024-07-15T17:40:46Z)
- RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots [25.650235551519952]
We present RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments.
We provide thousands of 3D assets across over 150 object categories and dozens of interactable furniture and appliances.
Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning.
arXiv Detail & Related papers (2024-06-04T17:41:31Z)
- Dynamic Handover: Throw and Catch with Bimanual Hands [30.206469112964033]
We design a system with two multi-finger hands attached to robot arms to solve this problem.
We train our system using Multi-Agent Reinforcement Learning in simulation and perform Sim2Real transfer to deploy on the real robots.
To overcome the Sim2Real gap, we provide multiple novel algorithm designs including learning a trajectory prediction model for the object.
arXiv Detail & Related papers (2023-09-11T17:49:25Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) and existing visual grounding and robotic grasping system.
We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation [10.21450780640562]
We introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects.
These new challenges require combining manipulation and navigation skills in unexplored environments.
We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills.
arXiv Detail & Related papers (2023-07-12T12:25:33Z)
- RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z)
- ALAN: Autonomously Exploring Robotic Agents in the Real World [28.65531878636441]
ALAN is an autonomously exploring robotic agent that can perform tasks in the real world with little training and interaction time.
This is enabled by measuring environment change, which reflects object movement and ignores changes in the robot position (a rough sketch of this signal appears after this list).
We evaluate our approach on two different real-world play kitchen settings, enabling a robot to efficiently explore and discover manipulation skills.
arXiv Detail & Related papers (2023-02-13T18:59:09Z)
- Sim2Real for Peg-Hole Insertion with Eye-in-Hand Camera [58.720142291102135]
We use a simulator to learn the peg-hole insertion problem and then transfer the learned model to the real robot.
We show that the transferred policy, which only takes RGB-D and joint information (proprioception), can perform well on the real robot.
arXiv Detail & Related papers (2020-05-29T05:58:54Z)
- SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)
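As a rough illustration of the environment-change exploration signal mentioned in the ALAN entry above, the sketch below scores a step of interaction by how much detected objects moved while ignoring the robot's own pose. The object-position format and the function name are assumptions made for this illustration, not the paper's implementation.

```python
# Hedged sketch of an "environment change" exploration signal, loosely following
# the ALAN entry above: reward interactions that move objects, ignore motion of
# the robot itself. The detector output format (object id -> (x, y) position)
# is an assumption for illustration only.
import math


def environment_change(objects_before, objects_after):
    """Total displacement of tracked objects between two observations."""
    change = 0.0
    for obj_id, (x0, y0) in objects_before.items():
        if obj_id in objects_after:
            x1, y1 = objects_after[obj_id]
            change += math.hypot(x1 - x0, y1 - y0)
    return change


# Example: the robot base moved, but only the cup changed position, so the
# signal reflects object motion rather than the robot's own motion.
before = {"cup": (0.2, 0.5), "bowl": (1.0, 1.0)}
after = {"cup": (0.6, 0.5), "bowl": (1.0, 1.0)}
print(environment_change(before, after))  # 0.4
```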
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.