Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
- URL: http://arxiv.org/abs/2401.02117v1
- Date: Thu, 4 Jan 2024 07:55:53 GMT
- Title: Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
- Authors: Zipeng Fu, Tony Z. Zhao, Chelsea Finn
- Abstract summary: We develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control.
Mobile ALOHA is a low-cost and whole-body teleoperation system for data collection.
Co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks.
- Score: 59.21899709023333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning from human demonstrations has shown impressive performance
in robotics. However, most results focus on table-top manipulation, lacking the
mobility and dexterity necessary for generally useful tasks. In this work, we
develop a system for imitating mobile manipulation tasks that are bimanual and
require whole-body control. We first present Mobile ALOHA, a low-cost and
whole-body teleoperation system for data collection. It augments the ALOHA
system with a mobile base, and a whole-body teleoperation interface. Using data
collected with Mobile ALOHA, we then perform supervised behavior cloning and
find that co-training with existing static ALOHA datasets boosts performance on
mobile manipulation tasks. With 50 demonstrations for each task, co-training
can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously
complete complex mobile manipulation tasks such as sauteing and serving a piece
of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling
and entering an elevator, and lightly rinsing a used pan using a kitchen
faucet. Project website: https://mobile-aloha.github.io
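The abstract above describes the learning recipe at a high level: supervised behavior cloning on Mobile ALOHA demonstrations, co-trained with existing static ALOHA data. The following is a minimal sketch of what such co-training can look like, assuming a PyTorch-style setup; the dataset and policy objects (mobile_dataset, static_dataset, policy) are hypothetical placeholders rather than the authors' released implementation, and the shared observation/action format across the two datasets is an assumption made for illustration.

```python
# Hedged sketch (not the authors' code): supervised behavior cloning that
# co-trains on pooled Mobile ALOHA and static ALOHA demonstrations.
import torch
from torch.utils.data import ConcatDataset, DataLoader


def co_train(mobile_dataset, static_dataset, policy, steps=50_000, lr=1e-4):
    """Behavior cloning on a mixture of mobile and static demonstrations.

    Both datasets are assumed to yield (observation, action) tensor pairs in a
    shared format (e.g. static actions padded to the mobile action dimension,
    an assumption made for this sketch).
    """
    # Pool the two demonstration sources; per the abstract, this co-training
    # boosts success rates on mobile manipulation tasks.
    mixed = ConcatDataset([mobile_dataset, static_dataset])
    loader = DataLoader(mixed, batch_size=64, shuffle=True, drop_last=True)
    optimizer = torch.optim.AdamW(policy.parameters(), lr=lr)

    data_iter = iter(loader)
    for _ in range(steps):
        try:
            obs, action = next(data_iter)
        except StopIteration:          # restart the loader between epochs
            data_iter = iter(loader)
            obs, action = next(data_iter)
        pred = policy(obs)                                 # predicted action
        loss = torch.nn.functional.l1_loss(pred, action)   # regression loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy
```

The key design choice reflected here, as reported in the abstract, is simply pooling the static and mobile demonstration sources rather than training on the mobile data alone.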
Related papers
- BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation [36.21945470191491]
We introduce BUMBLE, a unified Vision-Language Model (VLM)-based framework integrating open-world RGBD perception, a wide spectrum of gross-to-fine motor skills, and dual-layered memory.
BUMBLE achieves a 47.1% success rate averaged over 70 trials across different buildings, tasks, and scene layouts, starting from different rooms and floors.
arXiv Detail & Related papers (2024-10-08T17:52:29Z)
- Zero-Cost Whole-Body Teleoperation for Mobile Manipulation [8.71539730969424]
MoMa-Teleop is a novel teleoperation method that delegates the base motions to a reinforcement learning agent.
We demonstrate that our approach results in a significant reduction in task completion time across a variety of robots and tasks.
arXiv Detail & Related papers (2024-09-23T15:09:45Z)
- Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [48.65867987106428]
We introduce a novel system for joint learning between human operators and robots.
It enables human operators to share control of a robot end-effector with a learned assistive agent.
It reduces the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.
arXiv Detail & Related papers (2024-06-29T03:37:29Z)
- Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration [52.25473993987409]
We propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance.
The architecture comprises three agents: planning agent, decision agent, and reflection agent.
We show that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture.
arXiv Detail & Related papers (2024-06-03T05:50:00Z)
- TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation [38.187217710937084]
In this work, we demonstrate TeleMoMa, a general and modular interface for whole-body teleoperation of mobile manipulators.
TeleMoMa unifies multiple human interfaces including RGB and depth cameras, virtual reality controllers, keyboard, joysticks, etc.
arXiv Detail & Related papers (2024-03-12T17:58:01Z)
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception [52.5831204440714]
We introduce Mobile-Agent, an autonomous multi-modal mobile device agent.
Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface.
It then autonomously plans and decomposes the complex operation task, and navigates the mobile Apps through operations step by step.
arXiv Detail & Related papers (2024-01-29T13:46:37Z)
- Human-in-the-Loop Task and Motion Planning for Imitation Learning [37.75197145733193]
Imitation learning from human demonstrations can teach robots complex manipulation skills, but it is time-consuming and labor-intensive.
In contrast, Task and Motion Planning (TAMP) systems are automated and excel at solving long-horizon tasks.
We present Human-in-the-Loop Task and Motion Planning (HITL-TAMP), a novel system that leverages the benefits of both approaches.
arXiv Detail & Related papers (2023-10-24T17:15:16Z)
- Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation [54.31414116478024]
In mobile manipulation (MM), robots can both navigate within and interact with their environment.
In this work, we explore how to apply imitation learning (IL) to learn continuous visuo-motor policies for MM tasks.
arXiv Detail & Related papers (2021-12-09T23:54:59Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.