Related papers: Whole-Body Teleoperation for Mobile Manipulation at Zero Added Cost

Whole-Body Teleoperation for Mobile Manipulation at Zero Added Cost

URL: http://arxiv.org/abs/2409.15095v2
Date: Mon, 10 Feb 2025 10:50:14 GMT
Title: Whole-Body Teleoperation for Mobile Manipulation at Zero Added Cost
Authors: Daniel Honerkamp, Harsh Mahesheka, Jan Ole von Hartz, Tim Welschehold, Abhinav Valada,
Abstract summary: MoMa-Teleop is a novel teleoperation method that infers end-effector motions from existing interfaces.<n>We demonstrate that our approach results in a significant reduction in task completion time across a variety of robots and tasks.
Score: 8.71539730969424
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Demonstration data plays a key role in learning complex behaviors and training robotic foundation models. While effective control interfaces exist for static manipulators, data collection remains cumbersome and time intensive for mobile manipulators due to their large number of degrees of freedom. While specialized hardware, avatars, or motion tracking can enable whole-body control, these approaches are either expensive, robot-specific, or suffer from the embodiment mismatch between robot and human demonstrator. In this work, we present MoMa-Teleop, a novel teleoperation method that infers end-effector motions from existing interfaces and delegates the base motions to a previously developed reinforcement learning agent, leaving the operator to focus fully on the task-relevant end-effector motions. This enables whole-body teleoperation of mobile manipulators with no additional hardware or setup costs via standard interfaces such as joysticks or hand guidance. Moreover, the operator is not bound to a tracked workspace and can move freely with the robot over spatially extended tasks. We demonstrate that our approach results in a significant reduction in task completion time across a variety of robots and tasks. As the generated data covers diverse whole-body motions without embodiment mismatch, it enables efficient imitation learning. By focusing on task-specific end-effector motions, our approach learns skills that transfer to unseen settings, such as new obstacles or changed object positions, from as little as five demonstrations. We make code and videos available at https://moma-teleop.cs.uni-freiburg.de.

Related papers

Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models [49.4824734958566]
Chain-of-Modality (CoM) enables Vision Language Models to reason about multimodal human demonstration data. CoM refines a task plan and generates detailed control parameters, enabling robots to perform manipulation tasks based on a single multimodal human video prompt.
arXiv Detail & Related papers (2025-04-17T21:31:23Z)
TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning [33.13259411762365]
This paper proposes an inexpensive, robust, and flexible mobile manipulator that can support arbitrary arms. Powered casters enable the mobile base to be fully holonomic, able to control all planar degrees of freedom independently and simultaneously. We equip our robot with an intuitive mobile phone teleoperation interface to enable easy data acquisition for imitation learning.
arXiv Detail & Related papers (2024-12-11T18:54:22Z)
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos [64.48857272250446]
We introduce Moto, which converts video content into latent Motion Token sequences by a Latent Motion Tokenizer. We pre-train Moto-GPT through motion token autoregression, enabling it to capture diverse visual motion knowledge. To transfer learned motion priors to real robot actions, we implement a co-fine-tuning strategy that seamlessly bridges latent motion token prediction and real robot control.
arXiv Detail & Related papers (2024-12-05T18:57:04Z)
Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [48.65867987106428]
We introduce a novel system for joint learning between human operators and robots. It enables human operators to share control of a robot end-effector with a learned assistive agent. It reduces the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.
arXiv Detail & Related papers (2024-06-29T03:37:29Z)
Harmonic Mobile Manipulation [35.82197562695662]
HarmonicMM is an end-to-end learning method that optimize both navigation and manipulation. Our contributions include a new benchmark for mobile manipulation and the successful deployment with only RGB visual observation.
arXiv Detail & Related papers (2023-12-11T18:54:42Z)
Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned Policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills. We learn our policy to generate appropriate actions given current scene observations and a video of the target task. We validate our approach on a set of challenging multi-task robot manipulation environments and outperform state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z)
Dexterous Manipulation from Images: Autonomous Real-World RL via Substep Guidance [71.36749876465618]
We describe a system for vision-based dexterous manipulation that provides a "programming-free" approach for users to define new tasks. Our system includes a framework for users to define a final task and intermediate sub-tasks with image examples. experimental results with a four-finger robotic hand learning multi-stage object manipulation tasks directly in the real world.
arXiv Detail & Related papers (2022-12-19T22:50:40Z)
Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies. The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
N$^2$M$^2$: Learning Navigation for Arbitrary Mobile Manipulation Motions in Unseen and Dynamic Environments [9.079709086741987]
We introduce Neural Navigation for Mobile Manipulation (N$2$M$2$) which extends this decomposition to complex obstacle environments. The resulting approach can perform unseen, long-horizon tasks in unexplored environments while instantly reacting to dynamic obstacles and environmental changes. We demonstrate the capabilities of our proposed approach in extensive simulation and real-world experiments on multiple kinematically diverse mobile manipulators.
arXiv Detail & Related papers (2022-06-17T12:52:41Z)
From One Hand to Multiple Hands: Imitation Learning for Dexterous Manipulation from Single-Camera Teleoperation [26.738893736520364]
We introduce a novel single-camera teleoperation system to collect the 3D demonstrations efficiently with only an iPad and a computer. We construct a customized robot hand for each user in the physical simulator, which is a manipulator resembling the same kinematics structure and shape of the operator's hand. With imitation learning using our data, we show large improvement over baselines with multiple complex manipulation tasks.
arXiv Detail & Related papers (2022-04-26T17:59:51Z)
Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery. We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations. Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z)
Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots. We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector. We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.