Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation
- URL: http://arxiv.org/abs/2307.06125v3
- Date: Thu, 19 Oct 2023 12:14:46 GMT
- Title: Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation
- Authors: Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold, Abhinav Valada
- Abstract summary: We introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects.
These new challenges require combining manipulation and navigation skills in unexplored environments.
We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing object-search approaches enable robots to search through free
pathways; however, robots operating in unstructured, human-centered environments
frequently also have to manipulate the environment to suit their needs. In this
work, we introduce a novel interactive multi-object search task in which a
robot has to open doors to navigate rooms and search inside cabinets and
drawers to find target objects. These new challenges require combining
manipulation and navigation skills in unexplored environments. We present
HIMOS, a hierarchical reinforcement learning approach that learns to compose
exploration, navigation, and manipulation skills. To achieve this, we design an
abstract high-level action space around a semantic map memory and leverage the
explored environment as instance navigation points. We perform extensive
experiments in simulation and the real world that demonstrate that, with
accurate perception, the decision making of HIMOS effectively transfers to new
environments in a zero-shot manner. It shows robustness to unseen subpolicies,
failures in their execution, and different robot kinematics. These capabilities
open the door to a wide range of downstream tasks across embodied AI and
real-world use cases.
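To make the abstract's high-level design concrete, here is a minimal Python sketch of a semantic map memory whose stored object instances double as navigation points for a high-level policy. This is an illustration under our own assumptions, not the authors' implementation; the names SemanticMapMemory, HighLevelPolicy, and navigate_and_open are hypothetical.
```python
import random
from dataclasses import dataclass, field


@dataclass
class MapInstance:
    """An object instance stored in the semantic map memory."""
    instance_id: int
    category: str    # e.g. "door", "cabinet", "drawer"
    position: tuple  # (x, y) in the map frame
    opened: bool = False


@dataclass
class SemanticMapMemory:
    """Accumulates object instances discovered while exploring."""
    instances: list = field(default_factory=list)

    def add(self, inst: MapInstance) -> None:
        self.instances.append(inst)

    def navigation_points(self) -> list:
        # Every unopened instance doubles as a navigation goal, so the
        # high-level action space grows as the environment is explored.
        return [i for i in self.instances if not i.opened]


class HighLevelPolicy:
    """Selects the next subpolicy (skill) to execute.

    A trained policy would score these actions from the map state;
    a random choice keeps the sketch self-contained.
    """

    def act(self, memory: SemanticMapMemory) -> str:
        actions = ["explore"] + [
            f"navigate_and_open:{inst.instance_id}"
            for inst in memory.navigation_points()
        ]
        return random.choice(actions)


# One high-level step: explore, or commit to a discovered instance.
memory = SemanticMapMemory()
memory.add(MapInstance(0, "cabinet", (2.0, 1.5)))
print(HighLevelPolicy().act(memory))
```
A HIMOS-style agent would replace the random choice with a learned policy over the map state and dispatch each chosen action to the corresponding exploration, navigation, or manipulation subpolicy.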
Related papers
- Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction
We provide an affordance representation with precise, actionable affordances for a robot in an open-world setting.
We connect this knowledge base to a foundation vision-language model (VLM) and prompt the VLM for a wider variety of new and unseen objects.
This mix of affordance representation, image detection, and a human in the loop is effective for a robot searching for objects to achieve its goals.
arXiv Detail & Related papers (2024-07-18T10:24:22Z)
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models
Legged robots are physically capable of navigating diverse environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
- Growing from Exploration: A self-exploring framework for robots based on foundation models
We propose a framework named GExp, which enables robots to explore and learn autonomously without human intervention.
Inspired by the way that infants interact with the world, GExp encourages robots to understand and explore the environment with a series of self-generated tasks.
arXiv Detail & Related papers (2024-01-24T14:04:08Z)
- Target Search and Navigation in Heterogeneous Robot Systems with Deep Reinforcement Learning
We design a heterogeneous robot system consisting of a UAV and a UGV for search and rescue missions in unknown environments.
The system is able to search for targets and navigate to them in a maze-like mine environment with the policies learned through deep reinforcement learning algorithms.
arXiv Detail & Related papers (2023-08-01T07:09:14Z)
- HomeRobot: Open-Vocabulary Mobile Manipulation
Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location.
HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch.
arXiv Detail & Related papers (2023-06-20T14:30:32Z)
- Generalized Object Search
This thesis develops methods and systems for (multi-)object search in 3D environments under uncertainty.
I implement a robot-independent, environment-agnostic system for generalized object search in 3D.
I deploy it on the Boston Dynamics Spot robot, the Kinova MOVO robot, and the Universal Robots UR5e robotic arm.
arXiv Detail & Related papers (2023-01-24T16:41:36Z)
- ReLMM: Practical RL for Learning Mobile Manipulation Skills Using Only Onboard Sensors
We study how robots can autonomously learn skills that require a combination of navigation and grasping.
Our system, ReLMM, can learn continuously on a real-world platform without any environment instrumentation.
After a grasp curriculum training phase, ReLMM can learn navigation and grasping together fully automatically, in around 40 hours of real-world training.
arXiv Detail & Related papers (2021-07-28T17:59:41Z)
- Rapid Exploration for Open-World Navigation with Latent Goal Models
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration; see the sketch after this list.
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
- ViNG: Learning Open-World Navigation with Visual Goals
We propose a learning-based navigation system for reaching visually indicated goals.
We show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.
We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z)
- SAPIEN: A SimulAted Part-based Interactive ENvironment
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)
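The entry on Rapid Exploration for Open-World Navigation with Latent Goal Models names two concrete ingredients: a non-parametric topological memory of embeddings and goal sampling from the prior that an information bottleneck enforces. Below is a minimal Python sketch of both, assuming fixed-size latent embeddings; the names TopologicalMemory and sample_exploration_goal are illustrative, not the paper's code.
```python
import numpy as np


class TopologicalMemory:
    """Non-parametric memory: a growing set of latent observation embeddings."""

    def __init__(self):
        self.nodes = []

    def add(self, z: np.ndarray) -> None:
        self.nodes.append(np.asarray(z, dtype=float))

    def nearest(self, z: np.ndarray, k: int = 1) -> list:
        # Retrieve the k stored embeddings closest to the query embedding.
        dists = [np.linalg.norm(n - z) for n in self.nodes]
        return [self.nodes[i] for i in np.argsort(dists)[:k]]


def sample_exploration_goal(prior_mean: np.ndarray,
                            prior_std: np.ndarray,
                            rng=None) -> np.ndarray:
    """Sample a latent goal from the prior the bottleneck enforces.

    Because the information bottleneck pushes goal embeddings toward a
    simple prior, samples from that prior tend to be feasible goals.
    """
    rng = rng or np.random.default_rng()
    return prior_mean + prior_std * rng.standard_normal(prior_mean.shape)


# Store one visited embedding, then sample and look up a nearby goal.
memory = TopologicalMemory()
memory.add(np.zeros(8))
goal = sample_exploration_goal(np.zeros(8), np.ones(8))
print(memory.nearest(goal, k=1))
```
In the actual method the embeddings come from a learned latent variable model of distances and actions; plain vectors stand in for them here.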