Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation
- URL: http://arxiv.org/abs/2307.06125v3
- Date: Thu, 19 Oct 2023 12:14:46 GMT
- Title: Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation
- Authors: Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold, Abhinav Valada
- Abstract summary: We introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects.
These new challenges require combining manipulation and navigation skills in unexplored environments.
We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing object-search approaches enable robots to search through free
pathways; however, robots operating in unstructured, human-centered environments
frequently also have to manipulate the environment to suit their needs. In this
work, we introduce a novel interactive multi-object search task in which a
robot has to open doors to navigate rooms and search inside cabinets and
drawers to find target objects. These new challenges require combining
manipulation and navigation skills in unexplored environments. We present
HIMOS, a hierarchical reinforcement learning approach that learns to compose
exploration, navigation, and manipulation skills. To achieve this, we design an
abstract high-level action space around a semantic map memory and leverage the
explored environment as instance navigation points. We perform extensive
experiments in simulation and the real world that demonstrate that, with
accurate perception, the decision making of HIMOS effectively transfers to new
environments in a zero-shot manner. It shows robustness to unseen subpolicies,
failures in their execution, and different robot kinematics. These capabilities
open the door to a wide range of downstream tasks across embodied AI and
real-world use cases.
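To make the abstract's high-level design concrete, here is a minimal Python sketch of a semantic map memory whose stored object instances double as navigation points for a high-level policy. This is an illustration under our own assumptions, not the authors' implementation; the names SemanticMapMemory, HighLevelPolicy, and navigate_and_open are hypothetical.
```python
import random
from dataclasses import dataclass, field


@dataclass
class MapInstance:
    """An object instance stored in the semantic map memory."""
    instance_id: int
    category: str    # e.g. "door", "cabinet", "drawer"
    position: tuple  # (x, y) in the map frame
    opened: bool = False


@dataclass
class SemanticMapMemory:
    """Accumulates object instances discovered while exploring."""
    instances: list = field(default_factory=list)

    def add(self, inst: MapInstance) -> None:
        self.instances.append(inst)

    def navigation_points(self) -> list:
        # Every unopened instance doubles as a navigation goal, so the
        # high-level action space grows as the environment is explored.
        return [i for i in self.instances if not i.opened]


class HighLevelPolicy:
    """Selects the next subpolicy (skill) to execute.

    A trained policy would score these actions from the map state;
    a random choice keeps the sketch self-contained.
    """

    def act(self, memory: SemanticMapMemory) -> str:
        actions = ["explore"] + [
            f"navigate_and_open:{inst.instance_id}"
            for inst in memory.navigation_points()
        ]
        return random.choice(actions)


# One high-level step: explore, or commit to a discovered instance.
memory = SemanticMapMemory()
memory.add(MapInstance(0, "cabinet", (2.0, 1.5)))
print(HighLevelPolicy().act(memory))
```
A HIMOS-style agent would replace the random choice with a learned policy over the map state and dispatch each chosen action to the corresponding exploration, navigation, or manipulation subpolicy.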
Related papers
- Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction
We provide an affordance representation with precise, actionable affordances for a robot in an open-world setting.
We connect this knowledge base to a foundation vision-language model (VLM) and prompt the VLM for a wider variety of new and unseen objects.
This mix of affordance representation, image detection, and a human in the loop is effective for a robot searching for objects to achieve its goals.
arXiv Detail & Related papers (2024-07-18T10:24:22Z)
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models
Legged robots are physically capable of navigating diverse environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
- Growing from Exploration: A self-exploring framework for robots based on foundation models
We propose a framework named GExp, which enables robots to explore and learn autonomously without human intervention.
Inspired by the way that infants interact with the world, GExp encourages robots to understand and explore the environment with a series of self-generated tasks.
arXiv Detail & Related papers (2024-01-24T14:04:08Z)
- Target Search and Navigation in Heterogeneous Robot Systems with Deep Reinforcement Learning
We design a heterogeneous robot system consisting of a UAV and a UGV for search and rescue missions in unknown environments.
The system is able to search for targets and navigate to them in a maze-like mine environment with the policies learned through deep reinforcement learning algorithms.
arXiv Detail & Related papers (2023-08-01T07:09:14Z)
- HomeRobot: Open-Vocabulary Mobile Manipulation
Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location.
HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch.
arXiv Detail & Related papers (2023-06-20T14:30:32Z)
- Generalized Object Search
This thesis develops methods and systems for (multi-)object search in 3D environments under uncertainty.
I implement a robot-independent, environment-agnostic system for generalized object search in 3D.
I deploy it on the Boston Dynamics Spot robot, the Kinova MOVO robot, and the Universal Robots UR5e robotic arm.
arXiv Detail & Related papers (2023-01-24T16:41:36Z)
- ReLMM: Practical RL for Learning Mobile Manipulation Skills Using Only Onboard Sensors
We study how robots can autonomously learn skills that require a combination of navigation and grasping.
Our system, ReLMM, can learn continuously on a real-world platform without any environment instrumentation.
After a grasp curriculum training phase, ReLMM can learn navigation and grasping together fully automatically, in around 40 hours of real-world training.
arXiv Detail & Related papers (2021-07-28T17:59:41Z)
- Rapid Exploration for Open-World Navigation with Latent Goal Models
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration; see the sketch after this list.
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
- ViNG: Learning Open-World Navigation with Visual Goals
We propose a learning-based navigation system for reaching visually indicated goals.
We show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.
We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z)
- SAPIEN: A SimulAted Part-based Interactive ENvironment
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)
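The entry on Rapid Exploration for Open-World Navigation with Latent Goal Models names two concrete ingredients: a non-parametric topological memory of embeddings and goal sampling from the prior that an information bottleneck enforces. Below is a minimal Python sketch of both, assuming fixed-size latent embeddings; the names TopologicalMemory and sample_exploration_goal are illustrative, not the paper's code.
```python
import numpy as np


class TopologicalMemory:
    """Non-parametric memory: a growing set of latent observation embeddings."""

    def __init__(self):
        self.nodes = []

    def add(self, z: np.ndarray) -> None:
        self.nodes.append(np.asarray(z, dtype=float))

    def nearest(self, z: np.ndarray, k: int = 1) -> list:
        # Retrieve the k stored embeddings closest to the query embedding.
        dists = [np.linalg.norm(n - z) for n in self.nodes]
        return [self.nodes[i] for i in np.argsort(dists)[:k]]


def sample_exploration_goal(prior_mean: np.ndarray,
                            prior_std: np.ndarray,
                            rng=None) -> np.ndarray:
    """Sample a latent goal from the prior the bottleneck enforces.

    Because the information bottleneck pushes goal embeddings toward a
    simple prior, samples from that prior tend to be feasible goals.
    """
    rng = rng or np.random.default_rng()
    return prior_mean + prior_std * rng.standard_normal(prior_mean.shape)


# Store one visited embedding, then sample and look up a nearby goal.
memory = TopologicalMemory()
memory.add(np.zeros(8))
goal = sample_exploration_goal(np.zeros(8), np.ones(8))
print(memory.nearest(goal, k=1))
```
In the actual method the embeddings come from a learned latent variable model of distances and actions; plain vectors stand in for them here.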