Generalized Object Search
- URL: http://arxiv.org/abs/2301.10121v2
- Date: Thu, 4 May 2023 04:02:15 GMT
- Title: Generalized Object Search
- Authors: Kaiyu Zheng
- Abstract summary: This thesis develops methods and systems for (multi-)object search in 3D environments under uncertainty.
I implement a robot-independent, environment-agnostic system for generalized object search in 3D.
I deploy it on the Boston Dynamics Spot robot, the Kinova MOVO robot, and the Universal Robots UR5e robotic arm.
- Score: 0.9137554315375919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Future collaborative robots must be capable of finding objects. As such a
fundamental skill, we expect object search to eventually become an
off-the-shelf capability for any robot, similar to, e.g., object detection,
SLAM, and motion planning. However, existing approaches either make unrealistic
compromises (e.g., reduce the problem from 3D to 2D), resort to ad-hoc, greedy
search strategies, or attempt to learn end-to-end policies in simulation that
are yet to generalize across real robots and environments. This thesis argues
that through using Partially Observable Markov Decision Processes (POMDPs) to
model object search while exploiting structures in the human world (e.g.,
octrees, correlations) and in human-robot interaction (e.g., spatial language),
a practical and effective system for generalized object search can be achieved.
In support of this argument, I develop methods and systems for (multi-)object
search in 3D environments under uncertainty due to limited field of view,
occlusion, noisy and unreliable detectors, spatial correlations between objects,
and possibly ambiguous spatial language (e.g., "The red car is behind Chase
Bank"). Besides evaluation in simulators such as PyGame, AirSim, and AI2-THOR,
I design and implement a robot-independent, environment-agnostic system for
generalized object search in 3D and deploy it on the Boston Dynamics Spot
robot, the Kinova MOVO robot, and the Universal Robots UR5e robotic arm, to
perform object search in different environments. The system enables, for
example, a Spot robot to find a toy cat hidden underneath a couch in a kitchen
area in under one minute. This thesis also broadly surveys the object search
literature, proposing taxonomies in object search problem settings, methods and
systems.
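The core idea in the abstract — modeling object search as a POMDP in which both detections and non-detections from a noisy detector update a belief over object locations — can be illustrated with a minimal sketch. This is not the thesis's actual system (which uses octree beliefs and POMDP planners); the grid resolution, detector rates, and function names below are hypothetical, chosen only to show the Bayesian belief update and a greedy view selection:

```python
import itertools

# Hypothetical detector rates (assumptions, not from the thesis):
TP = 0.8   # probability the detector fires when the object is in view
FP = 0.05  # probability of a false alarm when the object is not in view

def uniform_belief(dims):
    """Uniform prior over a coarse 3D grid of candidate cells."""
    cells = list(itertools.product(*(range(d) for d in dims)))
    p = 1.0 / len(cells)
    return {c: p for c in cells}

def update(belief, viewed, detected):
    """Bayes update: `viewed` is the set of cells inside the camera frustum.

    A non-detection lowers the probability of viewed cells and shifts
    mass elsewhere; a detection concentrates mass on viewed cells.
    """
    new = {}
    for cell, prior in belief.items():
        if cell in viewed:
            like = TP if detected else (1 - TP)
        else:
            like = FP if detected else (1 - FP)
        new[cell] = like * prior
    z = sum(new.values())
    return {c: p / z for c, p in new.items()}

def best_view(belief, views):
    """Greedy baseline: pick the view covering the most probability mass.

    (The thesis argues for full POMDP planning rather than greedy
    strategies; this stands in only to close the sense-plan loop.)
    """
    return max(views, key=lambda v: sum(belief.get(c, 0.0) for c in v))
```

For example, after a non-detection in one corner of a 2x2x2 grid, the belief mass in that corner drops and the planner prefers views covering the remaining cells. An octree belief, as used in the thesis, replaces the flat dictionary with a hierarchical structure so that large empty regions are represented cheaply.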
Related papers
- Learning Object Properties Using Robot Proprioception via Differentiable Robot-Object Interaction [52.12746368727368]
Differentiable simulation has become a powerful tool for system identification.
Our approach calibrates object properties by using information from the robot, without relying on data from the object itself.
We demonstrate the effectiveness of our method on a low-cost robotic platform.
arXiv Detail & Related papers (2024-10-04T20:48:38Z)
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
- Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality in Human-Robot Interaction [3.1473798197405953]
This dissertation aims to teach a robot unknown objects in the context of Human-Robot Interaction (HRI).
The combination of eye tracking and Augmented Reality created a powerful synergy that empowered the human teacher to communicate with the robot.
The robot's object detection capabilities exhibited comparable performance to state-of-the-art object detectors trained on extensive datasets.
arXiv Detail & Related papers (2023-12-12T11:34:43Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with existing visual grounding and robotic grasping systems.
We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation [10.21450780640562]
We introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects.
These new challenges require combining manipulation and navigation skills in unexplored environments.
We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills.
arXiv Detail & Related papers (2023-07-12T12:25:33Z)
- HomeRobot: Open-Vocabulary Mobile Manipulation [107.05702777141178]
Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location.
HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch.
arXiv Detail & Related papers (2023-06-20T14:30:32Z)
- A System for Generalized 3D Multi-Object Search [10.40566214112389]
GenMOS is a general-purpose system for multi-object search in a 3D region that is robot-independent and environment-agnostic.
Our system enables, for example, a Boston Dynamics Spot robot to find a toy cat hidden underneath a couch in under one minute.
arXiv Detail & Related papers (2023-03-06T14:47:38Z)
- Extracting Zero-shot Common Sense from Large Language Models for Robot 3D Scene Understanding [25.270772036342688]
We introduce a novel method for leveraging common sense embedded within large language models for labelling rooms.
The proposed algorithm operates on 3D scene graphs produced by modern spatial perception systems.
arXiv Detail & Related papers (2022-06-09T16:05:35Z)
- Reasoning with Scene Graphs for Robot Planning under Partial Observability [7.121002367542985]
We develop an algorithm called scene analysis for robot planning (SARP) that enables robots to reason with visual contextual information.
Experiments have been conducted using multiple 3D environments in simulation, and a dataset collected by a real robot.
arXiv Detail & Related papers (2022-02-21T18:45:56Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
- SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.