Generalized Object Search
- URL: http://arxiv.org/abs/2301.10121v2
- Date: Thu, 4 May 2023 04:02:15 GMT
- Title: Generalized Object Search
- Authors: Kaiyu Zheng
- Abstract summary: This thesis develops methods and systems for (multi-)object search in 3D environments under uncertainty.
I implement a robot-independent, environment-agnostic system for generalized object search in 3D.
I deploy it on the Boston Dynamics Spot robot, the Kinova MOVO robot, and the Universal Robots UR5e robotic arm.
- Score: 0.9137554315375919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Future collaborative robots must be capable of finding objects. As such a
fundamental skill, we expect object search to eventually become an
off-the-shelf capability for any robot, similar to, e.g., object detection,
SLAM, and motion planning. However, existing approaches either make unrealistic
compromises (e.g., reduce the problem from 3D to 2D), resort to ad-hoc, greedy
search strategies, or attempt to learn end-to-end policies in simulation that
are yet to generalize across real robots and environments. This thesis argues
that through using Partially Observable Markov Decision Processes (POMDPs) to
model object search while exploiting structures in the human world (e.g.,
octrees, correlations) and in human-robot interaction (e.g., spatial language),
a practical and effective system for generalized object search can be achieved.
In support of this argument, I develop methods and systems for (multi-)object
search in 3D environments under uncertainty due to limited field of view,
occlusion, noisy and unreliable detectors, spatial correlations between objects,
and possibly ambiguous spatial language (e.g., "The red car is behind Chase
Bank"). Besides evaluation in simulators such as PyGame, AirSim, and AI2-THOR,
I design and implement a robot-independent, environment-agnostic system for
generalized object search in 3D and deploy it on the Boston Dynamics Spot
robot, the Kinova MOVO robot, and the Universal Robots UR5e robotic arm, to
perform object search in different environments. The system enables, for
example, a Spot robot to find a toy cat hidden underneath a couch in a kitchen
area in under one minute. This thesis also broadly surveys the object search
literature, proposing taxonomies in object search problem settings, methods and
systems.
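The core idea in the abstract — modeling object search as a POMDP in which both detections and non-detections from a noisy detector update a belief over object locations — can be illustrated with a minimal sketch. This is not the thesis's actual system (which uses octree beliefs and POMDP planners); the grid resolution, detector rates, and function names below are hypothetical, chosen only to show the Bayesian belief update and a greedy view selection:

```python
import itertools

# Hypothetical detector rates (assumptions, not from the thesis):
TP = 0.8   # probability the detector fires when the object is in view
FP = 0.05  # probability of a false alarm when the object is not in view

def uniform_belief(dims):
    """Uniform prior over a coarse 3D grid of candidate cells."""
    cells = list(itertools.product(*(range(d) for d in dims)))
    p = 1.0 / len(cells)
    return {c: p for c in cells}

def update(belief, viewed, detected):
    """Bayes update: `viewed` is the set of cells inside the camera frustum.

    A non-detection lowers the probability of viewed cells and shifts
    mass elsewhere; a detection concentrates mass on viewed cells.
    """
    new = {}
    for cell, prior in belief.items():
        if cell in viewed:
            like = TP if detected else (1 - TP)
        else:
            like = FP if detected else (1 - FP)
        new[cell] = like * prior
    z = sum(new.values())
    return {c: p / z for c, p in new.items()}

def best_view(belief, views):
    """Greedy baseline: pick the view covering the most probability mass.

    (The thesis argues for full POMDP planning rather than greedy
    strategies; this stands in only to close the sense-plan loop.)
    """
    return max(views, key=lambda v: sum(belief.get(c, 0.0) for c in v))
```

For example, after a non-detection in one corner of a 2x2x2 grid, the belief mass in that corner drops and the planner prefers views covering the remaining cells. An octree belief, as used in the thesis, replaces the flat dictionary with a hierarchical structure so that large empty regions are represented cheaply.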
Related papers
- Learning Object Properties Using Robot Proprioception via Differentiable Robot-Object Interaction [52.12746368727368]
Differentiable simulation has become a powerful tool for system identification.
Our approach calibrates object properties by using information from the robot, without relying on data from the object itself.
We demonstrate the effectiveness of our method on a low-cost robotic platform.
arXiv Detail & Related papers (2024-10-04T20:48:38Z)
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
- Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality in Human-Robot Interaction [3.1473798197405953]
This dissertation aims to teach a robot unknown objects in the context of Human-Robot Interaction (HRI).
The combination of eye tracking and Augmented Reality created a powerful synergy that empowered the human teacher to communicate with the robot.
The robot's object detection capabilities exhibited comparable performance to state-of-the-art object detectors trained on extensive datasets.
arXiv Detail & Related papers (2023-12-12T11:34:43Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with existing visual grounding and robotic grasping systems.
We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation [10.21450780640562]
We introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects.
These new challenges require combining manipulation and navigation skills in unexplored environments.
We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills.
arXiv Detail & Related papers (2023-07-12T12:25:33Z)
- HomeRobot: Open-Vocabulary Mobile Manipulation [107.05702777141178]
Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location.
HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch.
arXiv Detail & Related papers (2023-06-20T14:30:32Z)
- A System for Generalized 3D Multi-Object Search [10.40566214112389]
GenMOS is a general-purpose system for multi-object search in a 3D region that is robot-independent and environment-agnostic.
Our system enables, for example, a Boston Dynamics Spot robot to find a toy cat hidden underneath a couch in under one minute.
arXiv Detail & Related papers (2023-03-06T14:47:38Z)
- Extracting Zero-shot Common Sense from Large Language Models for Robot 3D Scene Understanding [25.270772036342688]
We introduce a novel method for leveraging common sense embedded within large language models for labelling rooms.
The proposed algorithm operates on 3D scene graphs produced by modern spatial perception systems.
arXiv Detail & Related papers (2022-06-09T16:05:35Z)
- Reasoning with Scene Graphs for Robot Planning under Partial Observability [7.121002367542985]
We develop an algorithm called scene analysis for robot planning (SARP) that enables robots to reason with visual contextual information.
Experiments have been conducted using multiple 3D environments in simulation, and a dataset collected by a real robot.
arXiv Detail & Related papers (2022-02-21T18:45:56Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
- SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.