Long-Horizon Manipulation of Unknown Objects via Task and Motion
Planning with Estimated Affordances
- URL: http://arxiv.org/abs/2108.04145v2
- Date: Tue, 10 Aug 2021 04:01:30 GMT
- Title: Long-Horizon Manipulation of Unknown Objects via Task and Motion
Planning with Estimated Affordances
- Authors: Aidan Curtis, Xiaolin Fang, Leslie Pack Kaelbling, Tomás Lozano-Pérez, Caelan Reed Garrett
- Abstract summary: We show that a task-and-motion planner can be used to plan intelligent behaviors even in the absence of a priori knowledge regarding the set of manipulable objects.
We demonstrate that this strategy can enable a single system to perform a wide variety of real-world multi-step manipulation tasks.
- Score: 26.082034134908785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a strategy for designing and building very general robot
manipulation systems involving the integration of a general-purpose
task-and-motion planner with engineered and learned perception modules that
estimate properties and affordances of unknown objects. Such systems are
closed-loop policies that map from RGB images, depth images, and robot joint
encoder measurements to robot joint position commands. We show that following
this strategy a task-and-motion planner can be used to plan intelligent
behaviors even in the absence of a priori knowledge regarding the set of
manipulable objects, their geometries, and their affordances. We explore
several different ways of implementing such perceptual modules for
segmentation, property detection, shape estimation, and grasp generation. We
show how these modules are integrated within the PDDLStream task and motion
planning framework. Finally, we demonstrate that this strategy can enable a
single system to perform a wide variety of real-world multi-step manipulation
tasks, generalizing over a broad class of objects, object arrangements, and
goals, without any prior knowledge of the environment and without re-training.
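The abstract describes a concrete pipeline: perception modules estimate segments, shapes, properties, and grasps of unknown objects from RGB-D input, a task-and-motion planner (PDDLStream in the paper) consumes those estimates, and the whole system runs closed-loop from images and joint encoder readings to joint position commands. Below is a minimal, hypothetical Python sketch of that loop; every class, function, and placeholder value (ObjectHypothesis, segment, estimate_objects, plan_task_and_motion, closed_loop_policy) is an illustrative stub and not the paper's actual interface or the PDDLStream API.

```python
# Hypothetical sketch only: all names and placeholder values are illustrative stubs,
# not the paper's interface or the PDDLStream API.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

import numpy as np


@dataclass
class ObjectHypothesis:
    """An unknown object, described only by estimated properties and affordances."""
    mask: np.ndarray                                         # instance mask (H x W, bool)
    points: np.ndarray                                       # back-projected depth points (N x 3)
    grasps: List[np.ndarray] = field(default_factory=list)   # candidate grasp poses (4 x 4)
    properties: dict = field(default_factory=dict)           # e.g. {"graspable": True}


def segment(rgb: np.ndarray) -> List[np.ndarray]:
    """Stub segmentation module: returns one mask covering the whole image."""
    return [np.ones(rgb.shape[:2], dtype=bool)]


def estimate_objects(rgb: np.ndarray, depth: np.ndarray) -> List[ObjectHypothesis]:
    """Stub perception stack: segmentation, shape/property estimation, grasp generation."""
    hypotheses = []
    for mask in segment(rgb):
        points = np.argwhere(mask).astype(float)             # placeholder for a 3D point cloud
        grasps = [np.eye(4)]                                 # placeholder candidate grasp pose
        hypotheses.append(ObjectHypothesis(mask, points, grasps, {"graspable": True}))
    return hypotheses


def plan_task_and_motion(goal: str, objects: List[ObjectHypothesis],
                         joints: np.ndarray) -> Optional[List[np.ndarray]]:
    """Stand-in for the TAMP call (the paper uses PDDLStream); returns joint targets."""
    if not objects:
        return None
    return [joints + 0.01]                                   # placeholder one-step "plan"


def closed_loop_policy(goal: str, get_observation: Callable, execute: Callable,
                       max_iters: int = 10) -> None:
    """Closed-loop behavior: re-perceive, re-plan, and stream joint position commands."""
    for _ in range(max_iters):
        rgb, depth, joints = get_observation()               # RGB, depth, joint encoder readings
        objects = estimate_objects(rgb, depth)
        plan = plan_task_and_motion(goal, objects, joints)
        if plan is None:
            return
        for joint_position_command in plan:
            execute(joint_position_command)


if __name__ == "__main__":
    observe = lambda: (np.zeros((48, 64, 3)), np.ones((48, 64)), np.zeros(7))
    closed_loop_policy("on(objA, objB)", observe, execute=lambda q: None)
```

In the actual system, the stubs would be replaced by the engineered and learned perception modules, and the planning call would be realized as PDDLStream streams wrapping those modules, so that the planner can request segmentations, shape estimates, and grasps as needed during search.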
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z) - Representing Positional Information in Generative World Models for Object Manipulation [12.263162194821787]
We introduce a general approach that empowers world model-based agents to solve object-positioning tasks.
In particular, the proposed approach (LCP) employs object-centric latent representations that explicitly capture object positional information for goal specification.
Our methods are rigorously evaluated across several manipulation environments, showing favorable performance compared to current model-based control approaches.
arXiv Detail & Related papers (2024-09-18T14:19:50Z) - Embodied Instruction Following in Unknown Environments [66.60163202450954]
We propose an embodied instruction following (EIF) method for complex tasks in unknown environments.
We build a hierarchical embodied instruction following framework including the high-level task planner and the low-level exploration controller.
For the task planner, we generate feasible step-by-step plans for accomplishing the human-specified goal according to the task completion process and the known visual clues.
arXiv Detail & Related papers (2024-06-17T17:55:40Z) - Cognitive Planning for Object Goal Navigation using Generative AI Models [0.979851640406258]
We present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies.
Our approach enables a robot to navigate unfamiliar environments by leveraging Large Language Models (LLMs) and Large Vision-Language Models (LVLMs).
arXiv Detail & Related papers (2024-03-30T10:54:59Z) - Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planning Agent (TaPA) for grounded planning in embodied tasks with physical scene constraints.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that the generated plan from our TaPA framework achieves a higher success rate than LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z) - Online Grounding of PDDL Domains by Acting and Sensing in Unknown Environments [62.11612385360421]
This paper proposes a framework that allows an agent, acting and sensing in an initially unknown environment, to ground a PDDL domain online and perform different tasks.
We integrate machine learning models to abstract the sensory data, symbolic planning for goal achievement and path planning for navigation.
We evaluate the proposed method in accurate simulated environments, where the sensors are an on-board RGB-D camera, GPS, and a compass.
arXiv Detail & Related papers (2021-12-18T21:48:20Z) - A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects [25.428781562909606]
We present a framework for solving long-horizon planning problems involving manipulation of rigid objects.
Our method plans in the space of object subgoals and frees the planner from reasoning about robot-object interaction dynamics.
arXiv Detail & Related papers (2020-11-16T18:59:33Z) - Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators [4.317864702902075]
We present the first RL-based system for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies for cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate objects.
We train and evaluate our system in a simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.
arXiv Detail & Related papers (2020-07-16T02:47:48Z) - Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems: salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z) - Latent Space Roadmap for Visual Action Planning of Deformable and Rigid Object Manipulation [74.88956115580388]
Planning is performed in a low-dimensional latent state space that embeds images.
Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between them (see the sketch after this list).
arXiv Detail & Related papers (2020-03-19T18:43:26Z)
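For the Latent Space Roadmap entry above, here is a hedged sketch of the two-stage decomposition it describes: a visual-foresight step proposes a sequence of intermediate latent states, and an action-proposal step predicts the action connecting each consecutive pair. The class names mirror the entry's VFM/APN terminology, but the interfaces and the linear-interpolation placeholder are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the VFM/APN decomposition; interfaces and the
# interpolation placeholder are illustrative, not the paper's implementation.
from typing import List, Tuple

import numpy as np


class VisualForesightModule:
    """Stub VFM: proposes intermediate latent states by linear interpolation."""
    def plan(self, z_start: np.ndarray, z_goal: np.ndarray, steps: int = 5) -> List[np.ndarray]:
        return [(1 - a) * z_start + a * z_goal for a in np.linspace(0.0, 1.0, steps)]


class ActionProposalNetwork:
    """Stub APN: 'predicts' an action as the displacement between latent states."""
    def propose(self, z_from: np.ndarray, z_to: np.ndarray) -> np.ndarray:
        return z_to - z_from


def visual_action_plan(z_start: np.ndarray, z_goal: np.ndarray) -> Tuple[list, list]:
    """Visual plan first, then the actions connecting consecutive states."""
    vfm, apn = VisualForesightModule(), ActionProposalNetwork()
    states = vfm.plan(z_start, z_goal)
    actions = [apn.propose(a, b) for a, b in zip(states[:-1], states[1:])]
    return states, actions


if __name__ == "__main__":
    states, actions = visual_action_plan(np.zeros(8), np.ones(8))
    print(len(states), "states,", len(actions), "actions")
```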
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.