Long-Horizon Manipulation of Unknown Objects via Task and Motion
Planning with Estimated Affordances
- URL: http://arxiv.org/abs/2108.04145v2
- Date: Tue, 10 Aug 2021 04:01:30 GMT
- Title: Long-Horizon Manipulation of Unknown Objects via Task and Motion
Planning with Estimated Affordances
- Authors: Aidan Curtis, Xiaolin Fang, Leslie Pack Kaelbling, Tomás Lozano-Pérez, Caelan Reed Garrett
- Abstract summary: We show that a task-and-motion planner can be used to plan intelligent behaviors even in the absence of a priori knowledge regarding the set of manipulable objects.
We demonstrate that this strategy can enable a single system to perform a wide variety of real-world multi-step manipulation tasks.
- Score: 26.082034134908785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a strategy for designing and building very general robot
manipulation systems involving the integration of a general-purpose
task-and-motion planner with engineered and learned perception modules that
estimate properties and affordances of unknown objects. Such systems are
closed-loop policies that map from RGB images, depth images, and robot joint
encoder measurements to robot joint position commands. We show that following
this strategy a task-and-motion planner can be used to plan intelligent
behaviors even in the absence of a priori knowledge regarding the set of
manipulable objects, their geometries, and their affordances. We explore
several different ways of implementing such perceptual modules for
segmentation, property detection, shape estimation, and grasp generation. We
show how these modules are integrated within the PDDLStream task and motion
planning framework. Finally, we demonstrate that this strategy can enable a
single system to perform a wide variety of real-world multi-step manipulation
tasks, generalizing over a broad class of objects, object arrangements, and
goals, without any prior knowledge of the environment and without re-training.
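The abstract describes a concrete pipeline: perception modules estimate segments, shapes, properties, and grasps of unknown objects from RGB-D input, a task-and-motion planner (PDDLStream in the paper) consumes those estimates, and the whole system runs closed-loop from images and joint encoder readings to joint position commands. Below is a minimal, hypothetical Python sketch of that loop; every class, function, and placeholder value (ObjectHypothesis, segment, estimate_objects, plan_task_and_motion, closed_loop_policy) is an illustrative stub and not the paper's actual interface or the PDDLStream API.

```python
# Hypothetical sketch only: all names and placeholder values are illustrative stubs,
# not the paper's interface or the PDDLStream API.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

import numpy as np


@dataclass
class ObjectHypothesis:
    """An unknown object, described only by estimated properties and affordances."""
    mask: np.ndarray                                         # instance mask (H x W, bool)
    points: np.ndarray                                       # back-projected depth points (N x 3)
    grasps: List[np.ndarray] = field(default_factory=list)   # candidate grasp poses (4 x 4)
    properties: dict = field(default_factory=dict)           # e.g. {"graspable": True}


def segment(rgb: np.ndarray) -> List[np.ndarray]:
    """Stub segmentation module: returns one mask covering the whole image."""
    return [np.ones(rgb.shape[:2], dtype=bool)]


def estimate_objects(rgb: np.ndarray, depth: np.ndarray) -> List[ObjectHypothesis]:
    """Stub perception stack: segmentation, shape/property estimation, grasp generation."""
    hypotheses = []
    for mask in segment(rgb):
        points = np.argwhere(mask).astype(float)             # placeholder for a 3D point cloud
        grasps = [np.eye(4)]                                 # placeholder candidate grasp pose
        hypotheses.append(ObjectHypothesis(mask, points, grasps, {"graspable": True}))
    return hypotheses


def plan_task_and_motion(goal: str, objects: List[ObjectHypothesis],
                         joints: np.ndarray) -> Optional[List[np.ndarray]]:
    """Stand-in for the TAMP call (the paper uses PDDLStream); returns joint targets."""
    if not objects:
        return None
    return [joints + 0.01]                                   # placeholder one-step "plan"


def closed_loop_policy(goal: str, get_observation: Callable, execute: Callable,
                       max_iters: int = 10) -> None:
    """Closed-loop behavior: re-perceive, re-plan, and stream joint position commands."""
    for _ in range(max_iters):
        rgb, depth, joints = get_observation()               # RGB, depth, joint encoder readings
        objects = estimate_objects(rgb, depth)
        plan = plan_task_and_motion(goal, objects, joints)
        if plan is None:
            return
        for joint_position_command in plan:
            execute(joint_position_command)


if __name__ == "__main__":
    observe = lambda: (np.zeros((48, 64, 3)), np.ones((48, 64)), np.zeros(7))
    closed_loop_policy("on(objA, objB)", observe, execute=lambda q: None)
```

In the actual system, the stubs would be replaced by the engineered and learned perception modules, and the planning call would be realized as PDDLStream streams wrapping those modules, so that the planner can request segmentations, shape estimates, and grasps as needed during search.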
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z) - Representing Positional Information in Generative World Models for Object Manipulation [12.263162194821787]
We introduce a general approach that empowers world model-based agents to solve object-positioning tasks.
In particular, the proposed approach (LCP) employs object-centric latent representations that explicitly capture object positional information for goal specification.
Our methods are rigorously evaluated across several manipulation environments, showing favorable performance compared to current model-based control approaches.
arXiv Detail & Related papers (2024-09-18T14:19:50Z) - Embodied Instruction Following in Unknown Environments [66.60163202450954]
We propose an embodied instruction following (EIF) method for complex tasks in unknown environments.
We build a hierarchical embodied instruction following framework including the high-level task planner and the low-level exploration controller.
For the task planner, we generate feasible step-by-step plans for accomplishing the human-specified goal according to the task completion process and the known visual clues.
arXiv Detail & Related papers (2024-06-17T17:55:40Z) - Cognitive Planning for Object Goal Navigation using Generative AI Models [0.979851640406258]
We present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies.
Our approach enables a robot to navigate unfamiliar environments by leveraging Large Language Models (LLMs) and Large Vision-Language Models (LVLMs).
arXiv Detail & Related papers (2024-03-30T10:54:59Z) - Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planning Agent (TaPA) for grounded planning in embodied tasks with physical scene constraints.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that the generated plan from our TaPA framework achieves a higher success rate than LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z) - Online Grounding of PDDL Domains by Acting and Sensing in Unknown Environments [62.11612385360421]
This paper proposes a framework that allows an agent, acting and sensing in an initially unknown environment, to ground a PDDL domain online and perform different tasks.
We integrate machine learning models to abstract the sensory data, symbolic planning for goal achievement and path planning for navigation.
We evaluate the proposed method in accurate simulated environments, where the sensors are an on-board RGB-D camera, GPS, and a compass.
arXiv Detail & Related papers (2021-12-18T21:48:20Z) - A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects [25.428781562909606]
We present a framework for solving long-horizon planning problems involving manipulation of rigid objects.
Our method plans in the space of object subgoals and frees the planner from reasoning about robot-object interaction dynamics.
arXiv Detail & Related papers (2020-11-16T18:59:33Z) - Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators [4.317864702902075]
We present the first RL-based system for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies for cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate objects.
We train and evaluate our system in a simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.
arXiv Detail & Related papers (2020-07-16T02:47:48Z) - Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems: salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z) - Latent Space Roadmap for Visual Action Planning of Deformable and Rigid Object Manipulation [74.88956115580388]
Planning is performed in a low-dimensional latent state space that embeds images.
Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between them (see the sketch after this list).
arXiv Detail & Related papers (2020-03-19T18:43:26Z)
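For the Latent Space Roadmap entry above, here is a hedged sketch of the two-stage decomposition it describes: a visual-foresight step proposes a sequence of intermediate latent states, and an action-proposal step predicts the action connecting each consecutive pair. The class names mirror the entry's VFM/APN terminology, but the interfaces and the linear-interpolation placeholder are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the VFM/APN decomposition; interfaces and the
# interpolation placeholder are illustrative, not the paper's implementation.
from typing import List, Tuple

import numpy as np


class VisualForesightModule:
    """Stub VFM: proposes intermediate latent states by linear interpolation."""
    def plan(self, z_start: np.ndarray, z_goal: np.ndarray, steps: int = 5) -> List[np.ndarray]:
        return [(1 - a) * z_start + a * z_goal for a in np.linspace(0.0, 1.0, steps)]


class ActionProposalNetwork:
    """Stub APN: 'predicts' an action as the displacement between latent states."""
    def propose(self, z_from: np.ndarray, z_to: np.ndarray) -> np.ndarray:
        return z_to - z_from


def visual_action_plan(z_start: np.ndarray, z_goal: np.ndarray) -> Tuple[list, list]:
    """Visual plan first, then the actions connecting consecutive states."""
    vfm, apn = VisualForesightModule(), ActionProposalNetwork()
    states = vfm.plan(z_start, z_goal)
    actions = [apn.propose(a, b) for a, b in zip(states[:-1], states[1:])]
    return states, actions


if __name__ == "__main__":
    states, actions = visual_action_plan(np.zeros(8), np.ones(8))
    print(len(states), "states,", len(actions), "actions")
```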
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.