Enabling Visual Action Planning for Object Manipulation through Latent
Space Roadmap
- URL: http://arxiv.org/abs/2103.02554v1
- Date: Wed, 3 Mar 2021 17:48:26 GMT
- Title: Enabling Visual Action Planning for Object Manipulation through Latent
Space Roadmap
- Authors: Martina Lippi, Petra Poklukar, Michael C. Welle, Anastasiia Varava,
Hang Yin, Alessandro Marino, Danica Kragic
- Abstract summary: We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces.
We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure capturing globally the system dynamics in a low-dimensional latent space.
We present a thorough investigation of our framework on two simulated box stacking tasks and a folding task executed on a real robot.
- Score: 72.01609575400498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a framework for visual action planning of complex manipulation
tasks with high-dimensional state spaces, focusing on manipulation of
deformable objects. We propose a Latent Space Roadmap (LSR) for task planning,
a graph-based structure capturing globally the system dynamics in a
low-dimensional latent space. Our framework consists of three parts: (1) a
Mapping Module (MM) that maps observations, given in the form of images, into a
structured latent space, extracting the respective states, and that generates
observations from the latent states, (2) the LSR which builds and connects
clusters containing similar states in order to find the latent plans between
start and goal states extracted by MM, and (3) the Action Proposal Module that
complements the latent plan found by the LSR with the corresponding actions. We
present a thorough investigation of our framework on two simulated box stacking
tasks and a folding task executed on a real robot.
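The roadmap idea in part (2) can be illustrated with a minimal sketch. The abstract does not say how states are clustered or how the graph is searched, so this example swaps in single-linkage threshold clustering and breadth-first search as stand-ins. The function names, the `eps` threshold, the toy 2-D latent states, and the assumption that transitions are reversible are all hypothetical and not taken from the paper.

```python
from collections import deque
from math import dist


def cluster_states(latents, eps):
    """Group latent states that are linked by a chain of neighbours within eps."""
    labels = [-1] * len(latents)
    n_clusters = 0
    for seed in range(len(latents)):
        if labels[seed] != -1:
            continue  # already assigned to an earlier cluster
        labels[seed] = n_clusters
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(len(latents)):
                if labels[j] == -1 and dist(latents[i], latents[j]) <= eps:
                    labels[j] = n_clusters
                    queue.append(j)
        n_clusters += 1
    return labels


def build_roadmap(labels, transitions):
    """Connect two clusters when an observed transition links their members."""
    edges = {c: set() for c in set(labels)}
    for i, j in transitions:
        a, b = labels[i], labels[j]
        if a != b:
            edges[a].add(b)
            edges[b].add(a)  # assumption: every action can be undone
    return edges


def latent_plan(edges, start, goal):
    """Breadth-first search for a shortest sequence of clusters from start to goal."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(edges[node]):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None  # goal cluster is not reachable from the start cluster


# Toy latent states (hypothetical): three well-separated groups in 2-D.
latents = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.1), (2.0, 0.0), (2.0, 0.1)]
# Observed (state index, successor index) pairs, each linked by one action.
transitions = [(0, 2), (3, 4)]

labels = cluster_states(latents, eps=0.3)
roadmap = build_roadmap(labels, transitions)
plan = latent_plan(roadmap, labels[1], labels[5])
print(labels, plan)
```

In this sketch a plan is a sequence of cluster identifiers. In the framework described above, the Action Proposal Module would then attach an action to each consecutive pair, and the Mapping Module would decode the intermediate states back into images.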
Related papers
- LISNeRF Mapping: LiDAR-based Implicit Mapping via Semantic Neural Fields for Large-Scale 3D Scenes [2.822816116516042]
Large-scale semantic mapping is crucial for outdoor autonomous agents to fulfill high-level tasks such as planning and navigation.
This paper proposes a novel method for large-scale 3D semantic reconstruction through implicit representations from posed LiDAR measurements alone.
arXiv Detail & Related papers (2023-11-04T03:55:38Z)
- Compositional Foundation Models for Hierarchical Planning [52.18904315515153]
We propose a foundation model that combines expert foundation models, each trained individually on language, vision, or action data, to solve long-horizon tasks.
We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model.
Generated video plans are then grounded to visual-motor control, through an inverse dynamics model that infers actions from generated videos.
arXiv Detail & Related papers (2023-09-15T17:44:05Z)
- PlaneRecTR++: Unified Query Learning for Joint 3D Planar Reconstruction and Pose Estimation [10.982464344805194]
PlaneRecTR++ is a Transformer-based architecture that unifies all sub-tasks related to multi-view reconstruction and pose estimation.
Our proposed unified learning achieves mutual benefits across sub-tasks, obtaining a new state-of-the-art performance on public ScanNetv1, ScanNetv2, NYUv2-Plane, and MatterPort3D datasets.
arXiv Detail & Related papers (2023-07-25T18:28:19Z)
- Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planning Agent (TaPA) for grounded planning in embodied tasks under physical scene constraints.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that plans generated by our TaPA framework achieve a higher success rate than those of LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z)
- Long-Horizon Manipulation of Unknown Objects via Task and Motion
Planning with Estimated Affordances [26.082034134908785]
We show that a task-and-motion planner can be used to plan intelligent behaviors even in the absence of a priori knowledge regarding the set of manipulable objects.
We demonstrate that this strategy can enable a single system to perform a wide variety of real-world multi-step manipulation tasks.
arXiv Detail & Related papers (2021-08-09T16:13:47Z)
- Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model
Alignments [81.38641691636847]
We rethink the problem of scene reconstruction from an embodied agent's perspective.
We reconstruct an interactive scene using an RGB-D data stream.
This reconstructed scene replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
arXiv Detail & Related papers (2021-03-30T05:56:58Z)
- Plan2Vec: Unsupervised Representation Learning by Latent Plans [106.37274654231659]
We introduce plan2vec, an unsupervised representation learning approach that is inspired by reinforcement learning.
Plan2vec constructs a weighted graph on an image dataset using near-neighbor distances, and then extrapolates this local metric to a global embedding by distilling the path integral over planned paths.
We demonstrate the effectiveness of plan2vec on one simulated and two challenging real-world image datasets.
arXiv Detail & Related papers (2020-05-07T17:52:23Z)
- Latent Space Roadmap for Visual Action Planning of Deformable and Rigid
Object Manipulation [74.88956115580388]
Planning is performed in a low-dimensional latent state space that embeds images.
Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between them.
arXiv Detail & Related papers (2020-03-19T18:43:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.