Learning to Rearrange Deformable Cables, Fabrics, and Bags with
Goal-Conditioned Transporter Networks
- URL: http://arxiv.org/abs/2012.03385v4
- Date: Sun, 18 Jun 2023 18:54:13 GMT
- Title: Learning to Rearrange Deformable Cables, Fabrics, and Bags with
Goal-Conditioned Transporter Networks
- Authors: Daniel Seita, Pete Florence, Jonathan Tompson, Erwin Coumans, Vikas
Sindhwani, Ken Goldberg, Andy Zeng
- Abstract summary: Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation.
We develop a suite of simulated benchmarks with 1D, 2D, and 3D deformable structures.
We propose embedding goal-conditioning into Transporter Networks, a recently proposed model architecture for learning robotic manipulation.
- Score: 36.90218756798642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rearranging and manipulating deformable objects such as cables, fabrics, and
bags is a long-standing challenge in robotic manipulation. The complex dynamics
and high-dimensional configuration spaces of deformables, compared to rigid
objects, make manipulation difficult not only for multi-step planning, but even
for goal specification. Goals cannot be as easily specified as rigid object
poses, and may involve complex relative spatial relations such as "place the
item inside the bag". In this work, we develop a suite of simulated benchmarks
with 1D, 2D, and 3D deformable structures, including tasks that involve
image-based goal-conditioning and multi-step deformable manipulation. We
propose embedding goal-conditioning into Transporter Networks, a recently
proposed model architecture for learning robotic manipulation that rearranges
deep features to infer displacements that can represent pick and place actions.
In simulation and in physical experiments, we demonstrate that goal-conditioned
Transporter Networks enable agents to manipulate deformable structures into
flexibly specified configurations without test-time visual anchors for target
locations. We also significantly extend prior results using Transporter
Networks for manipulating deformable objects by testing on tasks with 2D and 3D
deformables. Supplementary material is available at
https://berkeleyautomation.github.io/bags/.
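
For readers who want a concrete picture of the transport operation described in the abstract, below is a minimal NumPy sketch of goal-conditioned pick-and-place inference. It is an illustration only, not the authors' trained model: the feature extractor is a random projection, the crop size and the channel-wise stacking of the observation with the goal image are assumptions, the pick crop is collapsed into a single query vector rather than cross-correlated as a convolutional kernel over rotations, and all function and parameter names (featurize, goal_conditioned_pick_place, CROP, D) are invented for illustration.

```python
# Minimal NumPy sketch of goal-conditioned pick-and-place inference in the
# spirit of goal-conditioned Transporter Networks. The feature extractors
# are stand-ins (fixed random projections), not the paper's trained FCNs.
import numpy as np

H, W, C, D = 64, 64, 3, 8   # image size, channels, feature depth (assumed)
CROP = 9                    # crop size around the pick pixel (assumed)

def featurize(img, rng):
    """Placeholder per-pixel feature map: a fixed random 1x1 'convolution'."""
    Wf = rng.standard_normal((img.shape[-1], D)) * 0.1
    return img @ Wf                                         # (H, W, D)

def goal_conditioned_pick_place(obs, goal, rng=np.random.default_rng(0)):
    # Goal conditioning by channel-wise stacking of observation and goal image.
    stacked = np.concatenate([obs, goal], axis=-1)           # (H, W, 2C)

    # Attention stream: dense per-pixel pick scores -> argmax pick pixel.
    pick_scores = featurize(stacked, rng).sum(axis=-1)       # (H, W)
    pick = np.unravel_index(pick_scores.argmax(), pick_scores.shape)

    # Transport stream: crop "query" features around the pick, then correlate
    # them against "key" features of the scene+goal to score every candidate
    # place location.
    key = featurize(stacked, rng)                             # (H, W, D)
    r0, c0 = max(pick[0] - CROP // 2, 0), max(pick[1] - CROP // 2, 0)
    query = key[r0:r0 + CROP, c0:c0 + CROP].mean(axis=(0, 1))  # (D,)
    place_scores = key @ query                                 # (H, W)
    place = np.unravel_index(place_scores.argmax(), place_scores.shape)
    return pick, place

obs = np.random.rand(H, W, C)    # current top-down image
goal = np.random.rand(H, W, C)   # goal image specifying the desired configuration
print(goal_conditioned_pick_place(obs, goal))
```

In the full architecture, the attention and transport streams are deep fully convolutional networks trained from demonstrations, and placements are scored by cross-correlating the crop features over the scene features across discretized rotations as well as translations.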
Related papers
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- DeformerNet: Learning Bimanual Manipulation of 3D Deformable Objects [13.138509669247508]
Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape.
Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models.
We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the manipulated object and a point cloud of the goal shape.
This shape embedding enables the robot to learn a visual servo controller that computes the desired robot end-effector action to deform the object toward the goal shape (a toy sketch of this two-branch shape encoding appears after this list).
arXiv Detail & Related papers (2023-05-08T04:08:06Z)
- Deep Reinforcement Learning Based on Local GNN for Goal-conditioned Deformable Object Rearranging [1.807492010338763]
Object rearranging is one of the most common deformable manipulation tasks, where the robot needs to rearrange a deformable object into a goal configuration.
Previous studies focus on designing an expert system for each specific task by model-based or data-driven approaches.
We design a local GNN (Graph Neural Network) based learning method, which utilizes two representation graphs to encode keypoints detected from images.
Our framework is effective in multiple 1-D (rope, rope ring) and 2-D (cloth) rearranging tasks in simulation and can be easily transferred to a real robot by fine-tuning a keypoint detector.
arXiv Detail & Related papers (2023-02-21T05:21:26Z)
- Graph-Transporter: A Graph-based Learning Method for Goal-Conditioned Deformable Object Rearranging Task [1.807492010338763]
We present a novel framework, Graph-Transporter, for goal-conditioned deformable object rearranging tasks.
Our framework adopts an architecture based on a Fully Convolutional Network (FCN) to output pixel-wise pick-and-place actions from only visual input.
arXiv Detail & Related papers (2023-02-21T05:21:04Z)
- Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation [64.00292856805865]
We propose PlAnning with Spatial-Temporal Abstraction (PASTA), which incorporates both spatial abstraction and temporal abstraction.
Our framework maps high-dimensional 3D observations into a set of latent vectors and plans over skill sequences on top of the latent set representation.
We show that our method can effectively perform challenging deformable object manipulation tasks in the real world.
arXiv Detail & Related papers (2022-10-27T19:57:04Z)
- Learning Visual Shape Control of Novel 3D Deformable Objects from Partial-View Point Clouds [7.1659268120093635]
Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape.
Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models.
We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the object being manipulated and a point cloud of the goal shape to learn a low-dimensional representation of the object shape.
arXiv Detail & Related papers (2021-10-10T02:34:57Z)
- Object Wake-up: 3-D Object Reconstruction, Animation, and in-situ Rendering from a Single Image [58.69732754597448]
Given a picture of a chair, could we extract the 3-D shape of the chair, animate its plausible articulations and motions, and render it in-situ in its original image space?
We devise an automated approach to extract and manipulate articulated objects in single images.
arXiv Detail & Related papers (2021-08-05T16:20:12Z)
- Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts.
We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation.
Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z)
- A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects [25.428781562909606]
We present a framework for solving long-horizon planning problems involving manipulation of rigid objects.
Our method plans in the space of object subgoals and frees the planner from reasoning about robot-object interaction dynamics.
arXiv Detail & Related papers (2020-11-16T18:59:33Z)
- Latent Space Roadmap for Visual Action Planning of Deformable and Rigid Object Manipulation [74.88956115580388]
Planning is performed in a low-dimensional latent state space that embeds images.
Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between them.
arXiv Detail & Related papers (2020-03-19T18:43:26Z)
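
The DeformerNet and visual shape control entries above share a simple pattern: encode the observed point cloud and the goal-shape point cloud separately, then regress an end-effector motion from the pair of embeddings. Below is a toy NumPy sketch of that pattern under placeholder assumptions (a PointNet-style random-weight encoder and a linear action head); the names encode, shape_servo_action, D_EMB, and D_ACT are invented for illustration, and this is not the published architecture or trained weights.

```python
# Toy sketch: two-branch point-cloud shape embedding followed by a linear
# action head, in the spirit of shape-servoing controllers such as DeformerNet.
import numpy as np

D_EMB, D_ACT = 32, 3                       # embedding size, action dim (assumed)
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 64)) * 0.1    # shared per-point MLP, layer 1
W2 = rng.standard_normal((64, D_EMB)) * 0.1
W_act = rng.standard_normal((2 * D_EMB, D_ACT)) * 0.1   # action head

def encode(points):
    """Per-point MLP followed by max-pooling: a permutation-invariant
    shape embedding (PointNet-style stand-in with random weights)."""
    h = np.maximum(points @ W1, 0.0)       # ReLU
    h = np.maximum(h @ W2, 0.0)
    return h.max(axis=0)                   # (D_EMB,)

def shape_servo_action(observed_pc, goal_pc):
    """Map (current shape, goal shape) to a desired end-effector displacement."""
    z = np.concatenate([encode(observed_pc), encode(goal_pc)])
    return z @ W_act                       # (D_ACT,), e.g. a 3D translation

observed_pc = rng.random((512, 3))   # partial-view point cloud of the object
goal_pc = rng.random((512, 3))       # point cloud of the goal shape
print(shape_servo_action(observed_pc, goal_pc))
```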