DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
- URL: http://arxiv.org/abs/2404.12524v1
- Date: Thu, 18 Apr 2024 21:55:23 GMT
- Title: DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
- Authors: Dominik Bauer, Zhenjia Xu, Shuran Song
- Abstract summary: Transformer-based architecture for planning interactions with elastoplastic objects.
DoughNet allows planning robotic manipulation: selecting a suitable tool, its pose, and opening width to recreate robot- or human-made goals.
- Score: 27.194896819729113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Manipulation of elastoplastic objects like dough often involves topological changes such as splitting and merging. The ability to accurately predict these topological changes that a specific action might incur is critical for planning interactions with elastoplastic objects. We present DoughNet, a Transformer-based architecture for handling these challenges, consisting of two components. First, a denoising autoencoder represents deformable objects of varying topology as sets of latent codes. Second, a visual predictive model performs autoregressive set prediction to determine long-horizon geometrical deformation and topological changes purely in latent space. Given a partial initial state and desired manipulation trajectories, it infers all resulting object geometries and topologies at each step. DoughNet thereby allows planning robotic manipulation: selecting a suitable tool, its pose, and opening width to recreate robot- or human-made goals. Our experiments in simulated and real environments show that DoughNet significantly outperforms related approaches that consider deformation only as geometrical change.
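The abstract's two-component design can be illustrated with a minimal sketch: an encoder that compresses an observed object state into a fixed-size set of latent codes, and a predictive model that rolls those codes forward autoregressively for each action in a manipulation trajectory. All function names and numbers below are hypothetical placeholders; the actual DoughNet uses a learned denoising autoencoder and a Transformer performing set prediction in latent space.

```python
def encode(point_cloud):
    """Stand-in encoder: summarize a 2D point set as a small set of
    latent codes (pooled statistics instead of a learned autoencoder)."""
    xs = [p[0] for p in point_cloud]
    ys = [p[1] for p in point_cloud]
    n = len(point_cloud)
    centroid = [sum(xs) / n, sum(ys) / n]
    spread = [max(xs) - min(xs), max(ys) - min(ys)]
    return [centroid, spread]  # a "set" of two latent codes

def predict_step(latent_set, action):
    """Stand-in one-step dynamics: shift every latent code by the action.
    The real model is a Transformer predicting the next latent set."""
    return [[c + a for c, a in zip(code, action)] for code in latent_set]

def rollout(initial_points, trajectory):
    """Autoregressive rollout: encode once, then feed each prediction
    back in as input for the next step, purely in latent space."""
    z = encode(initial_points)
    states = []
    for action in trajectory:
        z = predict_step(z, action)
        states.append(z)
    return states

square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
push_right = [[1.0, 0.0]] * 3  # three identical toy actions
states = rollout(square, push_right)
print(len(states))    # 3: one predicted latent state per action
print(states[-1][0])  # final centroid code: [3.5, 0.5]
```

The key property mirrored here is that prediction never re-encodes an observation after the first step; long-horizon behavior emerges from chaining one-step latent predictions, which is what lets the model infer geometries and topologies for an entire trajectory from a single partial initial state.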
Related papers
- GeoFormer: A Multi-Polygon Segmentation Transformer [10.097953939411868]
In remote sensing there is a common need for learning scale-invariant shapes of objects like buildings.
We introduce the GeoFormer, a novel architecture which addresses these challenges, learning to generate multipolygons end-to-end.
By modeling keypoints as spatially dependent tokens in an auto-regressive manner, the GeoFormer outperforms existing works in delineating building objects from satellite imagery.
arXiv Detail & Related papers (2024-11-25T17:54:44Z) - Physics-Encoded Graph Neural Networks for Deformation Prediction under Contact [87.69278096528156]
In robotics, it's crucial to understand object deformation during tactile interactions.
We introduce a method using Physics-Encoded Graph Neural Networks (GNNs) for such predictions.
We've made our code and dataset public to advance research in robotic simulation and grasping.
arXiv Detail & Related papers (2024-02-05T19:21:52Z) - GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects [53.965581080954905]
We propose GAMMA, a novel framework for Generalizable Articulation Modeling and Manipulation for Articulated Objects.
GAMMA learns both articulation modeling and grasp pose affordance from diverse articulated objects with different categories.
Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms on unseen and cross-category articulated objects.
arXiv Detail & Related papers (2023-09-28T08:57:14Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - ShapeShift: Superquadric-based Object Pose Estimation for Robotic Grasping [85.38689479346276]
Current techniques heavily rely on a reference 3D object, limiting their generalizability and making it expensive to expand to new object categories.
This paper proposes ShapeShift, a superquadric-based framework for object pose estimation that predicts the object's pose relative to a primitive shape which is fitted to the object.
arXiv Detail & Related papers (2023-04-10T20:55:41Z) - Learning Robust Dynamics through Variational Sparse Gating [18.476155786474358]
In environments with many objects, often only a small number of them are moving or interacting at the same time.
In this paper, we investigate integrating this inductive bias of sparse interactions into the latent dynamics of world models trained from pixels.
arXiv Detail & Related papers (2022-10-21T02:56:51Z) - Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves SOTA pose estimation performance and better generalization on a real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z) - Sequential Topological Representations for Predictive Models of Deformable Objects [18.190326379178995]
We construct compact topological representations to capture the state of highly deformable objects.
We develop an approach that tracks the evolution of this topological state through time.
Our experiments with highly deformable objects in simulation show that the proposed multistep predictive models yield more precise results than those obtained from computational topology libraries.
arXiv Detail & Related papers (2020-11-23T19:45:15Z) - Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z) - Predicting the Physical Dynamics of Unseen 3D Objects [65.49291702488436]
We focus on predicting the dynamics of 3D objects on a plane that have just been subjected to an impulsive force.
Our approach can generalize to object shapes and initial conditions that were unseen during training.
Our model can support training with data from either a physics engine or the real world.
arXiv Detail & Related papers (2020-01-16T06:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.