ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation
- URL: http://arxiv.org/abs/2203.06856v1
- Date: Mon, 14 Mar 2022 04:56:55 GMT
- Title: ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation
- Authors: Bokui Shen, Zhenyu Jiang, Christopher Choy, Leonidas J. Guibas, Silvio Savarese, Anima Anandkumar and Yuke Zhu
- Abstract summary: We introduce ACID, an action-conditional visual dynamics model for volumetric deformable objects.
We build a benchmark containing over 17,000 action trajectories with six types of plush toys and 78 variants.
Our model achieves the best performance in geometry, correspondence, and dynamics predictions.
- Score: 135.10594078615952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Manipulating volumetric deformable objects in the real world, like plush toys
and pizza dough, brings substantial challenges due to infinite shape variations,
non-rigid motions, and partial observability. We introduce ACID, an
action-conditional visual dynamics model for volumetric deformable objects
based on structured implicit neural representations. ACID integrates two new
techniques: implicit representations for action-conditional dynamics and
geodesics-based contrastive learning. To represent deformable dynamics from
partial RGB-D observations, we learn implicit representations of occupancy and
flow-based forward dynamics. To accurately identify state change under large
non-rigid deformations, we learn a correspondence embedding field through a
novel geodesics-based contrastive loss. To evaluate our approach, we develop a
simulation framework for manipulating complex deformable shapes in realistic
scenes and a benchmark containing over 17,000 action trajectories with six
types of plush toys and 78 variants. Our model achieves the best performance in
geometry, correspondence, and dynamics predictions over existing approaches.
The ACID dynamics models are successfully employed in goal-conditioned
deformable manipulation tasks, resulting in a 30% increase in task success rate
over the strongest baseline. For more results and information, please visit
https://b0ku1.github.io/acid-web/ .
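The abstract names a geodesics-based contrastive loss for learning the correspondence embedding field but does not spell it out. The sketch below is one plausible reading under stated assumptions, not the paper's formulation: it approximates geodesic distance by Dijkstra shortest paths on a k-nearest-neighbour graph over surface points, pulls together the embeddings of the same point in two states (for example, before and after an action), and pushes apart other pairs with a hinge margin that grows with their geodesic distance. The function names, `k`, and `margin_scale` are hypothetical, and the point graph is assumed to be connected.

```python
import heapq
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph with Euclidean edge weights."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        # Sort every other point by distance to p and keep the k closest.
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrize so the graph is undirected
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from `source`, used as a geodesic proxy."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def geodesic_contrastive_loss(emb_a, emb_b, geo, margin_scale=1.0):
    """Hypothetical geodesics-aware contrastive loss (not the paper's exact form).

    emb_a[i] and emb_b[i] embed the same surface point i in two states.
    geo[i][j] is the geodesic distance between points i and j.
    """
    n = len(emb_a)
    # Positive term: the same point should map to the same embedding in both states.
    pos = sum(math.dist(emb_a[i], emb_b[i]) ** 2 for i in range(n)) / n
    # Negative term: distinct points should be at least margin_scale * geodesic apart.
    neg, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j:
                gap = margin_scale * geo[i][j] - math.dist(emb_a[i], emb_b[j])
                neg += max(0.0, gap) ** 2
                pairs += 1
    return pos + neg / max(pairs, 1)


# Toy usage: four points on a line, embedded by their own coordinates.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
graph = knn_graph(points, k=2)
geo = {i: geodesic_distances(graph, i) for i in range(len(points))}
loss_identity = geodesic_contrastive_loss(points, points, geo)
collapsed = [(0.0, 0.0)] * len(points)  # every point mapped to one embedding
loss_collapsed = geodesic_contrastive_loss(collapsed, collapsed, geo)
```

In this toy case the identity embedding already separates points by exactly their geodesic distance, so its loss is zero, while collapsing all points to one embedding is penalized. Using a geodesic rather than Euclidean margin is meant to keep two points that touch in space but lie far apart along the surface from being treated as near-duplicates.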
Related papers
- Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
(2024-03-13T15:45:04Z)
- Learning visual-based deformable object rearrangement with local graph neural networks [4.333220038316982]
We propose a novel representation strategy that can efficiently model the deformable object states with a set of keypoints and their interactions.
We also propose a lightweight local GNN that learns to jointly model the deformable rearrangement dynamics and infer the optimal manipulation actions.
Our method reaches much higher success rates on a variety of deformable rearrangement tasks (96.3% on average) than the state-of-the-art method in simulation experiments.
(2023-10-16T11:42:54Z)
- AGAR: Attention Graph-RNN for Adaptative Motion Prediction of Point Clouds of Deformable Objects [7.414594429329531]
We propose an improved architecture for point cloud prediction of deformable 3D objects.
Specifically, to handle deformable shapes, we propose a graph-based approach that learns and exploits the spatial structure of point clouds.
The proposed adaptative module controls the composition of local and global motions for each point, enabling the network to model complex motions in deformable 3D objects more effectively.
(2023-07-19T12:21:39Z)
- Dynamic-Resolution Model Learning for Object Pile Manipulation [33.05246884209322]
We investigate how to learn dynamic and adaptive representations at different levels of abstraction to achieve the optimal trade-off between efficiency and effectiveness.
Specifically, we construct dynamic-resolution particle representations of the environment and learn a unified dynamics model using graph neural networks (GNNs).
We show that our method achieves significantly better performance than state-of-the-art fixed-resolution baselines at the gathering, sorting, and redistribution of granular object piles.
(2023-06-29T05:51:44Z)
- DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics [97.75188532559952]
We propose a principled framework that abstracts dexterous manipulation skills from human demonstration.
We then train a skill model using demonstrations for planning over action abstractions in imagination.
To evaluate the effectiveness of our approach, we introduce a suite of six challenging dexterous deformable object manipulation tasks.
(2023-03-27T17:59:49Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
(2021-11-30T04:10:57Z)
- Spatial-Temporal Alignment Network for Action Recognition and Detection [80.19235282200697]
This paper studies how to introduce viewpoint-invariant feature representations that can help action recognition and detection.
We propose a novel Spatial-Temporal Alignment Network (STAN) that aims to learn geometric invariant representations for action recognition and action detection.
We test our STAN model extensively on AVA, Kinetics-400, AVA-Kinetics, Charades, and Charades-Ego datasets.
(2020-12-04T06:23:40Z)
- SoftSMPL: Data-driven Modeling of Nonlinear Soft-tissue Dynamics for Parametric Humans [15.83525220631304]
We present SoftSMPL, a learning-based method to model realistic soft-tissue dynamics as a function of body shape and motion.
At the core of our method there are three key contributions that enable us to model highly realistic dynamics.
(2020-04-01T10:35:06Z)
- Learning Predictive Representations for Deformable Objects Using Contrastive Estimation [83.16948429592621]
We propose a new learning framework that jointly optimizes both the visual representation model and the dynamics model.
We show substantial improvements over standard model-based learning techniques across our rope and cloth manipulation suite.
(2020-03-11T17:55:15Z)
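The last entry above learns representations by contrastive estimation. As a rough illustration only, an InfoNCE-style objective scores a predicted next-state embedding against the true next-state embedding and several negatives, then minimizes the negative log-softmax of the true one. The dot-product score, the additive toy dynamics `predict_next`, and all names below are assumptions for illustration, not that paper's model.

```python
import math


def predict_next(z, action):
    """Hypothetical toy latent dynamics: the action is added to the state embedding."""
    return [zi + ai for zi, ai in zip(z, action)]


def info_nce(pred, candidates, positive_index, temperature=1.0):
    """InfoNCE loss: -log softmax(scores)[positive_index] with dot-product scores."""
    scores = [sum(p * c for p, c in zip(pred, cand)) / temperature for cand in candidates]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    log_partition = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_partition - scores[positive_index]


z_t = [0.0, 0.0]
action = [1.0, 0.0]
pred = predict_next(z_t, action)
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # index 0 is the true next state
loss_true = info_nce(pred, candidates, positive_index=0)
loss_wrong = info_nce(pred, candidates, positive_index=2)
```

Because the loss depends on both the encoder (which produces the candidate embeddings) and the dynamics model (which produces `pred`), minimizing it trains the two jointly. Subtracting the maximum score is the usual log-sum-exp trick and does not change the result.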