ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object
Manipulation
- URL: http://arxiv.org/abs/2203.06856v1
- Date: Mon, 14 Mar 2022 04:56:55 GMT
- Title: ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object
Manipulation
- Authors: Bokui Shen, Zhenyu Jiang, Christopher Choy, Leonidas J. Guibas, Silvio
Savarese, Anima Anandkumar and Yuke Zhu
- Abstract summary: We introduce ACID, an action-conditional visual dynamics model for volumetric deformable objects.
The benchmark contains over 17,000 action trajectories with six types of plush toys and 78 variants.
Our model achieves the best performance in geometry, correspondence, and dynamics predictions.
- Score: 135.10594078615952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Manipulating volumetric deformable objects in the real world, like plush toys
and pizza dough, brings substantial challenges due to infinite shape variations,
non-rigid motions, and partial observability. We introduce ACID, an
action-conditional visual dynamics model for volumetric deformable objects
based on structured implicit neural representations. ACID integrates two new
techniques: implicit representations for action-conditional dynamics and
geodesics-based contrastive learning. To represent deformable dynamics from
partial RGB-D observations, we learn implicit representations of occupancy and
flow-based forward dynamics. To accurately identify state change under large
non-rigid deformations, we learn a correspondence embedding field through a
novel geodesics-based contrastive loss. To evaluate our approach, we develop a
simulation framework for manipulating complex deformable shapes in realistic
scenes and a benchmark containing over 17,000 action trajectories with six
types of plush toys and 78 variants. Our model achieves the best performance in
geometry, correspondence, and dynamics predictions over existing approaches.
The ACID dynamics models are successfully applied to goal-conditioned
deformable manipulation tasks, resulting in a 30% increase in task success rate
over the strongest baseline. For more results and information, please visit
https://b0ku1.github.io/acid-web/ .
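The abstract describes two components: an implicit decoder that predicts occupancy and flow-based forward dynamics conditioned on an action, and a correspondence embedding trained with a geodesics-based contrastive loss. The following is a minimal PyTorch sketch of those two ideas, not the authors' implementation; the module names, network sizes, action encoding, and positive-pair threshold are illustrative assumptions.

```python
# Minimal sketch of (1) an action-conditional implicit occupancy/flow decoder and
# (2) a geodesic-distance-based contrastive loss. All specifics are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitActionDynamics(nn.Module):
    """Decodes occupancy and forward flow at 3D query points, conditioned on a
    scene feature (e.g. encoded from partial RGB-D) and an action vector."""
    def __init__(self, feat_dim=128, act_dim=6, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3 + feat_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.occ_head = nn.Linear(hidden, 1)   # occupancy logit at the query point
        self.flow_head = nn.Linear(hidden, 3)  # predicted 3D displacement under the action

    def forward(self, query_xyz, scene_feat, action):
        # query_xyz: (B, N, 3); scene_feat: (B, feat_dim); action: (B, act_dim)
        B, N, _ = query_xyz.shape
        cond = torch.cat([scene_feat, action], dim=-1)
        cond = cond.unsqueeze(1).expand(B, N, cond.shape[-1])  # broadcast to every query
        h = self.backbone(torch.cat([query_xyz, cond], dim=-1))
        return self.occ_head(h).squeeze(-1), self.flow_head(h)

def geodesic_contrastive_loss(emb_a, emb_b, geo_dist, pos_thresh=0.05, margin=0.5):
    """Contrastive loss on per-point correspondence embeddings: point pairs that
    are close in geodesic distance on the object surface are pulled together,
    geodesically distant pairs are pushed apart."""
    # emb_a, emb_b: (N, D) embeddings of paired points; geo_dist: (N,) surface distances
    d = F.pairwise_distance(emb_a, emb_b)
    pos = (geo_dist < pos_thresh).float()
    loss_pos = pos * d.pow(2)                         # attract true correspondences
    loss_neg = (1 - pos) * F.relu(margin - d).pow(2)  # repel non-correspondences
    return (loss_pos + loss_neg).mean()
```

Using geodesic rather than Euclidean distance to choose positives matters for large non-rigid deformations, where two points can become close in space while remaining far apart along the object surface.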
Related papers
- Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation [52.36691633451968]
ViTaM-D is a visual-tactile framework for dynamic hand-object interaction reconstruction.
DF-Field is a distributed force-aware contact representation model.
Our results highlight the superior performance of ViTaM-D in both rigid and deformable object reconstruction.
arXiv Detail & Related papers (2024-11-14T16:29:45Z)
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously only used for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
- DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments [0.0]
We propose DENSER, a framework that significantly enhances the representation of dynamic objects.
The proposed approach outperforms state-of-the-art methods by a wide margin.
arXiv Detail & Related papers (2024-09-16T07:11:58Z)
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z)
- Learning visual-based deformable object rearrangement with local graph neural networks [4.333220038316982]
We propose a novel representation strategy that can efficiently model the deformable object states with a set of keypoints and their interactions.
We also propose a light local GNN learning to jointly model the deformable rearrangement dynamics and infer the optimal manipulation actions.
Our method reaches much higher success rates on a variety of deformable rearrangement tasks (96.3% on average) than the state-of-the-art method in simulation experiments.
arXiv Detail & Related papers (2023-10-16T11:42:54Z)
- AGAR: Attention Graph-RNN for Adaptative Motion Prediction of Point Clouds of Deformable Objects [7.414594429329531]
We propose an improved architecture for point cloud prediction of deformable 3D objects.
Specifically, to handle deformable shapes, we propose a graph-based approach that learns and exploits the spatial structure of point clouds.
The proposed adaptive module controls the composition of local and global motions for each point, enabling the network to model complex motions in deformable 3D objects more effectively.
arXiv Detail & Related papers (2023-07-19T12:21:39Z)
- Dynamic-Resolution Model Learning for Object Pile Manipulation [33.05246884209322]
We investigate how to learn dynamic and adaptive representations at different levels of abstraction to achieve the optimal trade-off between efficiency and effectiveness.
Specifically, we construct dynamic-resolution particle representations of the environment and learn a unified dynamics model using graph neural networks (GNNs).
We show that our method achieves significantly better performance than state-of-the-art fixed-resolution baselines at the gathering, sorting, and redistribution of granular object piles.
arXiv Detail & Related papers (2023-06-29T05:51:44Z)
- SoftSMPL: Data-driven Modeling of Nonlinear Soft-tissue Dynamics for Parametric Humans [15.83525220631304]
We present SoftSMPL, a learning-based method to model realistic soft-tissue dynamics as a function of body shape and motion.
At the core of our method are three key contributions that enable us to model highly realistic dynamics.
arXiv Detail & Related papers (2020-04-01T10:35:06Z)
- Learning Predictive Representations for Deformable Objects Using Contrastive Estimation [83.16948429592621]
We propose a new learning framework that jointly optimizes both the visual representation model and the dynamics model.
We show substantial improvements over standard model-based learning techniques across our rope and cloth manipulation suite.
arXiv Detail & Related papers (2020-03-11T17:55:15Z)