Affordance Diffusion: Synthesizing Hand-Object Interactions
- URL: http://arxiv.org/abs/2303.12538v3
- Date: Sat, 20 May 2023 22:12:01 GMT
- Title: Affordance Diffusion: Synthesizing Hand-Object Interactions
- Authors: Yufei Ye, Xueting Li, Abhinav Gupta, Shalini De Mello, Stan
Birchfield, Jiaming Song, Shubham Tulsiani, Sifei Liu
- Abstract summary: Given an RGB image of an object, we aim to hallucinate plausible images of a human hand interacting with it.
We propose a two-step generative approach: a LayoutNet that samples an articulation-agnostic hand-object-interaction layout, and a ContentNet that synthesizes images of a hand grasping the object.
- Score: 81.98499943996394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent successes in image synthesis are powered by large-scale diffusion
models. However, most methods are currently limited to either text- or
image-conditioned generation for synthesizing an entire image, texture transfer
or inserting objects into a user-specified region. In contrast, in this work we
focus on synthesizing complex interactions (i.e., an articulated hand) with a
given object. Given an RGB image of an object, we aim to hallucinate plausible
images of a human hand interacting with it. We propose a two-step generative
approach: a LayoutNet that samples an articulation-agnostic
hand-object-interaction layout, and a ContentNet that synthesizes images of a
hand grasping the object given the predicted layout. Both are built on top of a
large-scale pretrained diffusion model to make use of its latent
representation. Compared to baselines, the proposed method is shown to
generalize better to novel objects and perform surprisingly well on
out-of-distribution in-the-wild scenes of portable-sized objects. The resulting
system allows us to predict descriptive affordance information, such as hand
articulation and approaching orientation. Project page:
https://judyye.github.io/affordiffusion-www
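The abstract describes the two-step design only at the interface level. A minimal PyTorch sketch of that interface follows (not the authors' implementation): the four-parameter layout (hand center, scale, approach angle) and all layer sizes are assumptions here, and the real models sit on top of a large pretrained diffusion backbone rather than the toy convolutions below.

```python
import torch
import torch.nn as nn

class LayoutNet(nn.Module):
    """Step 1: sample an articulation-agnostic hand-object layout."""
    def __init__(self, dim=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(dim, 4)  # assumed layout: (cx, cy, scale, angle)

    def forward(self, object_image):
        return self.head(self.enc(object_image))

class ContentNet(nn.Module):
    """Step 2: one denoising step conditioned on the predicted layout."""
    def __init__(self, dim=64):
        super().__init__()
        self.cond = nn.Linear(4, dim)
        self.net = nn.Sequential(
            nn.Conv2d(3 + dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, 3, 3, padding=1))

    def forward(self, noisy_image, layout):
        b, _, h, w = noisy_image.shape
        # Broadcast the layout embedding to a spatial conditioning map.
        c = self.cond(layout).view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.net(torch.cat([noisy_image, c], dim=1))  # predicted noise

obj = torch.randn(1, 3, 64, 64)       # object-only RGB image (toy size)
layout = LayoutNet()(obj)             # where/how the hand approaches
eps = ContentNet()(torch.randn_like(obj), layout)  # one reverse-diffusion step
```

Factoring out the layout keeps the coarse "where and from which direction" decision separate from pixel synthesis, which is what lets the system report affordance information such as approach orientation.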
Related papers
- GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction [9.564223516111275]
Recent generative models can synthesize high-quality images but often fail to generate humans interacting with objects using their hands.
In this paper, we propose GraspDiffusion, a novel generative method that creates realistic scenes of human-object interaction.
arXiv Detail & Related papers (2024-10-17T01:45:42Z)
- G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis [57.07638884476174]
G-HOP is a denoising diffusion based generative prior for hand-object interactions.
We represent the human hand via a skeletal distance field to obtain a representation aligned with the signed distance field for the object.
We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clip and human grasp synthesis.
arXiv Detail & Related papers (2024-04-18T17:59:28Z)
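The skeletal distance field can be made concrete: for each query point, take the distance to the nearest bone segment of the hand skeleton, so the hand lives on the same 3D grid as the object's signed distance field. A minimal sketch, where the joint count and bone list are placeholders rather than G-HOP's actual hand parameterization:

```python
import torch

def skeletal_distance_field(queries, joints, bones):
    """Distance from each query point to the nearest bone segment.

    queries: (Q, 3) grid points; joints: (J, 3) hand joints; bones: list of
    (parent, child) joint-index pairs. A toy stand-in for G-HOP's skeletal
    distance field, evaluated on the same grid as the object's SDF.
    """
    a = joints[[p for p, _ in bones]]            # (B, 3) segment starts
    b = joints[[c for _, c in bones]]            # (B, 3) segment ends
    ab = b - a
    ap = queries[:, None, :] - a[None, :, :]     # (Q, B, 3)
    # Project each query onto each segment, clamped to the segment ends.
    t = ((ap * ab).sum(-1) / (ab * ab).sum(-1).clamp(min=1e-8)).clamp(0, 1)
    closest = a[None] + t[..., None] * ab[None]  # nearest point on each bone
    return (queries[:, None, :] - closest).norm(dim=-1).min(dim=-1).values

# Toy usage: 5 joints forming a chain of 4 bones, queried at random points.
joints = torch.randn(5, 3)
bones = [(0, 1), (1, 2), (2, 3), (3, 4)]
d = skeletal_distance_field(torch.randn(100, 3), joints, bones)  # (100,)
```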
- HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data [42.49031063635004]
We propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data.
Our model is a conditional diffusion model that takes both the 3D hand-object geometric structure and text description as inputs for image synthesis.
We adopt the generated 3D data for learning 6D object pose estimation and show its effectiveness in improving perception systems.
arXiv Detail & Related papers (2024-03-18T17:48:31Z)
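A toy version of such dual conditioning: rendered maps of the 3D hand-object structure enter the denoiser as extra image channels, while a text embedding modulates features FiLM-style. The channel counts, the FiLM choice, and the single conv block are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class GeomTextDenoiser(nn.Module):
    """Toy denoiser conditioned on both geometry renders and text."""
    def __init__(self, geom_ch=4, text_dim=32, dim=64):
        super().__init__()
        self.inp = nn.Conv2d(3 + geom_ch, dim, 3, padding=1)
        self.film = nn.Linear(text_dim, 2 * dim)  # per-channel scale/shift
        self.out = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, noisy_img, geom_maps, text_emb):
        # Geometry enters as extra channels; text modulates the features.
        h = self.inp(torch.cat([noisy_img, geom_maps], dim=1))
        scale, shift = self.film(text_emb).chunk(2, dim=-1)
        h = torch.relu(h * (1 + scale[..., None, None])
                       + shift[..., None, None])
        return self.out(h)  # predicted noise

net = GeomTextDenoiser()
eps = net(torch.randn(2, 3, 64, 64),   # noisy image
          torch.randn(2, 4, 64, 64),   # e.g. depth/normal/mask renders
          torch.randn(2, 32))          # pooled text embedding
```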
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
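The defining property of such a descriptor field is that it co-transforms with the object under rigid motion. A toy stand-in that makes the symmetry checkable: a descriptor built purely from distances to the object's points is unchanged when query and object are moved by the same SE(3) transform. The learned field in the paper is far richer, but obeys the same symmetry:

```python
import torch

def descriptor(query, obj_pts, k=8):
    """Toy SE(3)-invariant descriptor: sorted distances from the query to
    its k nearest object points (illustrative stand-in only)."""
    return (obj_pts - query).norm(dim=-1).topk(k, largest=False).values

obj = torch.randn(100, 3)                  # object point cloud
q = torch.randn(3)                         # query location near the object
R = torch.linalg.qr(torch.randn(3, 3)).Q   # random orthogonal transform
t = torch.randn(3)
# Moving query and object rigidly together leaves the descriptor unchanged.
assert torch.allclose(descriptor(q, obj),
                      descriptor(q @ R.T + t, obj @ R.T + t), atol=1e-5)
```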
- Hand-Object Interaction Image Generation [135.87707468156057]
This work introduces a new task: hand-object interaction image generation.
It aims to conditionally generate a hand-object image given the hand, the object, and their interaction status.
The task is challenging and worth studying for many potential applications, such as AR/VR games and online shopping.
arXiv Detail & Related papers (2022-11-28T18:59:57Z)
- Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications.
We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object.
Our method is able to produce physically plausible poses with high quality and real-time inference speed.
arXiv Detail & Related papers (2022-11-16T10:01:33Z)
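A minimal sketch of what mutual attention between the two branches can look like: hand tokens cross-attend to object tokens and vice versa, with residual updates. The 778 hand tokens below assume a MANO-style hand mesh; all dimensions and the head count are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class DenseMutualAttention(nn.Module):
    """Toy mutual attention block: each branch cross-attends to the other,
    so hand features pick up fine-grained object cues and vice versa."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.hand_to_obj = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.obj_to_hand = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hand, obj):                     # (B, Nh, d), (B, No, d)
        h_upd, _ = self.hand_to_obj(hand, obj, obj)   # hand queries object
        o_upd, _ = self.obj_to_hand(obj, hand, hand)  # object queries hand
        return hand + h_upd, obj + o_upd              # residual updates

block = DenseMutualAttention()
hand, obj = block(torch.randn(1, 778, 64), torch.randn(1, 1024, 64))
```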
- Object-Compositional Neural Implicit Surfaces [45.274466719163925]
The neural implicit representation has shown its effectiveness in novel view synthesis and high-quality 3D reconstruction from multi-view images.
This paper proposes a novel framework, ObjectSDF, to build an object-compositional neural implicit representation with high fidelity in 3D reconstruction and object representation.
arXiv Detail & Related papers (2022-07-20T06:38:04Z)
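The compositional idea rests on the classical union rule for signed distance fields: the scene SDF is the pointwise minimum over per-object SDFs. ObjectSDF itself learns the per-object fields with semantic guidance; the analytic min-composition below only illustrates the building block:

```python
import torch

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return (p - center).norm(dim=-1) - radius

def scene_sdf(p, object_sdfs):
    """Compose a scene SDF as the pointwise minimum of per-object SDFs
    (the standard union rule for signed distance fields)."""
    return torch.stack([f(p) for f in object_sdfs], dim=-1).min(dim=-1).values

pts = torch.randn(1000, 3)
objs = [lambda p: sphere_sdf(p, torch.tensor([0., 0., 0.]), 0.5),
        lambda p: sphere_sdf(p, torch.tensor([1., 0., 0.]), 0.3)]
d = scene_sdf(pts, objs)   # (1000,) signed distance to the composed scene
```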
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that the learned disentangled spatial representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.