Related papers: HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models

HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models

URL: http://arxiv.org/abs/2312.06553v2
Date: Fri, 15 Mar 2024 17:24:30 GMT
Title: HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models
Authors: Xiaogang Peng, Yiming Xie, Zizhao Wu, Varun Jampani, Deqing Sun, Huaizu Jiang,
Abstract summary: We address the problem of generating realistic 3D human-object interactions (HOIs) driven by textual prompts. We first develop a dual-branch diffusion model (HOI-DM) to generate both human and object motions conditioned on the input text. We also develop an affordance prediction diffusion model (APDM) to predict the contacting area between the human and object.
Score: 42.62823339416957
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We address the problem of generating realistic 3D human-object interactions (HOIs) driven by textual prompts. To this end, we take a modular design and decompose the complex task into simpler sub-tasks. We first develop a dual-branch diffusion model (HOI-DM) to generate both human and object motions conditioned on the input text, and encourage coherent motions by a cross-attention communication module between the human and object motion generation branches. We also develop an affordance prediction diffusion model (APDM) to predict the contacting area between the human and object during the interactions driven by the textual prompt. The APDM is independent of the results by the HOI-DM and thus can correct potential errors by the latter. Moreover, it stochastically generates the contacting points to diversify the generated motions. Finally, we incorporate the estimated contacting points into the classifier-guidance to achieve accurate and close contact between humans and objects. To train and evaluate our approach, we annotate BEHAVE dataset with text descriptions. Experimental results on BEHAVE and OMOMO demonstrate that our approach produces realistic HOIs with various interactions and different types of objects.

Related papers

OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains [66.62502882481373]
Current methods tend to focus either on the body or the hands, which limits their ability to produce cohesive and realistic interactions. We propose OOD-HOI, a text-driven framework for generating whole-body human-object interactions that generalize well to new objects and actions. Our approach integrates a dual-branch reciprocal diffusion model to synthesize initial interaction poses, a contact-guided interaction refiner to improve physical accuracy based on predicted contact areas, and a dynamic adaptation mechanism which includes semantic adjustment and geometry deformation to improve robustness.
arXiv Detail & Related papers (2024-11-27T10:13:35Z)
HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects [86.86284624825356]
HIMO is a dataset of full-body human interacting with multiple objects. HIMO contains 3.3K 4D HOI sequences and 4.08M 3D HOI frames.
arXiv Detail & Related papers (2024-07-17T07:47:34Z)
InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction [27.10256777126629]
This paper showcases the potential of generating human-object interactions without direct training on text-interaction pair data. We introduce a world model designed to comprehend simple physics, modeling how human actions influence object motion. By integrating these components, our novel framework, InterDreamer, is able to generate text-aligned 3D HOI sequences in a zero-shot manner.
arXiv Detail & Related papers (2024-03-28T17:59:30Z)
THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR) In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion. We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z)
Controllable Human-Object Interaction Synthesis [77.56877961681462]
We propose Controllable Human-Object Interaction Synthesis (CHOIS) to generate synchronized object motion and human motion in 3D scenes. Here, language descriptions inform style and intent, and waypoints, which can be effectively extracted from high-level planning, ground the motion in the scene. Our module seamlessly integrates with a path planning module, enabling the generation of long-term interactions in 3D environments.
arXiv Detail & Related papers (2023-12-06T21:14:20Z)
InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs. We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion [29.25063155767897]
This paper addresses a novel task of anticipating 3D human-object interactions (HOIs) Our task is significantly more challenging, as it requires modeling dynamic objects with various shapes, capturing whole-body motion, and ensuring physically valid interactions. Experiments on multiple human-object interaction datasets demonstrate the effectiveness of our method for this task, capable of producing realistic, vivid, and remarkably long-term 3D HOI predictions.
arXiv Detail & Related papers (2023-08-31T17:59:08Z)
NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis [21.650091018774972]
We create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input. This interaction field guides the sampling of an object-conditioned human motion diffusion model. We synthesize realistic motions for sitting and lifting with several objects, outperforming alternative approaches in terms of motion quality and successful action completion.
arXiv Detail & Related papers (2023-07-14T17:59:38Z)
TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions. We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data. We show that TEMOS framework can produce both skeleton-based animations as in prior work, as well more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.