Controllable Human-Object Interaction Synthesis
- URL: http://arxiv.org/abs/2312.03913v2
- Date: Sun, 14 Jul 2024 23:00:54 GMT
- Title: Controllable Human-Object Interaction Synthesis
- Authors: Jiaman Li, Alexander Clegg, Roozbeh Mottaghi, Jiajun Wu, Xavier Puig, C. Karen Liu
- Abstract summary: We propose Controllable Human-Object Interaction Synthesis (CHOIS) to generate synchronized object motion and human motion in 3D scenes.
Here, language descriptions inform style and intent, and waypoints, which can be effectively extracted from high-level planning, ground the motion in the scene.
Our module seamlessly integrates with a path planning module, enabling the generation of long-term interactions in 3D environments.
- Score: 77.56877961681462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthesizing semantic-aware, long-horizon, human-object interaction is critical to simulate realistic human behaviors. In this work, we address the challenging problem of generating synchronized object motion and human motion guided by language descriptions in 3D scenes. We propose Controllable Human-Object Interaction Synthesis (CHOIS), an approach that generates object motion and human motion simultaneously using a conditional diffusion model given a language description, initial object and human states, and sparse object waypoints. Here, language descriptions inform style and intent, and waypoints, which can be effectively extracted from high-level planning, ground the motion in the scene. Naively applying a diffusion model fails to predict object motion aligned with the input waypoints; it also cannot ensure the realism of interactions that require precise hand-object and human-floor contact. To overcome these problems, we introduce an object geometry loss as additional supervision to improve the matching between generated object motion and input object waypoints; we also design guidance terms to enforce contact constraints during the sampling process of the trained diffusion model. We demonstrate that our learned interaction module can synthesize realistic human-object interactions, adhering to provided textual descriptions and sparse waypoint conditions. Additionally, our module seamlessly integrates with a path planning module, enabling the generation of long-term interactions in 3D environments.
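The guidance terms mentioned in the abstract act at sampling time, while the object geometry loss is a training-time term. One common way to realize such guidance (an assumption here, not a claim about the paper's exact procedure) is to differentiate a contact cost through the model's clean-motion prediction at every denoising step. The sketch below illustrates that pattern; the names (`denoiser`, `contact_cost`) and the DDIM-style update are illustrative, not the paper's actual implementation.

```python
import torch

def guided_sample(denoiser, contact_cost, cond, x_T, alphas_cumprod, guidance_scale=1.0):
    """Reverse diffusion with a gradient-based guidance term (illustrative sketch).

    denoiser(x_t, t, cond) predicts the clean motion x_0 from the noisy sample x_t;
    contact_cost(x_0) returns a scalar, differentiable penalty on violated
    hand-object / human-floor contact (a stand-in for the paper's guidance terms).
    alphas_cumprod is a 1-D tensor of cumulative noise-schedule coefficients.
    """
    x_t = x_T
    for t in reversed(range(len(alphas_cumprod))):
        x_t = x_t.detach().requires_grad_(True)
        x_0 = denoiser(x_t, t, cond)                        # conditional clean-motion prediction
        cost = contact_cost(x_0)                            # scalar contact-violation penalty
        grad = torch.autograd.grad(cost, x_t)[0]            # gradient w.r.t. the noisy sample
        x_0 = x_0 - guidance_scale * grad                   # nudge the prediction toward valid contact
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = (x_t - a_t.sqrt() * x_0) / (1 - a_t).sqrt()   # noise implied by the guided prediction
        x_t = a_prev.sqrt() * x_0 + (1 - a_prev).sqrt() * eps  # DDIM-style deterministic update
    return x_t
```

In CHOIS the same loop would also receive the language embedding, initial human and object states, and sparse waypoints as conditioning; the sketch only shows the general pattern of differentiating a contact cost through the denoiser's prediction.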
Related papers
- THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR).
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z)
- HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models [42.62823339416957]
We address the problem of generating realistic 3D human-object interactions (HOIs) driven by textual prompts.
We first develop a dual-branch diffusion model (HOI-DM) to generate both human and object motions conditioned on the input text.
We also develop an affordance prediction diffusion model (APDM) to predict the contacting area between the human and object.
arXiv Detail & Related papers (2023-12-11T17:41:17Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce InterControl, a novel controllable motion generation method that encourages the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the joint-pair distances required for human-human interactions can be generated by an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- Object Motion Guided Human Motion Synthesis [22.08240141115053]
We study the problem of full-body human motion synthesis for the manipulation of large-sized objects.
We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework.
We develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated.
arXiv Detail & Related papers (2023-09-28T08:22:00Z)
- InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion [29.25063155767897]
This paper addresses the novel task of anticipating 3D human-object interactions (HOIs).
Our task is significantly more challenging, as it requires modeling dynamic objects with various shapes, capturing whole-body motion, and ensuring physically valid interactions.
Experiments on multiple human-object interaction datasets demonstrate the effectiveness of our method for this task, capable of producing realistic, vivid, and remarkably long-term 3D HOI predictions.
arXiv Detail & Related papers (2023-08-31T17:59:08Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis [21.650091018774972]
We create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input.
This interaction field guides the sampling of an object-conditioned human motion diffusion model (a minimal sketch of such a field appears after this list).
We synthesize realistic motions for sitting and lifting with several objects, outperforming alternative approaches in terms of motion quality and successful action completion.
arXiv Detail & Related papers (2023-07-14T17:59:38Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- IMoS: Intent-Driven Full-Body Motion Synthesis for Human-Object Interactions [69.95820880360345]
We present the first framework to synthesize the full-body motion of virtual human characters with 3D objects placed within their reach.
Our system takes as input textual instructions specifying the objects and the associated intentions of the virtual characters.
We show that our synthesized full-body motions appear more realistic to study participants in more than 80% of scenarios.
arXiv Detail & Related papers (2022-12-14T23:59:24Z)
- Compositional Human-Scene Interaction Synthesis with Semantic Control [16.93177243590465]
We aim to synthesize humans interacting with a given 3D scene controlled by high-level semantic specifications.
We design a novel transformer-based generative model, in which the articulated 3D human body surface points and 3D objects are jointly encoded.
Inspired by the compositional nature of interactions, in which humans can simultaneously interact with multiple objects, we define interaction semantics as the composition of a varying number of atomic action-object pairs.
arXiv Detail & Related papers (2022-07-26T11:37:44Z)
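The NIFTY entry above describes an object-attached neural field that returns the distance to the valid interaction manifold for a given human pose, which then guides an object-conditioned motion diffusion model. The sketch below shows one plausible form of such a field, assuming a plain MLP over the pose expressed in the object's local frame; the class name, architecture, and pose dimensionality are illustrative assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class InteractionField(nn.Module):
    """Hypothetical object-attached field: maps a human pose (expressed in the
    object's local frame) to a scalar distance from the valid interaction
    manifold. The gradient of this distance can serve as a guidance signal
    for an object-conditioned human motion diffusion model."""

    def __init__(self, pose_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar distance to the interaction manifold
        )

    def forward(self, pose_in_object_frame: torch.Tensor) -> torch.Tensor:
        return self.net(pose_in_object_frame)


# Usage sketch: the gradient of the distance w.r.t. the pose gives a direction
# in which to push diffusion samples toward valid interactions.
field = InteractionField(pose_dim=63)           # e.g. 21 joints x 3D positions (assumed)
pose = torch.randn(1, 63, requires_grad=True)   # candidate pose in the object frame
distance = field(pose)
guidance_grad = torch.autograd.grad(distance.sum(), pose)[0]
```

The scalar output plays the same role as the contact cost in the earlier sampling sketch: its gradient with respect to the pose provides the signal that steers the diffusion sampler.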