Narrator: Towards Natural Control of Human-Scene Interaction Generation
via Relationship Reasoning
- URL: http://arxiv.org/abs/2303.09410v1
- Date: Thu, 16 Mar 2023 15:44:15 GMT
- Title: Narrator: Towards Natural Control of Human-Scene Interaction Generation
via Relationship Reasoning
- Authors: Haibiao Xuan, Xiongzheng Li, Jinsong Zhang, Hongwen Zhang, Yebin Liu
and Kun Li
- Abstract summary: We focus on naturally and controllably generating realistic and diverse HSIs from textual descriptions.
We propose Narrator, a novel relationship reasoning-based generative approach.
Our experiments and perceptual studies show that Narrator can controllably generate diverse interactions and significantly outperform existing works.
- Score: 34.00107506891627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Naturally controllable human-scene interaction (HSI) generation has an
important role in various fields, such as VR/AR content creation and
human-centered AI. However, existing methods are unnatural and unintuitive in
their controllability, which heavily limits their application in practice.
Therefore, we focus on a challenging task of naturally and controllably
generating realistic and diverse HSIs from textual descriptions. From human
cognition, the ideal generative model should correctly reason about spatial
relationships and interactive actions. To that end, we propose Narrator, a
novel relationship reasoning-based generative approach using a conditional
variation autoencoder for naturally controllable generation given a 3D scene
and a textual description. Also, we model global and local spatial
relationships in a 3D scene and a textual description respectively based on the
scene graph, and introduce a partlevel action mechanism to represent
interactions as atomic body part states. In particular, benefiting from our
relationship reasoning, we further propose a simple yet effective multi-human
generation strategy, which is the first exploration for controllable
multi-human scene interaction generation. Our extensive experiments and
perceptual studies show that Narrator can controllably generate diverse
interactions and significantly outperform existing works. The code and dataset
will be available for research purposes.
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z) - Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z) - PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI [38.03745740636854]
PhyScene is a method dedicated to generating interactive 3D scenes characterized by realistic layouts, articulated objects, and rich physical interactivity tailored for embodied agents.
We demonstrate that PhyScene effectively leverages these guidance functions for physically interactable scene synthesis, outperforming existing state-of-the-art scene synthesis methods by a large margin.
arXiv Detail & Related papers (2024-04-15T05:29:23Z) - InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction [27.10256777126629]
This paper showcases the potential of generating human-object interactions without direct training on text-interaction pair data.
We introduce a world model designed to comprehend simple physics, modeling how human actions influence object motion.
By integrating these components, our novel framework, InterDreamer, is able to generate text-aligned 3D HOI sequences in a zero-shot manner.
arXiv Detail & Related papers (2024-03-28T17:59:30Z) - LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment [27.38638713080283]
We introduce LaserHuman, a pioneering dataset engineered to revolutionize Scene-Text-to-Motion research.
LaserHuman stands out with its inclusion of genuine human motions within 3D environments.
We propose a multi-conditional diffusion model, which is simple but effective, achieving state-of-the-art performance on existing datasets.
arXiv Detail & Related papers (2024-03-20T05:11:10Z) - Revisit Human-Scene Interaction via Space Occupancy [55.67657438543008]
Human-scene Interaction (HSI) generation is a challenging task and crucial for various downstream tasks.
In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective.
By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database.
arXiv Detail & Related papers (2023-12-05T12:03:00Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - Synthesizing Physical Character-Scene Interactions [64.26035523518846]
It is necessary to synthesize such interactions between virtual characters and their surroundings.
We present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters.
Our approach takes physics-based character motion generation a step closer to broad applicability.
arXiv Detail & Related papers (2023-02-02T05:21:32Z) - Compositional Human-Scene Interaction Synthesis with Semantic Control [16.93177243590465]
We aim to synthesize humans interacting with a given 3D scene controlled by high-level semantic specifications.
We design a novel transformer-based generative model, in which the articulated 3D human body surface points and 3D objects are jointly encoded.
Inspired by the compositional nature of interactions that humans can simultaneously interact with multiple objects, we define interaction semantics as the composition of varying numbers of atomic action-object pairs.
arXiv Detail & Related papers (2022-07-26T11:37:44Z) - iGibson, a Simulation Environment for Interactive Tasks in Large
Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
iGibson features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.