SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control
- URL: http://arxiv.org/abs/2412.15664v1
- Date: Fri, 20 Dec 2024 08:25:15 GMT
- Title: SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control
- Authors: Xiaohan Zhang, Sebastian Starke, Vladimir Guzov, Zhensong Zhang, Eduardo PĂ©rez Pellitero, Gerard Pons-Moll,
- Abstract summary: SCENIC is a diffusion model designed to generate human motion that adapts to dynamic terrains within virtual scenes.
Our system achieves seamless transitions between different motion styles while maintaining scene constraints.
Our code, dataset, and models will be released at urlhttps://virtualhumans.mpi-inf.mpg.de/scenic/.
- Score: 36.22743674288336
- License:
- Abstract: Synthesizing natural human motion that adapts to complex environments while allowing creative control remains a fundamental challenge in motion synthesis. Existing models often fall short, either by assuming flat terrain or lacking the ability to control motion semantics through text. To address these limitations, we introduce SCENIC, a diffusion model designed to generate human motion that adapts to dynamic terrains within virtual scenes while enabling semantic control through natural language. The key technical challenge lies in simultaneously reasoning about complex scene geometry while maintaining text control. This requires understanding both high-level navigation goals and fine-grained environmental constraints. The model must ensure physical plausibility and precise navigation across varied terrain, while also preserving user-specified text control, such as ``carefully stepping over obstacles" or ``walking upstairs like a zombie." Our solution introduces a hierarchical scene reasoning approach. At its core is a novel scene-dependent, goal-centric canonicalization that handles high-level goal constraint, and is complemented by an ego-centric distance field that captures local geometric details. This dual representation enables our model to generate physically plausible motion across diverse 3D scenes. By implementing frame-wise text alignment, our system achieves seamless transitions between different motion styles while maintaining scene constraints. Experiments demonstrate our novel diffusion model generates arbitrarily long human motions that both adapt to complex scenes with varying terrain surfaces and respond to textual prompts. Additionally, we show SCENIC can generalize to four real-scene datasets. Our code, dataset, and models will be released at \url{https://virtualhumans.mpi-inf.mpg.de/scenic/}.
Related papers
- PlaMo: Plan and Move in Rich 3D Physical Environments [68.75982381673869]
We present PlaMo, a scene-aware path planner and a robust physics-based controller.
The planner produces a sequence of motion paths, considering the various limitations the scene imposes on the motion.
Our control policy generates rich and realistic physical motion adhering to the plan.
arXiv Detail & Related papers (2024-06-26T10:41:07Z) - Physics-based Scene Layout Generation from Human Motion [21.939444709132395]
We present a physics-based approach that simultaneously optimize a scene layout generator and simulates a moving human in a physics simulator.
We use reinforcement learning to perform a dual-optimization of both the character motion imitation controller and the scene layout generator.
We evaluate our method using motions from SAMP and PROX, and demonstrate physically plausible scene layout reconstruction compared with the previous kinematics-based method.
arXiv Detail & Related papers (2024-05-21T02:36:37Z) - Generating Human Motion in 3D Scenes from Text Descriptions [60.04976442328767]
This paper focuses on the task of generating human motions in 3D indoor scenes given text descriptions of the human-scene interactions.
We propose a new approach that decomposes the complex problem into two more manageable sub-problems.
For language grounding of the target object, we leverage the power of large language models; for motion generation, we design an object-centric scene representation.
arXiv Detail & Related papers (2024-05-13T14:30:12Z) - Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z) - TC4D: Trajectory-Conditioned Text-to-4D Generation [94.90700997568158]
We propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components.
We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.
Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion.
arXiv Detail & Related papers (2024-03-26T17:55:11Z) - Style-Consistent 3D Indoor Scene Synthesis with Decoupled Objects [84.45345829270626]
Controllable 3D indoor scene synthesis stands at the forefront of technological progress.
Current methods for scene stylization are limited to applying styles to the entire scene.
We introduce a unique pipeline designed for synthesis 3D indoor scenes.
arXiv Detail & Related papers (2024-01-24T03:10:36Z) - Revisit Human-Scene Interaction via Space Occupancy [55.67657438543008]
Human-scene Interaction (HSI) generation is a challenging task and crucial for various downstream tasks.
In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective.
By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database.
arXiv Detail & Related papers (2023-12-05T12:03:00Z) - Story-to-Motion: Synthesizing Infinite and Controllable Character
Animation from Long Text [14.473103773197838]
A new task, Story-to-Motion, arises when characters are required to perform specific motions based on a long text description.
Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive.
We propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text.
arXiv Detail & Related papers (2023-11-13T16:22:38Z) - Synthesizing Physically Plausible Human Motions in 3D Scenes [41.1310197485928]
We present a framework that enables physically simulated characters to perform long-term interaction tasks in diverse, cluttered, and unseen scenes.
Specifically, InterCon contains two complementary policies that enable characters to enter and leave the interacting state.
To generate interaction with objects at different places, we further design NavCon, a trajectory following policy, to keep characters' motions in the free space of 3D scenes.
arXiv Detail & Related papers (2023-08-17T15:17:49Z) - Synthesizing Diverse Human Motions in 3D Indoor Scenes [16.948649870341782]
We present a novel method for populating 3D indoor scenes with virtual humans that can navigate in the environment and interact with objects in a realistic manner.
Existing approaches rely on training sequences that contain captured human motions and the 3D scenes they interact with.
We propose a reinforcement learning-based approach that enables virtual humans to navigate in 3D scenes and interact with objects realistically and autonomously.
arXiv Detail & Related papers (2023-05-21T09:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.