Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts
- URL: http://arxiv.org/abs/2511.13032v1
- Date: Mon, 17 Nov 2025 06:32:38 GMT
- Title: Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts
- Authors: Sheng Liu, Yuanzhi Liang, Jiepeng Wang, Sidan Du, Chi Zhang, Xuelong Li
- Abstract summary: We present Uni-Inter, a unified framework for human motion generation that supports a wide range of interaction scenarios. Uni-Inter introduces the Unified Interactive Volume (UIV), a volumetric representation that encodes heterogeneous interactive entities into a shared spatial field.
- Score: 59.78384600454231
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present Uni-Inter, a unified framework for human motion generation that supports a wide range of interaction scenarios, including human-human, human-object, and human-scene, within a single, task-agnostic architecture. In contrast to existing methods that rely on task-specific designs and exhibit limited generalization, Uni-Inter introduces the Unified Interactive Volume (UIV), a volumetric representation that encodes heterogeneous interactive entities into a shared spatial field. This enables consistent relational reasoning and compound interaction modeling. Motion generation is formulated as joint-wise probabilistic prediction over the UIV, allowing the model to capture fine-grained spatial dependencies and produce coherent, context-aware behaviors. Experiments across three representative interaction tasks demonstrate that Uni-Inter achieves competitive performance and generalizes well to novel combinations of entities. These results suggest that unified modeling of compound interactions offers a promising direction for scalable motion synthesis in complex environments.
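The abstract does not give implementation details for the UIV, so the following is only a minimal sketch of the general idea: point samples from different entity types are voxelized into one shared grid, and a per-joint probability distribution is computed over the occupied voxels. The channel layout, the grid parameters, the function names `build_volume` and `joint_distribution`, and the toy scoring rule are all assumptions made for illustration, not the authors' method.

```python
import math

# Hypothetical channel layout: one occupancy flag per entity type.
CHANNELS = {"human": 0, "object": 1, "scene": 2}


def build_volume(entities, grid_size=8, extent=2.0):
    """Voxelize (kind, points) pairs into one shared sparse grid.

    Points are assumed to lie in the cube [-extent, extent)^3.
    Returns a dict mapping (ix, iy, iz) -> [human, object, scene] flags.
    """
    cell = 2.0 * extent / grid_size
    volume = {}
    for kind, points in entities:
        ch = CHANNELS[kind]
        for point in points:
            idx = tuple(int(math.floor((c + extent) / cell)) for c in point)
            # Skip points that fall outside the grid.
            if all(0 <= i < grid_size for i in idx):
                volume.setdefault(idx, [0, 0, 0])[ch] = 1
    return volume


def joint_distribution(scores):
    """Softmax over per-voxel scores for a single joint."""
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Toy input: another human, an object, and a scene surface as point samples.
entities = [
    ("human", [(0.1, 0.9, 0.0)]),
    ("object", [(0.1, 0.9, 0.0), (0.6, 0.5, 0.0)]),
    ("scene", [(0.0, -1.9, 0.0)]),
]
volume = build_volume(entities)

# Assumed scoring rule: a joint prefers voxels that contain an object.
scores = {idx: float(flags[CHANNELS["object"]]) for idx, flags in volume.items()}
probs = joint_distribution(scores)
```

In this sketch all entity types share one coordinate frame and one grid, so a voxel can record that a human and an object occupy the same cell. A learned model would replace the hand-written scoring rule with predicted per-voxel logits for each joint.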
Related papers
- Learning Human-Object Interaction as Groups [52.28258599873394]
GroupHOI is a framework that propagates contextual information in terms of geometric proximity and semantic similarity. It exhibits leading performance on the more challenging Nonverbal Interaction Detection task.
arXiv Detail & Related papers (2025-10-21T07:25:10Z)
- InterSyn: Interleaved Learning for Dynamic Motion Synthesis in the Wild [65.29569330744056]
We present Interleaved Learning for Motion Synthesis (InterSyn), a novel framework that targets the generation of realistic interaction motions. InterSyn employs an interleaved learning strategy to capture the natural, dynamic interactions and nuanced coordination inherent in real-world scenarios.
arXiv Detail & Related papers (2025-08-14T03:00:06Z)
- Relation Learning and Aggregate-attention for Multi-person Motion Prediction [13.052342503276936]
Multi-person motion prediction considers not just skeleton structures or human trajectories but also the interactions between people.
Previous methods often overlook that joint relations within an individual (intra-relation) and interactions among groups (inter-relation) are distinct types of representations.
We introduce a new collaborative framework for multi-person motion prediction that explicitly models these relations.
arXiv Detail & Related papers (2024-11-06T07:48:30Z)
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR).
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- Compositional Human-Scene Interaction Synthesis with Semantic Control [16.93177243590465]
We aim to synthesize humans interacting with a given 3D scene controlled by high-level semantic specifications.
We design a novel transformer-based generative model, in which the articulated 3D human body surface points and 3D objects are jointly encoded.
Inspired by the compositional nature of interactions, in which humans can interact with multiple objects simultaneously, we define interaction semantics as the composition of varying numbers of atomic action-object pairs.
arXiv Detail & Related papers (2022-07-26T11:37:44Z)