Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases
- URL: http://arxiv.org/abs/2310.04189v3
- Date: Thu, 11 Jul 2024 09:39:02 GMT
- Title: Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases
- Authors: Xinpeng Liu, Yong-Lu Li, Ailing Zeng, Zizheng Zhou, Yang You, Cewu Lu
- Abstract summary: Motion understanding aims to establish a reliable mapping between motion and action semantics.
We propose Kinematic Phrases (KP), which capture the objective kinematic facts of human motion with proper abstraction, interpretability, and generality.
Based on KP, we can unify a motion knowledge base and build a motion understanding system.
- Score: 59.32509533292653
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Motion understanding aims to establish a reliable mapping between motion and action semantics, yet this is a challenging many-to-many problem. An abstract action semantic (e.g., walk forwards) could be conveyed by perceptually diverse motions (walking with arms up or swinging), while a single motion could carry different semantics depending on its context and intention. This makes an elegant mapping between them difficult. Previous attempts adopted direct-mapping paradigms with limited reliability. Also, current automatic metrics fail to provide reliable assessments of the consistency between motions and action semantics. We identify the source of these problems as the significant gap between the two modalities. To alleviate this gap, we propose Kinematic Phrases (KP), which capture the objective kinematic facts of human motion with proper abstraction, interpretability, and generality. Based on KP, we can unify a motion knowledge base and build a motion understanding system. Meanwhile, KP can be automatically converted from motions to text descriptions with no subjective bias, inspiring Kinematic Prompt Generation (KPG) as a novel white-box motion generation benchmark. In extensive experiments, our approach shows superiority over other methods. Our project is available at https://foruck.github.io/KP/.
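To make the KP idea concrete, here is a minimal sketch of how objective kinematic facts could be read off joint trajectories and rendered as unambiguous, phrase-like text. It is not the authors' implementation: the (T, J, 3) joint-position layout, the joint indices, and the single phrase type (vertical movement of a joint) are illustrative assumptions only.

```python
import numpy as np

# Hypothetical joint indices for illustration; real skeletons (e.g., SMPL) differ.
JOINTS = {"left_hand": 0, "right_hand": 1, "left_foot": 2, "right_foot": 3}

def vertical_movement_phrases(joints, names=JOINTS, eps=0.01):
    """Turn a (T, J, 3) joint-position sequence into simple kinematic phrases.

    For each named joint, compare its height at the start and end of the clip
    and emit an objective statement such as "left_hand moves upwards".
    This mimics the spirit of Kinematic Phrases (objective kinematic facts),
    not the paper's exact phrase taxonomy.
    """
    phrases = []
    for name, idx in names.items():
        dz = joints[-1, idx, 2] - joints[0, idx, 2]  # change in height (z-axis assumed up)
        if dz > eps:
            phrases.append(f"{name} moves upwards")
        elif dz < -eps:
            phrases.append(f"{name} moves downwards")
        else:
            phrases.append(f"{name} keeps its height")
    return phrases

if __name__ == "__main__":
    T, J = 30, 4
    rng = np.random.default_rng(0)
    motion = rng.normal(scale=0.001, size=(T, J, 3)).cumsum(axis=0)
    motion[:, JOINTS["left_hand"], 2] += np.linspace(0.0, 0.5, T)  # raise the left hand
    print(vertical_movement_phrases(motion))
    # e.g. ['left_hand moves upwards', 'right_hand keeps its height', ...]
```

Because each phrase is computed directly from the kinematics, the same check can be run on generated motion, which is roughly the white-box evaluation idea behind the KPG benchmark mentioned in the abstract.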
Related papers
- Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer [55.109778609058154]
Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models.
We uncover the roles and interactions of attention elements in capturing and representing motion patterns.
We integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer.
arXiv Detail & Related papers (2024-06-10T17:47:14Z) - Guided Attention for Interpretable Motion Captioning [0.0]
We introduce a novel architecture that enhances text generation quality by emphasizing interpretability.
To encourage human-like reasoning, we propose methods for guiding attention during training.
We leverage interpretability to derive fine-grained information about human motion.
arXiv Detail & Related papers (2023-10-11T09:14:30Z) - DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z) - Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z) - Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z) - Audio2Gestures: Generating Diverse Gestures from Audio [28.026220492342382]
We propose to explicitly model the one-to-many audio-to-motion mapping by splitting the cross-modal latent code into shared code and motion-specific code.
Our method generates more realistic and diverse motions than previous state-of-the-art methods.
arXiv Detail & Related papers (2023-01-17T04:09:58Z) - MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework (a generic sketch of such losses follows this list).
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z) - MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distributions and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
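The MoFusion entry above mentions kinematic losses for motion plausibility. As a generic illustration of what such losses look like, not MoFusion's actual formulation or weighting schedule, the sketch below computes two common terms on a (T, J, 3) joint-position tensor: bone-length consistency against a reference pose and frame-to-frame smoothness. The skeleton topology and weights are assumptions for the example.

```python
import numpy as np

# Illustrative (parent, child) bone pairs; real skeleton topologies differ.
BONES = [(0, 1), (1, 2), (2, 3)]

def bone_length_loss(pred, ref, bones=BONES):
    """Penalize deviation of predicted bone lengths from a reference pose.

    pred: (T, J, 3) predicted joint positions; ref: (J, 3) reference pose.
    A common kinematic plausibility term: limbs should not stretch or shrink.
    """
    parents = [p for p, _ in bones]
    children = [c for _, c in bones]
    pred_len = np.linalg.norm(pred[:, children] - pred[:, parents], axis=-1)  # (T, num_bones)
    ref_len = np.linalg.norm(ref[children] - ref[parents], axis=-1)           # (num_bones,)
    return np.mean((pred_len - ref_len) ** 2)

def smoothness_loss(pred):
    """Penalize large frame-to-frame accelerations (second differences)."""
    accel = pred[2:] - 2 * pred[1:-1] + pred[:-2]
    return np.mean(accel ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref_pose = rng.normal(size=(4, 3))
    pred_motion = ref_pose[None] + rng.normal(scale=0.05, size=(16, 4, 3))
    total = bone_length_loss(pred_motion, ref_pose) + 0.1 * smoothness_loss(pred_motion)
    print(f"kinematic loss: {total:.4f}")
```

In diffusion-based generators, terms like these are typically added to the denoising objective with small weights so that plausibility constraints guide training without overriding the data term.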