H-MoRe: Learning Human-centric Motion Representation for Action Analysis
- URL: http://arxiv.org/abs/2504.10676v1
- Date: Mon, 14 Apr 2025 19:56:52 GMT
- Title: H-MoRe: Learning Human-centric Motion Representation for Action Analysis
- Authors: Zhanbo Huang, Xiaoming Liu, Yu Kong,
- Abstract summary: H-MoRe is a novel pipeline for learning precise human-centric motion representation.<n>Unlike previous methods, H-MoRe learns directly from real-world scenarios in a self-supervised manner.<n>H-MoRe offers refined insights into human motion, which can be integrated seamlessly into action-related applications.
- Score: 27.966744383799835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose H-MoRe, a novel pipeline for learning precise human-centric motion representation. Our approach dynamically preserves relevant human motion while filtering out background movement. Notably, unlike previous methods relying on fully supervised learning from synthetic data, H-MoRe learns directly from real-world scenarios in a self-supervised manner, incorporating both human pose and body shape information. Inspired by kinematics, H-MoRe represents absolute and relative movements of each body point in a matrix format that captures nuanced motion details, termed world-local flows. H-MoRe offers refined insights into human motion, which can be integrated seamlessly into various action-related applications. Experimental results demonstrate that H-MoRe brings substantial improvements across various downstream tasks, including gait recognition(CL@R1: +16.01%), action recognition(Acc@1: +8.92%), and video generation(FVD: -67.07%). Additionally, H-MoRe exhibits high inference efficiency (34 fps), making it suitable for most real-time scenarios. Models and code will be released upon publication.
Related papers
- Scaling Large Motion Models with Million-Level Human Motions [67.40066387326141]
We present MotionLib, the first million-level dataset for motion generation.<n>We train a large motion model named projname, demonstrating robust performance across a wide range of human activities.
arXiv Detail & Related papers (2024-10-04T10:48:54Z) - MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds [20.83684434910106]
We present MoManifold, a novel human motion prior, which models plausible human motion in continuous high-dimensional motion space.
Specifically, we propose novel decoupled joint acceleration to model human dynamics from existing limited motion data.
Extensive experiments demonstrate that MoManifold outperforms existing SOTAs as a prior in several downstream tasks.
arXiv Detail & Related papers (2024-09-01T15:00:16Z) - Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z) - GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction [10.982807572404166]
We present GazeMo - a novel gaze-guided denoising diffusion model to generate human motions.
Our method first uses a gaze encoder to extract the gaze and motion features respectively, then employs a graph attention network to fuse these features.
Our method outperforms the state-of-the-art methods by a large margin in terms of multi-modal final error.
arXiv Detail & Related papers (2023-12-19T12:10:12Z) - ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions [66.87211993793807]
We present ReMoS, a denoising diffusion based model that synthesizes full body motion of a person in two person interaction scenario.
We demonstrate ReMoS across challenging two person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics.
We also contribute the ReMoCap dataset for two person interactions containing full body and finger motions.
arXiv Detail & Related papers (2023-11-28T18:59:52Z) - Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z) - Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z) - DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos [20.895221536570627]
Human mesh recovery (HMR) provides rich human body information for various real-world applications.<n>Video-based approaches leverage temporal information to mitigate this issue.<n>We present DiffMesh, an innovative motion-aware Diffusion-like framework for video-based HMR.
arXiv Detail & Related papers (2023-03-23T16:15:18Z) - Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z) - An Identity-Preserved Framework for Human Motion Transfer [3.6286856791379463]
Human motion transfer (HMT) aims to generate a video clip for the target subject by imitating the source subject's motion.
Previous methods have achieved good results in good-quality videos, but lose sight of individualized motion information from the source and target motions.
We propose a novel identity-preserved HMT network, termed textitIDPres.
arXiv Detail & Related papers (2022-04-14T10:27:19Z) - HuMoR: 3D Human Motion Model for Robust Pose Estimation [100.55369985297797]
HuMoR is a 3D Human Motion Model for Robust Estimation of temporal pose and shape.
We introduce a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence.
We demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset.
arXiv Detail & Related papers (2021-05-10T21:04:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.