Diffusion Model-based Activity Completion for AI Motion Capture from Videos
- URL: http://arxiv.org/abs/2505.21566v1
- Date: Tue, 27 May 2025 05:04:50 GMT
- Title: Diffusion Model-based Activity Completion for AI Motion Capture from Videos
- Authors: Gao Huayu, Huang Tengjiu, Ye Xiaolong, Tsuyoshi Okita,
- Abstract summary: Current AI motion capture methods rely entirely on observed video sequences, similar to conventional motion capture.<n>We propose a diffusion-model-based action completion technique that generates complementary human motion sequences.<n>By introducing a gate module and a position-time embedding module, our approach achieves competitive results on the Human3.6M dataset.
- Score: 2.9271399793140076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI-based motion capture is an emerging technology that offers a cost-effective alternative to traditional motion capture systems. However, current AI motion capture methods rely entirely on observed video sequences, similar to conventional motion capture. This means that all human actions must be predefined, and movements outside the observed sequences are not possible. To address this limitation, we aim to apply AI motion capture to virtual humans, where flexible actions beyond the observed sequences are required. We assume that while many action fragments exist in the training data, the transitions between them may be missing. To bridge these gaps, we propose a diffusion-model-based action completion technique that generates complementary human motion sequences, ensuring smooth and continuous movements. By introducing a gate module and a position-time embedding module, our approach achieves competitive results on the Human3.6M dataset. Our experimental results show that (1) MDC-Net outperforms existing methods in ADE, FDE, and MMADE but is slightly less accurate in MMFDE, (2) MDC-Net has a smaller model size (16.84M) compared to HumanMAC (28.40M), and (3) MDC-Net generates more natural and coherent motion sequences. Additionally, we propose a method for extracting sensor data, including acceleration and angular velocity, from human motion sequences.
Related papers
- CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning [47.195002937893115]
CoMo aims to learn more informative continuous motion representations from diverse, internet-scale videos.<n>We introduce two new metrics for more robustly and affordably evaluating motion and guiding motion learning methods.<n>CoMo exhibits strong zero-shot generalization, enabling it to generate continuous pseudo actions for previously unseen video domains.
arXiv Detail & Related papers (2025-05-22T17:58:27Z) - GENMO: A GENeralist Model for Human MOtion [64.16188966024542]
We present GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework.<n>Our key insight is to reformulate motion estimation as constrained motion generation, where the output motion must precisely satisfy observed conditioning signals.<n>Our novel architecture handles variable-length motions and mixed multimodal conditions (text, audio, video) at different time intervals, offering flexible control.
arXiv Detail & Related papers (2025-05-02T17:59:55Z) - EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models [73.96414072072048]
Existing motion transfer methods explored the motion representations of reference videos to guide generation.<n>We propose EfficientMT, a novel and efficient end-to-end framework for video motion transfer.<n>Our experiments demonstrate that our EfficientMT outperforms existing methods in efficiency while maintaining flexible motion controllability.
arXiv Detail & Related papers (2025-03-25T05:51:14Z) - A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions [56.709280823844374]
We introduce a mask-based motion correction module (MCM) that leverages motion context and video mask to repair flawed motions.<n>We also propose a physics-based motion transfer module (PTM), which employs a pretrain and adapt approach for motion imitation.<n>Our approach is designed as a plug-and-play module to physically refine the video motion capture results, including high-difficulty in-the-wild motions.
arXiv Detail & Related papers (2024-12-23T08:26:00Z) - MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds [20.83684434910106]
We present MoManifold, a novel human motion prior, which models plausible human motion in continuous high-dimensional motion space.
Specifically, we propose novel decoupled joint acceleration to model human dynamics from existing limited motion data.
Extensive experiments demonstrate that MoManifold outperforms existing SOTAs as a prior in several downstream tasks.
arXiv Detail & Related papers (2024-09-01T15:00:16Z) - Generalizable Implicit Motion Modeling for Video Frame Interpolation [51.966062283735596]
Motion is critical in flow-based Video Frame Interpolation (VFI)<n>We introduce General Implicit Motion Modeling (IMM), a novel and effective approach to motion modeling VFI.<n>Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion.
arXiv Detail & Related papers (2024-07-11T17:13:15Z) - Interactive Character Control with Auto-Regressive Motion Diffusion Models [18.727066177880708]
We propose A-MDM (Auto-regressive Motion Diffusion Model) for real-time motion synthesis.
Our conditional diffusion model takes an initial pose as input, and auto-regressively generates successive motion frames conditioned on previous frame.
We introduce a suite of techniques for incorporating interactive controls into A-MDM, such as task-oriented sampling, in-painting, and hierarchical reinforcement learning.
arXiv Detail & Related papers (2023-06-01T07:48:34Z) - Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z) - Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z) - Weakly-supervised Action Transition Learning for Stochastic Human Motion
Prediction [81.94175022575966]
We introduce the task of action-driven human motion prediction.
It aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.
arXiv Detail & Related papers (2022-05-31T08:38:07Z) - Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape
Estimation from Monocular Video [24.217269857183233]
We propose a motion pose and shape network (MPS-Net) to capture humans in motion to estimate 3D human pose and shape from a video.
Specifically, we first propose a motion continuity attention (MoCA) module that leverages visual cues observed from human motion to adaptively recalibrate the range that needs attention in the sequence.
By coupling the MoCA and HAFI modules, the proposed MPS-Net excels in estimating 3D human pose and shape in the video.
arXiv Detail & Related papers (2022-03-16T11:00:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.