Progressive Human Motion Generation Based on Text and Few Motion Frames
- URL: http://arxiv.org/abs/2503.13300v2
- Date: Sun, 30 Mar 2025 06:29:58 GMT
- Title: Progressive Human Motion Generation Based on Text and Few Motion Frames
- Authors: Ling-An Zeng, Gaojie Wu, Ancong Wu, Jian-Fang Hu, Wei-Shi Zheng
- Abstract summary: A Text-Frame-to-Motion (TF2M) generation task aims to generate motions from text and very few given frames. We propose a novel Progressive Motion Generation (PMG) method to progressively generate a motion from the frames with low uncertainty to those with high uncertainty. Our PMG outperforms existing T2M generation methods by a large margin with even one given frame.
- Score: 41.00546984852018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although existing text-to-motion (T2M) methods can produce realistic human motion from text descriptions, it is still difficult to align the generated motion with the desired postures, since text alone is insufficient for precisely describing diverse postures. To achieve more controllable generation, an intuitive way is to allow the user to input a few motion frames describing the precise desired postures. We therefore explore a new Text-Frame-to-Motion (TF2M) generation task that aims to generate motions from text and very few given frames. Intuitively, the closer a frame is to a given frame, the lower its uncertainty when conditioned on that given frame. Hence, we propose a novel Progressive Motion Generation (PMG) method that generates a motion progressively, in multiple stages, from the frames with low uncertainty to those with high uncertainty. During each stage, new frames are generated by a Text-Frame Guided Generator conditioned on frame-aware semantics of the text, the given frames, and the frames generated in previous stages. Additionally, to alleviate the train-test gap caused by the multi-stage accumulation of incorrectly generated frames during testing, we propose a Pseudo-frame Replacement Strategy for training. Experimental results show that our PMG outperforms existing T2M generation methods by a large margin even with only one given frame, validating its effectiveness. Code is available at https://github.com/qinghuannn/PMG.
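As a reading aid, the multi-stage control flow the abstract describes can be sketched roughly as follows. This is only an illustrative sketch under assumed interfaces: `progressive_generation` and `generate_stage` are hypothetical names, `generate_stage` stands in for the paper's Text-Frame Guided Generator, and the uncertainty ordering is approximated here by distance to the nearest given frame; the actual model, frame-aware text semantics, and training details are in the linked repository.

```python
import torch

def progressive_generation(text_emb, given_frames, given_idx, total_len,
                           num_stages, generate_stage):
    """Fill a motion of `total_len` frames stage by stage, from the frames
    closest to the user-given frames (low uncertainty) to the farthest ones
    (high uncertainty). `generate_stage` is a stand-in for a learned generator
    and must return one new pose per requested frame index."""
    motion = torch.zeros(total_len, given_frames.shape[-1])
    motion[given_idx] = given_frames           # place the user-given frames
    known = list(given_idx)

    # Uncertainty proxy: temporal distance to the nearest given frame.
    dist = [min(abs(t - g) for g in given_idx) for t in range(total_len)]
    unknown = sorted((t for t in range(total_len) if t not in given_idx),
                     key=lambda t: dist[t])

    # Split the remaining frames into stages, lowest-uncertainty group first.
    per_stage = max(1, (len(unknown) + num_stages - 1) // num_stages)
    for stage in range(num_stages):
        idx = unknown[stage * per_stage:(stage + 1) * per_stage]
        if not idx:
            break
        # Condition on the text, the given frames, and all frames generated so far.
        motion[idx] = generate_stage(text_emb, motion, known, idx)
        known = known + idx
    return motion
```

During training, the abstract's Pseudo-frame Replacement Strategy addresses the fact that, at test time, later stages condition on imperfect frames generated in earlier stages rather than on ground truth.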
Related papers
- Generative Inbetweening through Frame-wise Conditions-Driven Video Generation [63.43583844248389]
Generative inbetweening aims to generate intermediate frame sequences by utilizing two key frames as input.
We propose a Frame-wise Conditions-driven Video Generation (FCVG) method that significantly enhances the temporal stability of interpolated video frames.
Our FCVG demonstrates the capability to generate temporally stable videos using both linear and non-linear curves.
arXiv Detail & Related papers (2024-12-16T13:19:41Z)
- Pose-Guided Fine-Grained Sign Language Video Generation [18.167413937989867]
We propose a novel Pose-Guided Motion Model (PGMM) for generating fine-grained and motion-consistent sign language videos.
Firstly, we propose a new Coarse Motion Module (CMM), which deforms features via optical flow warping.
Secondly, we propose a new Pose Fusion Module (PFM), which guides the modal fusion of RGB and pose features.
arXiv Detail & Related papers (2024-09-25T07:54:53Z)
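The optical-flow warping used by the CMM above is a standard operation; a generic PyTorch sketch (not the PGMM authors' code, and `flow_warp` is an illustrative name) looks like this:

```python
import torch
import torch.nn.functional as F

def flow_warp(feat, flow):
    """Warp a feature map `feat` (N, C, H, W) with a flow field `flow` (N, 2, H, W),
    where flow[:, 0] is the horizontal and flow[:, 1] the vertical displacement in pixels."""
    n, _, h, w = feat.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, dtype=feat.dtype),
                            torch.arange(w, dtype=feat.dtype), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).to(feat.device)  # (1, 2, H, W)
    coords = grid + flow                                              # displaced coordinates
    # Normalize to [-1, 1] as expected by grid_sample (x first, then y).
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)           # (N, H, W, 2)
    return F.grid_sample(feat, sample_grid, align_corners=True)
```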
- OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers [45.808597624491156]
We present OMG, a novel framework that enables compelling motion generation from zero-shot open-vocabulary text prompts.
At the pre-training stage, our model improves generation ability by learning rich out-of-domain motion traits.
At the fine-tuning stage, we introduce motion ControlNet, which incorporates text prompts as conditioning information.
arXiv Detail & Related papers (2023-12-14T14:31:40Z)
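The "motion ControlNet" mentioned above follows the general ControlNet recipe: a trainable copy of a frozen backbone block injects conditioning features through a zero-initialized projection. A minimal sketch of that generic pattern, assuming a simple feature-vector backbone and hypothetical class names (this is not the OMG architecture), is:

```python
import copy
import torch
import torch.nn as nn

class ControlledBlock(nn.Module):
    def __init__(self, backbone_block: nn.Module, feat_dim: int):
        super().__init__()
        self.frozen = backbone_block
        for p in self.frozen.parameters():
            p.requires_grad_(False)                    # keep the pretrained block fixed
        self.control = copy.deepcopy(backbone_block)   # trainable copy of the block
        self.zero_proj = nn.Linear(feat_dim, feat_dim)
        nn.init.zeros_(self.zero_proj.weight)          # zero init: no effect at step 0
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, x, cond):
        out = self.frozen(x)
        control_out = self.control(x + cond)           # conditioning enters the trainable copy
        return out + self.zero_proj(control_out)
```

Because the projection starts at zero, the first forward pass reproduces the frozen backbone exactly, which is the property that makes this style of conditional fine-tuning stable.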
- Text-driven Video Prediction [83.04845684117835]
We propose a new task called Text-driven Video Prediction (TVP).
Taking the first frame and text caption as inputs, this task aims to synthesize the following frames.
To investigate the capability of text in causal inference for progressive motion information, our TVP framework contains a Text Inference Module (TIM).
arXiv Detail & Related papers (2022-10-06T12:43:07Z)
- Towards Frame Rate Agnostic Multi-Object Tracking [76.82407173177138]
We propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time.
Specifically, we propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes the frame rate information.
FAPS reflects all post-processing steps in training via tracking pattern matching and fusion.
arXiv Detail & Related papers (2022-09-23T04:25:19Z)
- TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI).
Our method outperforms other state-of-the-art methods in four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z)
- TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show up to a 5.21 dB improvement in PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)
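For reference, PSNR, the metric cited in the TimeLens entry above, can be computed as below (a generic definition for images scaled to [0, 1], not any paper's evaluation code):

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```

On this scale, a 5.21 dB gain corresponds to roughly a 3.3x reduction in mean squared error (10^(5.21/10) ≈ 3.3).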