Human Motion Video Generation: A Survey
- URL: http://arxiv.org/abs/2509.03883v1
- Date: Thu, 04 Sep 2025 04:39:21 GMT
- Title: Human Motion Video Generation: A Survey
- Authors: Haiwei Xue, Xiangyang Luo, Zhanghao Hu, Xin Zhang, Xunzhi Xiang, Yuqin Dai, Jianzhuang Liu, Zhensong Zhang, Minglei Li, Jian Yang, Fei Ma, Zhiyong Wu, Changpeng Yang, Zonghong Dai, Fei Richard Yu
- Abstract summary: This paper provides an in-depth survey of human motion video generation, encompassing over ten sub-tasks. It details the five key phases of the generation process: input, motion planning, motion video generation, refinement, and output. Notably, this is the first survey that discusses the potential of large language models in enhancing human motion video generation.
- Score: 65.24556163013375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human motion video generation has garnered significant research interest due to its broad applications, enabling innovations such as photorealistic singing heads or dynamic avatars that seamlessly dance to music. However, existing surveys in this field focus on individual methods, lacking a comprehensive overview of the entire generative process. This paper addresses this gap by providing an in-depth survey of human motion video generation, encompassing over ten sub-tasks, and detailing the five key phases of the generation process: input, motion planning, motion video generation, refinement, and output. Notably, this is the first survey that discusses the potential of large language models in enhancing human motion video generation. Our survey reviews the latest developments and technological trends in human motion video generation across three primary modalities: vision, text, and audio. By covering over two hundred papers, we offer a thorough overview of the field and highlight milestone works that have driven significant technological breakthroughs. Our goal for this survey is to unveil the prospects of human motion video generation and serve as a valuable resource for advancing the comprehensive applications of digital humans. A complete list of the models examined in this survey is available in Our Repository https://github.com/Winn1y/Awesome-Human-Motion-Video-Generation.
Related papers
- From Generated Human Videos to Physically Plausible Robot Trajectories [103.28274349461607]
Video generation models are rapidly improving in their ability to synthesize human actions in novel contexts. To realize this potential, how can a humanoid execute the human actions from generated videos in a zero-shot manner? This challenge arises because generated videos are often noisy and exhibit morphological distortions that make direct imitation difficult compared to real video. We propose GenMimic, a physics-aware reinforcement learning policy conditioned on 3D keypoints, and trained with symmetry regularization and keypoint-weighted tracking rewards.
arXiv Detail & Related papers (2025-12-04T18:56:03Z) - HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation [28.007696532331934]
We propose a decoupled human video generation framework that first generates diverse poses from text prompts. We present MotionDiT, which is trained to generate structured human-motion poses from text prompts. Our experiments across various Pose-to-Video baselines demonstrate that the poses generated by our method can produce diverse and high-quality human-motion videos.
arXiv Detail & Related papers (2025-03-31T12:51:45Z) - Vision-to-Music Generation: A Survey [10.993775589904251]
Vision-to-music generation shows vast application prospects in fields such as film scoring, short video creation, and dance music synthesis. Research in vision-to-music is still in its preliminary stage due to its complex internal structure and the difficulty of modeling dynamic relationships with video. Existing surveys focus on general music generation without comprehensive discussion on vision-to-music.
arXiv Detail & Related papers (2025-03-27T08:21:54Z) - A Survey: Spatiotemporal Consistency in Video Generation [72.82267240482874]
Video generation, by leveraging dynamic visual generation methods, pushes the boundaries of Artificial Intelligence Generated Content (AIGC). Recent works have aimed at addressing the spatiotemporal consistency issue in video generation, while few literature reviews have been organized from this perspective. We systematically review recent advances in video generation, covering five key aspects: foundation models, information representations, generation schemes, post-processing techniques, and evaluation metrics.
arXiv Detail & Related papers (2025-02-25T05:20:51Z) - Llama Learns to Direct: DirectorLLM for Human-Centric Video Generation [54.561971554162376]
We introduce DirectorLLM, a novel video generation model that employs a large language model (LLM) to orchestrate human poses within videos. Our model outperforms existing ones in generating videos with higher human motion fidelity, improved prompt faithfulness, and enhanced rendered subject naturalness.
arXiv Detail & Related papers (2024-12-19T03:10:26Z) - A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights [8.192172339127657]
Human video generation aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose.
Recent advancements in generative models have laid a solid foundation for the growing interest in this area.
Despite the significant progress, the task of human video generation remains challenging due to the difficulty of maintaining character consistency, the complexity of human motion, and the difficulty of modeling humans' interactions with the environment.
arXiv Detail & Related papers (2024-07-11T12:09:05Z) - Human Motion Generation: A Survey [67.38982546213371]
Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications.
Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts.
We present a comprehensive literature review of human motion generation, which is the first of its kind in this field.
arXiv Detail & Related papers (2023-07-20T14:15:20Z) - Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis [55.72674354651122]
We first summarize the scope of person generation, then systematically review recent progress and technical trends in deep person generation.
More than two hundred papers are covered for a thorough overview, and milestone works are highlighted to mark the major technical breakthroughs.
We hope this survey can shed some light on the future prospects of deep person generation and provide a helpful foundation for comprehensive applications of digital humans.
arXiv Detail & Related papers (2021-09-05T14:15:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.