High Quality Human Image Animation using Regional Supervision and Motion Blur Condition
- URL: http://arxiv.org/abs/2409.19580v1
- Date: Sun, 29 Sep 2024 06:46:31 GMT
- Title: High Quality Human Image Animation using Regional Supervision and Motion Blur Condition
- Authors: Zhongcong Xu, Chaoyue Song, Guoxian Song, Jianfeng Zhang, Jun Hao Liew, Hongyi Xu, You Xie, Linjie Luo, Guosheng Lin, Jiashi Feng, Mike Zheng Shou
- Abstract summary: We leverage regional supervision for detailed regions to enhance face and hand faithfulness.
Second, we model the motion blur explicitly to further improve the appearance quality.
Third, we explore novel training strategies for high-resolution human animation to improve the overall fidelity.
- Score: 97.97432499053966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in video diffusion models have enabled realistic and controllable human image animation with temporal coherence. Although generating reasonable results, existing methods often overlook the need for regional supervision in crucial areas such as the face and hands, and neglect the explicit modeling for motion blur, leading to unrealistic low-quality synthesis. To address these limitations, we first leverage regional supervision for detailed regions to enhance face and hand faithfulness. Second, we model the motion blur explicitly to further improve the appearance quality. Third, we explore novel training strategies for high-resolution human animation to improve the overall fidelity. Experimental results demonstrate that our proposed method outperforms state-of-the-art approaches, achieving significant improvements upon the strongest baseline by more than 21.0% and 57.4% in terms of reconstruction precision (L1) and perceptual quality (FVD) on HumanDance dataset. Code and model will be made available.
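The abstract's "regional supervision" idea, weighting detailed regions such as the face and hands more heavily in the reconstruction objective, can be sketched roughly as follows. The masks, weights, and function names here are illustrative assumptions for a simple weighted-L1 loss, not the paper's actual formulation.

```python
import numpy as np

def regional_l1_loss(pred, target, face_mask, hand_mask,
                     w_face=2.0, w_hand=2.0):
    """Per-pixel L1 reconstruction loss with extra weight on face and
    hand regions. The weights are illustrative, not from the paper."""
    # Start with uniform weight 1 per pixel, then boost detailed regions.
    weights = np.ones(pred.shape[:2])
    weights[face_mask] *= w_face
    weights[hand_mask] *= w_hand
    # Absolute error averaged over color channels, then weighted-averaged
    # over pixels.
    per_pixel = np.abs(pred - target).mean(axis=-1)
    return float((weights * per_pixel).sum() / weights.sum())

# Toy 4x4 RGB frames with a uniform 0.1 error everywhere.
pred = np.zeros((4, 4, 3))
target = np.full((4, 4, 3), 0.1)
face = np.zeros((4, 4), dtype=bool); face[:2, :2] = True
hand = np.zeros((4, 4), dtype=bool); hand[2:, 2:] = True
print(regional_l1_loss(pred, target, face, hand))  # → 0.1
```

With a uniform error field the weighting has no effect (the loss is still 0.1); the weights only matter when errors in the face or hand regions differ from the rest of the frame, which is exactly when regional supervision changes the gradient signal.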
Related papers
- Efficient Neural Implicit Representation for 3D Human Reconstruction [38.241511336562844]
Conventional methods for reconstructing 3D human motion frequently require the use of expensive hardware and have high processing costs.
This study presents HumanAvatar, an innovative approach that efficiently reconstructs precise human avatars from monocular video sources.
arXiv Detail & Related papers (2024-10-23T10:16:01Z)
- Adaptive Multi-Modal Control of Digital Human Hand Synthesis Using a Region-Aware Cycle Loss [12.565642618427844]
Diffusion models can synthesize images, including the generation of humans in specific poses.
Current models face challenges in adequately expressing conditional control for detailed hand pose generation.
We propose a novel Region-Aware Cycle Loss (RACL) that enables the diffusion model training to focus on improving the hand region.
arXiv Detail & Related papers (2024-09-13T19:09:19Z)
- COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation [98.05046790227561]
COIN is a control-inpainting motion diffusion prior that enables fine-grained control to disentangle human and camera motions.
COIN outperforms the state-of-the-art methods in terms of global human motion estimation and camera motion estimation.
arXiv Detail & Related papers (2024-08-29T10:36:29Z)
- Aligning Human Motion Generation with Human Perceptions [51.831338643012444]
We propose a data-driven approach to bridge the gap by introducing a large-scale human perceptual evaluation dataset, MotionPercept, and a human motion critic model, MotionCritic.
Our critic model offers a more accurate metric for assessing motion quality and could be readily integrated into the motion generation pipeline.
arXiv Detail & Related papers (2024-07-02T14:01:59Z)
- VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning [53.35114015288077]
We bridge the domain gap between natural and artificial scenarios with efficient tuning strategies.
We develop a novel framework called VLPose to extend the generalization and robustness of pose estimation models.
Our approach has demonstrated improvements of 2.26% and 3.74% on HumanArt and MSCOCO, respectively.
arXiv Detail & Related papers (2024-02-22T11:21:54Z)
- Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar [62.87222308616711]
We propose Neural Point-based Volumetric Avatar (NPVA), a method that adopts the neural point representation and the neural volume rendering process.
Specifically, the neural points are strategically constrained around the surface of the target expression via a high-resolution UV displacement map.
By design, our NPVA is better equipped to handle topologically changing regions and thin structures while also ensuring accurate expression control when animating avatars.
arXiv Detail & Related papers (2023-07-11T03:40:10Z)
- Face Animation with an Attribute-Guided Diffusion Model [41.43427420949979]
We propose a Face Animation framework with an attribute-guided Diffusion Model (FADM).
FADM is the first work to exploit the superior modeling capacity of diffusion models for photo-realistic talking-head generation.
arXiv Detail & Related papers (2023-04-06T16:22:32Z)
- HDHumans: A Hybrid Approach for High-fidelity Digital Humans [107.19426606778808]
HDHumans is the first method for HD human character synthesis that jointly produces an accurate and temporally coherent 3D deforming surface.
Our method is carefully designed to achieve a synergy between classical surface deformation and neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-10-21T14:42:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.