Related papers: AniFaceDiff: Animating Stylized Avatars via Parametric Conditioned Diffusion Models

AniFaceDiff: Animating Stylized Avatars via Parametric Conditioned Diffusion Models

URL: http://arxiv.org/abs/2406.13272v2
Date: Mon, 02 Dec 2024 12:18:12 GMT
Title: AniFaceDiff: Animating Stylized Avatars via Parametric Conditioned Diffusion Models
Authors: Ken Chen, Sachith Seneviratne, Wei Wang, Dongting Hu, Sanjay Saha, Md. Tarek Hasan, Sanka Rasnayaka, Tamasha Malepathirana, Mingming Gong, Saman Halgamuge,
Abstract summary: This paper proposes a new method based on Stable Diffusion, called AniFaceDiff, incorporating a new conditioning module for animating stylized avatars.<n>Our approach effectively preserves pose and expression from the target video while maintaining input image consistency.<n>This work aims to enhance the quality of virtual stylized animation for positive applications.
Score: 33.39336530229545
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Animating stylized avatars with dynamic poses and expressions has attracted increasing attention for its broad range of applications. Previous research has made significant progress by training controllable generative models to synthesize animations based on reference characteristics, pose, and expression conditions. However, the mechanisms used in these methods to control pose and expression often inadvertently introduce unintended features from the target motion, while also causing a loss of expression-related details, particularly when applied to stylized animation. This paper proposes a new method based on Stable Diffusion, called AniFaceDiff, incorporating a new conditioning module for animating stylized avatars. First, we propose a refined spatial conditioning approach by Facial Alignment to prevent the inclusion of identity characteristics from the target motion. Then, we introduce an Expression Adapter that incorporates additional cross-attention layers to address the potential loss of expression-related information. Our approach effectively preserves pose and expression from the target video while maintaining input image consistency. Extensive experiments demonstrate that our method achieves state-of-the-art results, showcasing superior image quality, preservation of reference features, and expression accuracy, particularly for out-of-domain animation across diverse styles, highlighting its versatility and strong generalization capabilities. This work aims to enhance the quality of virtual stylized animation for positive applications. To promote responsible use in virtual environments, we contribute to the advancement of detection for generative content by evaluating state-of-the-art detectors, highlighting potential areas for improvement, and suggesting solutions.

Related papers

EVA: Expressive Virtual Avatars from Multi-view Videos [51.33851869426057]
We introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework.<n>EVA achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures.<n>This work represents a significant advancement towards fully drivable digital human models.
arXiv Detail & Related papers (2025-05-21T11:22:52Z)
Model See Model Do: Speech-Driven Facial Animation with Style Control [14.506128477193991]
Speech-driven 3D facial animation plays a key role in applications such as virtual avatars, gaming, and digital content creation.<n>Existing methods have made significant progress in achieving accurate lip synchronization and generating basic emotional expressions.<n>We propose a novel example-based generation framework that conditions a latent diffusion model on a reference style clip.
arXiv Detail & Related papers (2025-05-02T14:47:21Z)
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance [30.225654002561512]
We introduce Animate Anyone 2, aiming to animate characters with environment affordance. We propose a shape-agnostic mask strategy that more effectively characterizes the relationship between character and environment. We also introduce a pose modulation strategy that enables the model to handle more diverse motion patterns.
arXiv Detail & Related papers (2025-02-10T04:20:11Z)
X-Dyna: Expressive Dynamic Human Image Animation [49.896933584815926]
X-Dyna is a zero-shot, diffusion-based pipeline for animating a single human image. It generates realistic, context-aware dynamics for both the subject and the surrounding environment.
arXiv Detail & Related papers (2025-01-17T08:10:53Z)
Consistent Human Image and Video Generation with Spatially Conditioned Diffusion [82.4097906779699]
Consistent human-centric image and video synthesis aims to generate images with new poses while preserving appearance consistency with a given reference image. We frame the task as a spatially-conditioned inpainting problem, where the target image is in-painted to maintain appearance consistency with the reference. This approach enables the reference features to guide the generation of pose-compliant targets within a unified denoising network.
arXiv Detail & Related papers (2024-12-19T05:02:30Z)
OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration. We propose OSDFace, a novel one-step diffusion model for face restoration. Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability. An identity-aware appearance controller integrates additional facial information without compromising other appearance details. A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps. VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z)
DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis [71.40724659748787]
DiffusionFace is the first diffusion-based face forgery dataset. It covers various forgery categories, including unconditional and Text Guide facial image generation, Img2Img, Inpaint, and Diffusion-based facial exchange algorithms. It provides essential metadata and a real-world internet-sourced forgery facial image dataset for evaluation.
arXiv Detail & Related papers (2024-03-27T11:32:44Z)
DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment [34.821255203019554]
Video-driven neural face reenactment aims to synthesize realistic facial images that successfully preserve the identity and appearance of a source face. Recent advances in Diffusion Probabilistic Models (DPMs) enable the generation of high-quality realistic images. We present Diffusion, a novel method that leverages the photo-realistic image generation of diffusion models to perform neural face reenactment.
arXiv Detail & Related papers (2024-03-25T21:46:53Z)
Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment. We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images. We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians [51.46168990249278]
We present an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video. GustafAvatar is validated on both the public dataset and our collected dataset.
arXiv Detail & Related papers (2023-12-04T18:55:45Z)
When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images. We present a novel use of the extended StyleGAN embedding space $mathcalW_+$ to achieve enhanced identity preservation and disentanglement for diffusion models. Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities. In this paper, we propose a novel framework tailored for character animation. By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z)
HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces [47.27033282706179]
We present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity. Our method operates under the one-shot setting (i.e., using a single source frame) and allows for cross-subject reenactment, without requiring subject-specific fine-tuning. We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2.
arXiv Detail & Related papers (2023-07-20T11:59:42Z)
Drivable Volumetric Avatars using Texel-Aligned Features [52.89305658071045]
Photo telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance. We propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people.
arXiv Detail & Related papers (2022-07-20T09:28:16Z)
Graph-based Generative Face Anonymisation with Pose Preservation [49.18049578591058]
AnonyGAN is a GAN-based solution for face anonymisation. It replaces the visual information corresponding to a source identity with a condition identity provided as any single image.
arXiv Detail & Related papers (2021-12-10T12:58:17Z)
Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection [65.92058628082322]
Non-parametric face modeling aims to reconstruct 3D face only from images without shape assumptions. This paper presents a novel Learning to Aggregate and Personalize framework for unsupervised robust 3D face modeling.
arXiv Detail & Related papers (2021-06-15T03:10:17Z)
LI-Net: Large-Pose Identity-Preserving Face Reenactment Network [14.472453602392182]
We propose a large-pose identity-preserving face reenactment network, LI-Net. Specifically, the Landmark Transformer is adopted to adjust driving landmark images. The Face Rotation Module and the Expression Enhancing Generator decouple the transformed landmark image into pose and expression features, and reenact those attributes separately to generate identity-preserving faces.
arXiv Detail & Related papers (2021-04-07T01:41:21Z)
VAE/WGAN-Based Image Representation Learning For Pose-Preserving Seamless Identity Replacement In Facial Images [15.855376604558977]
We present a novel variational generative adversarial network (VGAN) based on Wasserstein loss. We show that our network can be used to perform pose-preserving identity morphing and identity-preserving pose morphing.
arXiv Detail & Related papers (2020-03-02T03:35:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.