GenEAva: Generating Cartoon Avatars with Fine-Grained Facial Expressions from Realistic Diffusion-based Faces
- URL: http://arxiv.org/abs/2504.07945v1
- Date: Thu, 10 Apr 2025 17:54:02 GMT
- Title: GenEAva: Generating Cartoon Avatars with Fine-Grained Facial Expressions from Realistic Diffusion-based Faces
- Authors: Hao Yu, Rupayan Mallick, Margrit Betke, Sarah Adel Bargal,
- Abstract summary: We propose a novel framework, GenEAva, for generating high-quality cartoon avatars with fine-grained facial expressions.<n>Our approach fine-tunes a state-of-the-art text-to-image diffusion model to synthesize highly detailed and expressive facial expressions.<n>We introduce the first expressive cartoon avatar dataset, GenEAva 1.0, specifically designed to capture 135 fine-grained facial expressions.
- Score: 15.26953477181137
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cartoon avatars have been widely used in various applications, including social media, online tutoring, and gaming. However, existing cartoon avatar datasets and generation methods struggle to present highly expressive avatars with fine-grained facial expressions and are often inspired from real-world identities, raising privacy concerns. To address these challenges, we propose a novel framework, GenEAva, for generating high-quality cartoon avatars with fine-grained facial expressions. Our approach fine-tunes a state-of-the-art text-to-image diffusion model to synthesize highly detailed and expressive facial expressions. We then incorporate a stylization model that transforms these realistic faces into cartoon avatars while preserving both identity and expression. Leveraging this framework, we introduce the first expressive cartoon avatar dataset, GenEAva 1.0, specifically designed to capture 135 fine-grained facial expressions, featuring 13,230 expressive cartoon avatars with a balanced distribution across genders, racial groups, and age ranges. We demonstrate that our fine-tuned model generates more expressive faces than the state-of-the-art text-to-image diffusion model SDXL. We also verify that the cartoon avatars generated by our framework do not include memorized identities from fine-tuning data. The proposed framework and dataset provide a diverse and expressive benchmark for future research in cartoon avatar generation.
Related papers
- ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions [49.34398022152462]
We propose to couple locally-defined facial expressions with 3D Gaussian splatting to enable creating ultra-high fidelity, expressive and photorealistic 3D head avatars.<n>In particular, we leverage a patch-based geometric 3D face model to extract patch expressions and learn how to translate these into local dynamic skin appearance and motion.<n>We employ color-based densification and progressive training to obtain high-quality results and faster convergence for high resolution 3K training images.
arXiv Detail & Related papers (2025-07-14T17:59:03Z) - SmartAvatar: Text- and Image-Guided Human Avatar Generation with VLM AI Agents [91.26239311240873]
SmartAvatar is a vision-language-agent-driven framework for generating fully rigged, animation-ready 3D human avatars.<n>A key innovation is an autonomous verification loop, where the agent renders draft avatars.<n>The generated avatars are fully rigged and support pose manipulation with consistent identity and appearance.
arXiv Detail & Related papers (2025-06-05T03:49:01Z) - EVA: Expressive Virtual Avatars from Multi-view Videos [51.33851869426057]
We introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework.<n>EVA achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures.<n>This work represents a significant advancement towards fully drivable digital human models.
arXiv Detail & Related papers (2025-05-21T11:22:52Z) - Multimodal Generation of Animatable 3D Human Models with AvatarForge [67.31920821192323]
AvatarForge is a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation.
Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation.
arXiv Detail & Related papers (2025-03-11T08:29:18Z) - GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars [44.8290935585746]
Photo-realistic and controllable 3D avatars are crucial for various applications such as virtual and mixed reality (VR/MR), telepresence, gaming, and film production.
Traditional methods for avatar creation often involve time-consuming scanning and reconstruction processes for each avatar.
We propose a text-conditioned generative model that can generate photo-realistic facial avatars of diverse identities.
arXiv Detail & Related papers (2024-08-24T21:25:22Z) - DEGAS: Detailed Expressions on Full-Body Gaussian Avatars [13.683836322899953]
We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions.<n>We propose to adopt the expression latent space trained solely on 2D portrait images, bridging the gap between 2D talking faces and 3D avatars.
arXiv Detail & Related papers (2024-08-20T06:52:03Z) - AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text [71.09533176800707]
AvatarStudio is a coarse-to-fine generative model that generates explicit textured 3D meshes for animatable human avatars.
By effectively leveraging the synergy between the articulated mesh representation and the DensePose-conditional diffusion model, AvatarStudio can create high-quality avatars.
arXiv Detail & Related papers (2023-11-29T18:59:32Z) - GAIA: Zero-shot Talking Avatar Generation [64.78978434650416]
We introduce GAIA (Generative AI for Avatar), which eliminates the domain priors in talking avatar generation.
GAIA beats previous baseline models in terms of naturalness, diversity, lip-sync quality, and visual quality.
It is general and enables different applications like controllable talking avatar generation and text-instructed avatar generation.
arXiv Detail & Related papers (2023-11-26T08:04:43Z) - AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation [14.062402203105712]
AvatarBooth is a novel method for generating high-quality 3D avatars using text prompts or specific images.
Our key contribution is the precise avatar generation control by using dual fine-tuned diffusion models.
We present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation.
arXiv Detail & Related papers (2023-06-16T14:18:51Z) - DreamWaltz: Make a Scene with Complex 3D Animatable Avatars [68.49935994384047]
We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and parametric human body prior.
For animation, our method learns an animatable 3D avatar representation from abundant image priors of diffusion model conditioned on various poses.
arXiv Detail & Related papers (2023-05-21T17:59:39Z) - Text-Conditional Contextualized Avatars For Zero-Shot Personalization [47.85747039373798]
We propose a pipeline that enables personalization of image generation with avatars capturing a user's identity in a delightful way.
Our pipeline is zero-shot, avatar texture and style agnostic, and does not require training on the avatar at all.
We show, for the first time, how to leverage large-scale image datasets to learn human 3D pose parameters.
arXiv Detail & Related papers (2023-04-14T22:00:44Z) - OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane
Rendering [81.55960827071661]
Controllability, generalizability and efficiency are the major objectives of constructing face avatars represented by neural implicit field.
We propose One-shot Talking face Avatar (OTAvatar), which constructs face avatars by a generalized controllable tri-plane rendering solution.
arXiv Detail & Related papers (2023-03-26T09:12:03Z) - 3D Cartoon Face Generation with Controllable Expressions from a Single
GAN Image [142.047662926209]
We generate 3D cartoon face shapes from single 2D GAN generated human faces.
We manipulate latent codes to generate images with different poses and lighting, such that we can reconstruct the 3D cartoon face shapes.
arXiv Detail & Related papers (2022-07-29T01:06:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.