Character-Centric Understanding of Animated Movies
- URL: http://arxiv.org/abs/2509.12204v1
- Date: Mon, 15 Sep 2025 17:59:51 GMT
- Title: Character-Centric Understanding of Animated Movies
- Authors: Zhongrui Gui, Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman
- Abstract summary: We propose an audio-visual pipeline to enable automatic and robust animated character recognition. Characters in animated movies exhibit extreme diversity in their appearance, motion, and deformation. This pipeline enhances character-centric understanding of animated movies.
- Score: 88.83104906869106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Animated movies are captivating for their unique character designs and imaginative storytelling, yet they pose significant challenges for existing recognition systems. Unlike the consistent visual patterns detected by conventional face recognition methods, animated characters exhibit extreme diversity in their appearance, motion, and deformation. In this work, we propose an audio-visual pipeline to enable automatic and robust animated character recognition, and thereby enhance character-centric understanding of animated movies. Central to our approach is the automatic construction of an audio-visual character bank from online sources. This bank contains both visual exemplars and voice (audio) samples for each character, enabling subsequent multi-modal character recognition despite long-tailed appearance distributions. Building on accurate character recognition, we explore two downstream applications: Audio Description (AD) generation for visually impaired audiences, and character-aware subtitling for the hearing impaired. To support research in this domain, we introduce CMD-AM, a new dataset of 75 animated movies with comprehensive annotations. Our character-centric pipeline demonstrates significant improvements in both accessibility and narrative comprehension for animated content over prior face-detection-based approaches. For the code and dataset, visit https://www.robots.ox.ac.uk/~vgg/research/animated_ad/.
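The abstract's core idea, matching a query character against a bank that holds both visual exemplars and voice samples, can be sketched as a simple fused nearest-exemplar search. This is an illustrative assumption, not the paper's actual implementation: the embedding dimension, the encoders (replaced here by random placeholder vectors), and the fusion weight `alpha` are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of multi-modal character recognition against an
# audio-visual character bank. Real systems would use learned face/voice
# encoders; here random unit vectors stand in for their outputs.
rng = np.random.default_rng(0)
EMB_DIM = 128


def l2_normalize(x):
    """Scale embeddings to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Character bank: per character, several visual exemplar embeddings and
# several voice sample embeddings (placeholder vectors).
bank = {
    name: {
        "visual": l2_normalize(rng.standard_normal((5, EMB_DIM))),
        "voice": l2_normalize(rng.standard_normal((3, EMB_DIM))),
    }
    for name in ["hero", "sidekick", "villain"]
}


def recognize(face_emb, voice_emb, bank, alpha=0.6):
    """Score each character by the max cosine similarity over its exemplars
    in each modality, then fuse the two modality scores linearly."""
    face_emb = l2_normalize(face_emb)
    voice_emb = l2_normalize(voice_emb)
    scores = {}
    for name, entry in bank.items():
        vis = float(np.max(entry["visual"] @ face_emb))
        aud = float(np.max(entry["voice"] @ voice_emb))
        scores[name] = alpha * vis + (1 - alpha) * aud
    return max(scores, key=scores.get), scores


# Query constructed to lie near one of "hero"'s exemplars in both modalities.
query_face = bank["hero"]["visual"][0] + 0.05 * rng.standard_normal(EMB_DIM)
query_voice = bank["hero"]["voice"][1] + 0.05 * rng.standard_normal(EMB_DIM)
pred, scores = recognize(query_face, query_voice, bank)
print(pred)
```

Fusing a per-modality max over exemplars (rather than a mean) is one plausible way to stay robust to the long-tailed appearance distributions the abstract mentions: a single close visual exemplar, or a single close voice sample, is enough to support a match.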
Related papers
- Wan-Animate: Unified Character Animation and Replacement with Holistic Replication [53.619006977292635]
We introduce Wan-Animate, a unified framework for character animation and replacement. It can animate the character by precisely replicating the expressions and movements of the character in the video to generate high-fidelity character videos. It can integrate the animated character into the reference video to replace the original character, replicating the scene's lighting and color tone.
arXiv Detail & Related papers (2025-09-17T15:00:57Z) - FairyGen: Storied Cartoon Video from a Single Child-Drawn Character [15.701180508477679]
We propose FairyGen, an automatic system for generating story-driven cartoon videos from a single child's drawing. Unlike previous storytelling methods, FairyGen explicitly disentangles character modeling from stylized background generation. Our system produces animations that are stylistically faithful and narratively structured, with natural motion.
arXiv Detail & Related papers (2025-06-26T13:58:16Z) - AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation [52.655400705690155]
AnimeShooter is a reference-guided multi-shot animation dataset. Story-level annotations provide an overview of the narrative, including the storyline, key scenes, and main character profiles with reference images. Shot-level annotations decompose the story into consecutive shots, each annotated with scene, characters, and both narrative and descriptive visual captions. A separate subset, AnimeShooter-audio, offers synchronized audio tracks for each shot, along with audio descriptions and sound sources.
arXiv Detail & Related papers (2025-06-03T17:55:18Z) - HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters [14.594698765723756]
HunyuanVideo-Avatar is a model capable of simultaneously generating dynamic, emotion-controllable, and multi-character dialogue videos. A character image injection module is designed to replace the conventional addition-based character conditioning scheme. An Audio Emotion Module (AEM) is introduced to extract and transfer the emotional cues from an emotion reference image to the target generated video. A Face-Aware Audio Adapter (FAA) is proposed to isolate the audio-driven character with a latent-level face mask.
arXiv Detail & Related papers (2025-05-26T15:57:27Z) - MoCha: Towards Movie-Grade Talking Character Synthesis [62.007000023747445]
We introduce Talking Characters, a more realistic task of generating talking character animations directly from speech and text. Unlike talking head generation, Talking Characters aims to generate the full portrait of one or more characters beyond the facial region. We propose MoCha, the first model of its kind to generate talking characters.
arXiv Detail & Related papers (2025-03-30T04:22:09Z) - Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling [77.08568533331206]
We propose a novel multi-condition guided framework for character image animation. We employ several well-designed input modules to enhance the implicit decoupling capability of the model. Our method excels in generating high-quality character animations, especially in scenarios of complex backgrounds and multiple characters.
arXiv Detail & Related papers (2024-06-05T08:03:18Z) - AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding [24.486705010561067]
The paper introduces AniTalker, a framework designed to generate lifelike talking faces from a single portrait.
AniTalker effectively captures a wide range of facial dynamics, including subtle expressions and head movements.
arXiv Detail & Related papers (2024-05-06T02:32:41Z) - Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography".
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z) - Learning Audio-Driven Viseme Dynamics for 3D Face Animation [17.626644507523963]
We present a novel audio-driven facial animation approach that can generate realistic lip-synchronized 3D animations from the input audio.
Our approach learns viseme dynamics from speech videos, produces animator-friendly viseme curves, and supports multilingual speech inputs.
arXiv Detail & Related papers (2023-01-15T09:55:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.