IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation
- URL: http://arxiv.org/abs/2512.23519v1
- Date: Mon, 29 Dec 2025 14:54:44 GMT
- Title: IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation
- Authors: Donghao Zhou, Jingyu Lin, Guibao Shen, Quande Liu, Jialin Gao, Lihao Liu, Lan Du, Cunjian Chen, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng
- Abstract summary: IdentityStory is a framework for human-centric story generation that ensures consistent character identity across sequential images. By taming identity-preserving generators, the framework features two key components: Iterative Identity Discovery and Re-denoising Identity Injection.
- Score: 75.09818147405898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent visual generative models enable story generation with consistent characters from text, but human-centric story generation faces additional challenges, such as maintaining detailed and diverse human face consistency and coordinating multiple characters across different images. This paper presents IdentityStory, a framework for human-centric story generation that ensures consistent character identity across multiple sequential images. By taming identity-preserving generators, the framework features two key components: Iterative Identity Discovery, which extracts cohesive character identities, and Re-denoising Identity Injection, which re-denoises images to inject identities while preserving desired context. Experiments on the ConsiStory-Human benchmark demonstrate that IdentityStory outperforms existing methods, particularly in face consistency, and supports multi-character combinations. The framework also shows strong potential for applications such as infinite-length story generation and dynamic character composition.
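The abstract describes a two-stage pipeline: first extract a cohesive character identity across frames, then re-denoise each drafted frame to inject that identity while keeping its context. A minimal toy sketch of that control flow is below; every function name and the numeric "face feature" representation are illustrative stand-ins, not the authors' actual method or API.

```python
# Hypothetical sketch of the two-stage pipeline the abstract describes.
# All names (discover_identity, reinject_identity) and the plain-list
# "face features" are illustrative assumptions, not IdentityStory's real code.

def discover_identity(frames, n_iters=3):
    """Iterative Identity Discovery (toy stand-in): refine a shared
    identity vector by repeatedly pulling it toward the mean of the
    per-frame face features."""
    identity = [0.0] * len(frames[0])
    for _ in range(n_iters):
        mean = [sum(col) / len(frames) for col in zip(*frames)]
        identity = [0.5 * a + 0.5 * b for a, b in zip(identity, mean)]
    return identity

def reinject_identity(draft, identity, strength=0.7):
    """Re-denoising Identity Injection (toy stand-in): blend the discovered
    identity back into a drafted frame, keeping part of its context."""
    return [(1 - strength) * d + strength * i for d, i in zip(draft, identity)]

# Toy "face features" for three story frames of the same character.
frames = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
identity = discover_identity(frames)
consistent_frame = reinject_identity([5.0, 5.0], identity)
```

In the real framework both steps operate inside a diffusion model's denoising process; the sketch only mirrors the extract-then-inject structure.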
Related papers
- Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection [27.412361280397057]
We introduce Storynizor, a model capable of generating coherent stories with strong inter-frame character consistency.
The key innovation of Storynizor lies in its two modules: ID-Synchronizer and ID-Injector.
To facilitate the training of Storynizor, we have curated a novel dataset called StoryDB comprising 100,000 images.
arXiv Detail & Related papers (2024-09-29T09:15:51Z)
- StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation [10.652011707000202]
We introduce StoryMaker, a personalization solution that preserves not only facial consistency but also clothing, hairstyles, and body consistency.
StoryMaker supports numerous applications and is compatible with other community plug-ins.
arXiv Detail & Related papers (2024-09-19T08:53:06Z)
- CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models [58.37569942713456]
CharacterFactory is a framework that allows sampling new characters with consistent identities in the latent space of GANs.
The whole model only takes 10 minutes for training, and can sample infinite characters end-to-end during inference.
arXiv Detail & Related papers (2024-04-24T06:15:31Z)
- Adversarial Identity Injection for Semantic Face Image Synthesis [6.763801424109435]
We present an SIS architecture that exploits a cross-attention mechanism to merge identity, style, and semantic features to generate faces.
Experimental results reveal that the proposed method is not only suitable for preserving the identity but is also effective in the face recognition adversarial attack.
arXiv Detail & Related papers (2024-04-16T09:19:23Z)
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguishable from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z)
- StableIdentity: Inserting Anybody into Anywhere at First Sight [57.99693188913382]
We propose StableIdentity, which allows identity-consistent recontextualization with just one face image.
We are the first to directly inject the identity learned from a single image into video/3D generation without finetuning.
arXiv Detail & Related papers (2024-01-29T09:06:15Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models [71.15152184631951]
We propose a fully automated solution for consistent character generation with the sole input being a text prompt.
Our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods.
arXiv Detail & Related papers (2023-11-16T18:59:51Z)
- T-Person-GAN: Text-to-Person Image Generation with Identity-Consistency and Manifold Mix-Up [16.165889084870116]
We present an end-to-end approach to generate high-resolution person images conditioned on texts only.
We develop an effective generative model to produce person images with two novel mechanisms.
arXiv Detail & Related papers (2022-08-18T07:41:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.