StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
- URL: http://arxiv.org/abs/2409.12576v1
- Date: Thu, 19 Sep 2024 08:53:06 GMT
- Title: StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
- Authors: Zhengguang Zhou, Jing Li, Huaxia Li, Nemo Chen, Xu Tang
- Abstract summary: We introduce StoryMaker, a personalization solution that preserves not only facial consistency but also clothing, hairstyles, and body consistency.
StoryMaker supports numerous applications and is compatible with other community plug-ins.
- Score: 10.652011707000202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tuning-free personalized image generation methods have achieved significant success in maintaining facial consistency, i.e., identities, even with multiple characters. However, the lack of holistic consistency in scenes with multiple characters hampers these methods' ability to create a cohesive narrative. In this paper, we introduce StoryMaker, a personalization solution that preserves not only facial consistency but also clothing, hairstyles, and body consistency, thus facilitating the creation of a story through a series of images. StoryMaker incorporates conditions based on face identities and cropped character images, which include clothing, hairstyles, and bodies. Specifically, we integrate the facial identity information with the cropped character images using the Positional-aware Perceiver Resampler (PPR) to obtain distinct character features. To prevent intermingling of multiple characters and the background, we separately constrain the cross-attention impact regions of different characters and the background using MSE loss with segmentation masks. Additionally, we train the generation network conditioned on poses to promote decoupling from poses. A LoRA is also employed to enhance fidelity and quality. Experiments underscore the effectiveness of our approach. StoryMaker supports numerous applications and is compatible with other community plug-ins. Our source codes and model weights are available at https://github.com/RedAIGC/StoryMaker.
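To make the mask-based attention constraint concrete, below is a minimal, illustrative PyTorch sketch of how such an MSE loss could confine each character's cross-attention to its segmentation mask and the background tokens to the background mask. The function name, tensor shapes, and per-location normalization are assumptions for illustration, not the authors' released implementation (see the linked repository for that).
```python
# Illustrative sketch (not the authors' code) of a mask-constrained
# cross-attention loss: the attention received by each character's condition
# tokens is pushed, via an MSE term, to lie inside that character's
# segmentation mask; background tokens are pushed toward the background mask.
import torch
import torch.nn.functional as F


def attention_region_loss(attn_probs, char_masks, bg_mask, char_token_idx, bg_token_idx):
    """
    attn_probs:     (batch, heads, hw, num_tokens) cross-attention probabilities
                    from one diffusion U-Net layer.
    char_masks:     (batch, num_chars, hw) binary segmentation mask per character,
                    downsampled to the attention resolution.
    bg_mask:        (batch, hw) binary mask covering the background.
    char_token_idx: list of LongTensors with each character's condition-token indices.
    bg_token_idx:   LongTensor with the background condition-token indices.
    """
    attn = attn_probs.mean(dim=1)  # average over heads -> (batch, hw, num_tokens)

    def region_term(token_idx, target_mask):
        # Total attention those tokens receive at every spatial location,
        # normalized to [0, 1] before comparing against the binary mask.
        a = attn[:, :, token_idx].sum(dim=-1)           # (batch, hw)
        a = a / (a.amax(dim=-1, keepdim=True) + 1e-6)
        return F.mse_loss(a, target_mask.float())

    loss = region_term(bg_token_idx, bg_mask)
    for c, idx in enumerate(char_token_idx):
        loss = loss + region_term(idx, char_masks[:, c])
    return loss
```
In training, a term like this would typically be added to the standard diffusion denoising loss with a small weight and accumulated over the cross-attention layers where the character conditions are injected.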
Related papers
- Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection [27.412361280397057]
We introduce Storynizor, a model capable of generating coherent stories with strong inter-frame character consistency.
Key innovation of Storynizor lies in its key modules: ID-Synchronizer and ID-Injector.
To facilitate the training of Storynizor, we have curated a novel dataset called StoryDB comprising 100,000 images.
arXiv Detail & Related papers (2024-09-29T09:15:51Z)
- CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models [58.37569942713456]
CharacterFactory is a framework that allows sampling new characters with consistent identities in the latent space of GANs.
The whole model only takes 10 minutes for training, and can sample infinite characters end-to-end during inference.
arXiv Detail & Related papers (2024-04-24T06:15:31Z)
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguishable from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z)
- StableIdentity: Inserting Anybody into Anywhere at First Sight [57.99693188913382]
We propose StableIdentity, which allows identity-consistent recontextualization with just one face image.
We are the first to directly inject the identity learned from a single image into video/3D generation without finetuning.
arXiv Detail & Related papers (2024-01-29T09:06:15Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- StyO: Stylize Your Face in Only One-Shot [8.253458555695767]
This paper focuses on face stylization with a single artistic target.
Existing works for this task often fail to retain the source content while achieving geometry variation.
We present a novel StyO model, i.e., Stylize the face in only One-shot, to solve the above problem.
arXiv Detail & Related papers (2023-03-06T15:48:33Z)
- Make-A-Story: Visual Memory Conditioned Consistent Story Generation [57.691064030235985]
We propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context.
Our experiments for story generation on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, but also models appropriate correspondences between the characters and the background.
arXiv Detail & Related papers (2022-11-23T21:38:51Z)
- StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment [47.27033282706179]
We propose a framework that learns to disentangle the identity characteristics of the face from its pose.
We show that the proposed method produces higher quality results even on extreme pose variations.
arXiv Detail & Related papers (2022-09-27T13:22:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.