Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization
- URL: http://arxiv.org/abs/2406.16537v4
- Date: Sun, 29 Sep 2024 09:07:23 GMT
- Title: Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization
- Authors: Yuhang Ma, Wenting Xu, Jiji Tang, Qinfeng Jin, Rongsheng Zhang, Zeng Zhao, Changjie Fan, Zhipeng Hu
- Abstract summary: Character-Adapter is a plug-and-play framework designed to generate images that preserve the details of reference characters.
Character-Adapter employs prompt-guided segmentation to ensure fine-grained regional features of reference characters.
- Score: 34.28477193804092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Customized image generation, which seeks to synthesize images with consistent characters, holds significant relevance for applications such as storytelling, portrait generation, and character design. However, previous approaches have struggled to preserve characters with high-fidelity consistency due to inadequate feature extraction and concept confusion among reference characters. We therefore propose Character-Adapter, a plug-and-play framework designed to generate images that preserve the details of reference characters, ensuring high-fidelity consistency. Character-Adapter employs prompt-guided segmentation to ensure fine-grained regional features of reference characters, and dynamic region-level adapters to mitigate concept confusion. Extensive experiments validate the effectiveness of Character-Adapter. Both quantitative and qualitative results demonstrate that Character-Adapter achieves state-of-the-art performance in consistent character generation, with a 24.8% improvement over other methods. Our code will be released at https://github.com/Character-Adapter/Character-Adapter.
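As a rough sketch of the two mechanisms the abstract names, the snippet below routes each reference character's embedding through its own lightweight adapter and injects it only inside that character's region mask. The region masks are assumed to come from a prompt-guided segmenter, all shapes are toy values, and this is an illustration of the idea, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class RegionAdapter(nn.Module):
    """One lightweight adapter per reference character (assumed design)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, ref_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(ref_feat)

def inject_region_features(hidden, ref_feats, masks, adapters):
    """Blend each character's adapted reference features into the spatial
    hidden states only inside that character's mask, which confines each
    concept to its own region and so mitigates concept confusion.

    hidden:    (B, H*W, C) diffusion hidden states
    ref_feats: list of (B, C) reference-character embeddings
    masks:     list of (B, H*W, 1) binary masks from a prompt-guided segmenter
    """
    out = hidden
    for ref, mask, adapter in zip(ref_feats, masks, adapters):
        adapted = adapter(ref).unsqueeze(1)   # (B, 1, C)
        out = out + mask * adapted            # broadcast, confined to the region
    return out

# Toy usage: two characters on a 16x16 latent grid with 64 channels.
B, HW, C = 1, 256, 64
adapters = nn.ModuleList([RegionAdapter(C), RegionAdapter(C)])
hidden = torch.randn(B, HW, C)
refs = [torch.randn(B, C), torch.randn(B, C)]
masks = [torch.zeros(B, HW, 1), torch.zeros(B, HW, 1)]
masks[0][:, :128] = 1.0   # left half ~ character 1
masks[1][:, 128:] = 1.0   # right half ~ character 2
print(inject_region_features(hidden, refs, masks, adapters).shape)  # (1, 256, 64)
```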
Related papers
- Retrieval Augmented Comic Image Generation [2.8594383542895385]
We present RaCig, a novel system for generating comic-style image sequences with consistent characters and expressive gestures. RaCig addresses two key challenges: maintaining character identity and costume consistency across frames, and producing diverse and vivid character gestures.
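A hedged sketch of the retrieval step such a retrieval-augmented system implies: given a query embedding for an identity or gesture, pick the closest entry from a reference bank by cosine similarity. The names `query` and `bank` are illustrative, not RaCig's API.

```python
import torch
import torch.nn.functional as F

def retrieve_reference(query: torch.Tensor, bank: torch.Tensor) -> int:
    """query: (D,), bank: (N, D); return the index of the most similar reference."""
    sims = F.cosine_similarity(query.unsqueeze(0), bank, dim=-1)  # (N,)
    return int(sims.argmax())

bank = F.normalize(torch.randn(8, 512), dim=-1)   # stored character/gesture references
query = F.normalize(torch.randn(512), dim=-1)     # embedding of the requested character
print(retrieve_reference(query, bank))
```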
arXiv Detail & Related papers (2025-06-14T14:18:47Z) - InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework [24.29397138274732]
InstantCharacter is a scalable framework for character customization built upon a foundation diffusion transformer.
It achieves open-domain personalization across diverse character appearances, poses, and styles while maintaining high-fidelity results.
arXiv Detail & Related papers (2025-04-16T18:01:59Z) - CharacterBench: Benchmarking Character Customization of Large Language Models [80.29164862682063]
We propose CharacterBench, the largest bilingual generative benchmark, with 22,859 human-annotated samples covering 3,956 characters. We define 11 dimensions across 6 aspects, classified as sparse or dense dimensions depending on whether the character features a dimension evaluates manifest in every response. We also develop the CharacterJudge model for cost-effective and stable evaluations.
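The sketch below shows one plausible record layout for such a benchmark sample, with the sparse/dense distinction as a field; the field names are assumptions for illustration, not the released schema.

```python
from dataclasses import dataclass

@dataclass
class BenchSample:
    character: str       # one of the ~3,956 characters
    language: str        # "en" or "zh" (bilingual benchmark)
    dimension: str       # one of the 11 evaluation dimensions
    density: str         # "sparse" or "dense": does the feature show in every response?
    prompt: str
    response: str
    human_score: float   # human annotation

sample = BenchSample("Sherlock Holmes", "en", "knowledge_consistency", "dense",
                     "Describe your method.", "Elementary: observe, then deduce.", 4.0)
print(sample.dimension, sample.density)
```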
arXiv Detail & Related papers (2024-12-16T15:55:34Z) - StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization [36.14275850149665]
We propose a novel knowledge graph, namely the Character Graph (CG), which comprehensively represents various story-related knowledge.
We then introduce StoryWeaver, an image generator that achieves Customization via Character Graph (C-CG), capable of consistent story visualization with rich text semantics.
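A minimal sketch of what a character-centric knowledge graph could look like, with labeled edges attaching story knowledge to characters; the structure is illustrative, not the paper's CG format.

```python
from collections import defaultdict

class CharacterGraph:
    """Toy character-centric knowledge graph: node -> [(relation, node), ...]."""
    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, head: str, relation: str, tail: str) -> None:
        self.edges[head].append((relation, tail))

    def neighbors(self, node: str):
        return self.edges[node]

cg = CharacterGraph()
cg.add("Alice", "wears", "red cloak")
cg.add("Alice", "friend_of", "Bob")
# Knowledge attached to "Alice" can then condition each of her appearances.
print(cg.neighbors("Alice"))
```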
arXiv Detail & Related papers (2024-12-10T10:16:50Z) - Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models [79.21968152209193]
We introduce the NewEpisode benchmark to evaluate generative models' adaptability in generating new stories with fresh characters.
We propose EpicEvo, a method that customizes a diffusion-based visual story generation model with a single story featuring the new characters, seamlessly integrating them into established character dynamics.
arXiv Detail & Related papers (2024-05-20T07:54:03Z) - ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
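One way such identity-reward feedback can be set up is sketched below: embed the generated and reference faces and use their cosine similarity as a reward to be maximized. The `face_embed` function is a stand-in for a pretrained face encoder, not ID-Aligner's actual reward model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
_proj = torch.randn(3 * 64 * 64, 128)  # fixed random projection as a toy encoder

def face_embed(img: torch.Tensor) -> torch.Tensor:
    """Stand-in for a pretrained face encoder (e.g. an ArcFace-style model)."""
    return F.normalize(img.flatten(1) @ _proj, dim=-1)

def identity_reward(generated: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    """Higher when the generated face matches the reference identity."""
    return F.cosine_similarity(face_embed(generated), face_embed(reference), dim=-1)

gen = torch.rand(2, 3, 64, 64)
ref = torch.rand(2, 3, 64, 64)
loss = -identity_reward(gen, ref).mean()   # minimize negative reward during fine-tuning
print(float(loss))
```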
arXiv Detail & Related papers (2024-04-23T18:41:56Z) - IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models [31.762112403595612]
IDAdapter is a tuning-free approach that enhances the diversity and identity preservation in personalized image generation from a single face image.
During the training phase, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details.
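A minimal sketch of the mixed-feature idea: pool identity embeddings from several reference images of one person into a single conditioning vector. Mean pooling and the refinement layer are illustrative assumptions, not IDAdapter's exact design.

```python
import torch
import torch.nn as nn

class MixedIDFeature(nn.Module):
    """Pool identity features from K reference images into one conditioning vector."""
    def __init__(self, dim: int):
        super().__init__()
        self.refine = nn.Linear(dim, dim)

    def forward(self, ref_embeds: torch.Tensor) -> torch.Tensor:
        # ref_embeds: (B, K, D) embeddings of K reference images of one identity
        mixed = ref_embeds.mean(dim=1)   # mix identity-related details across refs
        return self.refine(mixed)        # (B, D) conditioning vector

mixer = MixedIDFeature(256)
refs = torch.randn(4, 5, 256)            # 5 reference images per identity
print(mixer(refs).shape)                 # torch.Size([4, 256])
```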
arXiv Detail & Related papers (2024-03-20T12:13:04Z) - Masked Generative Story Transformer with Character Guidance and Caption Augmentation [2.1392064955842023]
Story visualization is a challenging generative vision task that requires both visual quality and consistency between frames in generated image sequences.
Previous approaches either employ some kind of memory mechanism to maintain context throughout an auto-regressive generation of the image sequence, or model the generation of the characters and their background separately.
We propose a completely parallel transformer-based approach, relying on Cross-Attention with past and future captions to achieve consistency.
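The consistency mechanism described above can be sketched as frame tokens cross-attending to a concatenation of past and future caption embeddings; the concatenation and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)

frame_tokens = torch.randn(1, 64, 128)                  # tokens of the frame being generated
past_caps = torch.randn(1, 16, 128)                     # encoded past captions
future_caps = torch.randn(1, 16, 128)                   # encoded future captions
context = torch.cat([past_caps, future_caps], dim=1)    # full story context (1, 32, 128)

out, _ = attn(query=frame_tokens, key=context, value=context)
print(out.shape)  # torch.Size([1, 64, 128]) -- frame tokens informed by all captions
```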
arXiv Detail & Related papers (2024-03-13T13:10:20Z) - When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
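A minimal sketch of the adapter idea: map a StyleGAN $\mathcal{W}_+$ latent into the diffusion model's text-conditioning space so it can be appended to the prompt embeddings. The MLP shape is an assumption for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class WPlusAdapter(nn.Module):
    """Map a (18, 512) StyleGAN W+ latent to one extra text-context token."""
    def __init__(self, num_ws: int = 18, w_dim: int = 512, ctx_dim: int = 768):
        super().__init__()
        self.map = nn.Sequential(
            nn.Linear(num_ws * w_dim, ctx_dim), nn.GELU(), nn.Linear(ctx_dim, ctx_dim)
        )

    def forward(self, w_plus: torch.Tensor) -> torch.Tensor:
        # w_plus: (B, 18, 512) -> (B, 1, 768), appended to the prompt embeddings
        return self.map(w_plus.flatten(1)).unsqueeze(1)

adapter = WPlusAdapter()
print(adapter(torch.randn(2, 18, 512)).shape)   # torch.Size([2, 1, 768])
```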
arXiv Detail & Related papers (2023-11-29T09:05:14Z) - Identity-Aware Semi-Supervised Learning for Comic Character Re-Identification [2.4624325014867763]
We introduce a robust framework that combines metric learning with a novel 'Identity-Aware' self-supervision method.
Our approach involves processing both facial and bodily features within a unified network architecture.
By extensively validating our method using in-series and inter-series evaluation metrics, we demonstrate its effectiveness in consistently re-identifying comic characters.
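The metric-learning core can be sketched with a triplet margin loss over fused face-and-body descriptors, pulling same-character embeddings together and pushing different characters apart; the concatenation fusion is an illustrative choice, not necessarily the paper's.

```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.5)

def fuse(face: torch.Tensor, body: torch.Tensor) -> torch.Tensor:
    return torch.cat([face, body], dim=-1)   # unified face+body descriptor

anchor = fuse(torch.randn(8, 128), torch.randn(8, 128))
positive = fuse(torch.randn(8, 128), torch.randn(8, 128))   # same character
negative = fuse(torch.randn(8, 128), torch.randn(8, 128))   # different character
print(float(triplet(anchor, positive, negative)))
```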
arXiv Detail & Related papers (2023-08-17T16:48:41Z) - Character-Centric Story Visualization via Visual Planning and Token Alignment [53.44760407148918]
Story visualization advances traditional text-to-image generation by enabling the generation of multiple images based on a complete story.
A key challenge of consistent story visualization is to preserve the characters that are essential to the story.
We propose to adapt a recent work that augments Vector-Quantized Variational Autoencoders with a text-to-visual-token architecture.
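An illustrative sketch of the text-to-visual-token idea: a transformer encodes caption tokens and predicts a sequence of discrete codebook indices for a VQ-VAE decoder. All sizes are toy values and the architecture is a stand-in, not the adapted model.

```python
import torch
import torch.nn as nn

vocab_text, vocab_vq, dim = 1000, 512, 256
embed = nn.Embedding(vocab_text, dim)
layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
to_codes = nn.Linear(dim, vocab_vq)                  # logits over the VQ codebook

caption = torch.randint(0, vocab_text, (1, 32))      # tokenized story sentence
h = encoder(embed(caption))                          # contextualized text features
visual_tokens = to_codes(h).argmax(-1)               # discrete codes for a VQ-VAE decoder
print(visual_tokens.shape)                           # torch.Size([1, 32])
```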
arXiv Detail & Related papers (2022-10-16T06:50:39Z) - Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition [63.6608759501803]
We propose to recognize artistic text at three levels.
Firstly, corner points are applied to guide the extraction of local features inside characters, given the robustness of corner structures to variations in appearance and shape.
Secondly, we design a character contrastive loss to model the character-level feature, improving the feature representation for character classification.
Thirdly, we utilize Transformer to learn the global feature on image-level and model the global relationship of the corner points.
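The character contrastive loss at the second level can be sketched InfoNCE-style: features of the same character class attract, all others repel. This is a generic formulation for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def char_contrastive(feats: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """feats: (N, D) character features; labels: (N,) character classes."""
    feats = F.normalize(feats, dim=-1)
    sims = feats @ feats.t() / tau                      # pairwise similarities (N, N)
    sims.fill_diagonal_(float("-inf"))                  # exclude self-pairs
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)    # same-character pairs
    pos.fill_diagonal_(False)
    logprob = F.log_softmax(sims, dim=-1)
    return -logprob[pos].mean()                         # attract positives, repel the rest

feats = torch.randn(16, 64)
labels = torch.arange(4).repeat_interleave(4)           # 4 samples per character class
print(float(char_contrastive(feats, labels)))
```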
arXiv Detail & Related papers (2022-07-31T14:11:05Z) - CharFormer: A Glyph Fusion based Attentive Framework for High-precision Character Image Denoising [10.53596428004378]
We introduce a novel framework based on glyph fusion and attention mechanisms, i.e., CharFormer, for precisely recovering character images.
Unlike existing frameworks, CharFormer introduces a parallel target task for capturing additional information and injecting it into the image denoising backbone.
We utilize attention-based networks for global-local feature interaction, which helps to handle blind denoising and improves denoising performance.
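A hedged sketch of the parallel-target-task design: a denoising backbone with an auxiliary glyph branch whose features are injected back before reconstruction. The toy convolutional stand-in below only illustrates the wiring, not CharFormer's architecture.

```python
import torch
import torch.nn as nn

class TwoBranchDenoiser(nn.Module):
    """Denoising backbone plus a parallel glyph branch feeding back into it."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, padding=1)
        self.glyph_head = nn.Conv2d(ch, ch, 3, padding=1)    # parallel target task
        self.denoise_head = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, noisy: torch.Tensor):
        h = torch.relu(self.stem(noisy))
        glyph = self.glyph_head(h)             # auxiliary glyph features
        clean = self.denoise_head(h + glyph)   # inject glyph info into denoising
        return clean, glyph

model = TwoBranchDenoiser()
clean, glyph = model(torch.rand(1, 1, 64, 64))
print(clean.shape, glyph.shape)
```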
arXiv Detail & Related papers (2022-07-16T01:11:30Z)