Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation
- URL: http://arxiv.org/abs/2407.09779v1
- Date: Sat, 13 Jul 2024 05:28:45 GMT
- Title: Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation
- Authors: Kangyeol Kim, Wooseok Seo, Sehyun Nam, Bodam Kim, Suhyeon Jeong, Wonwoo Cho, Jaegul Choo, Youngjae Yu,
- Abstract summary: We propose a novel P-T2I method called Layout-and-Retouch, consisting of two stages: 1) layout generation and 2) retouch.
In the first stage, our step-blended inference utilizes the inherent sample diversity of vanilla T2I models to produce diversified layout images.
In the second stage, multi-source attention swaps the context image from the first stage with the reference image, leveraging the structure from the context image and extracting visual features from the reference image.
- Score: 40.969861849933444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Personalized text-to-image (P-T2I) generation aims to create new, text-guided images featuring the personalized subject with a few reference images. However, balancing the trade-off relationship between prompt fidelity and identity preservation remains a critical challenge. To address the issue, we propose a novel P-T2I method called Layout-and-Retouch, consisting of two stages: 1) layout generation and 2) retouch. In the first stage, our step-blended inference utilizes the inherent sample diversity of vanilla T2I models to produce diversified layout images, while also enhancing prompt fidelity. In the second stage, multi-source attention swapping integrates the context image from the first stage with the reference image, leveraging the structure from the context image and extracting visual features from the reference image. This achieves high prompt fidelity while preserving identity characteristics. Through our extensive experiments, we demonstrate that our method generates a wide variety of images with diverse layouts while maintaining the unique identity features of the personalized objects, even with challenging text prompts. This versatility highlights the potential of our framework to handle complex conditions, significantly enhancing the diversity and applicability of personalized image synthesis.
Related papers
- IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait [51.18967854258571]
IC-Portrait is a novel framework designed to accurately encode individual identities for personalized portrait generation.
Our key insight is that pre-trained diffusion models are fast learners for in-context dense correspondence matching.
We show that IC-Portrait consistently outperforms existing state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2025-01-28T18:59:03Z) - Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting [71.29100512700064]
We present T-Prompter, a training-free method for theme-specific image generation.
T-Prompter integrates reference images into generative models, allowing users to seamlessly specify the target theme.
Our approach enables consistent story generation, character design, realistic character generation, and style-guided image generation.
arXiv Detail & Related papers (2025-01-26T19:01:19Z) - Object-level Visual Prompts for Compositional Image Generation [75.6085388740087]
We introduce a method for composing object-level visual prompts within a text-to-image diffusion model.
A key challenge in this task is to preserve the identity of the objects depicted in the input visual prompts.
We introduce a new KV-mixed cross-attention mechanism, in which keys and values are learned from distinct visual representations.
arXiv Detail & Related papers (2025-01-02T18:59:44Z) - Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z) - MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance [6.4680449907623006]
This research introduces the MS-Diffusion framework for layout-guided zero-shot image personalization with multi-subjects.
The proposed multi-subject cross-attention orchestrates inter-subject compositions while preserving the control of texts.
arXiv Detail & Related papers (2024-06-11T12:32:53Z) - Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
arXiv Detail & Related papers (2024-03-18T13:39:53Z) - Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image
Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods.
The proposed approach can be applied to any personalized diffusion models and requires as few as a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z) - PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion
Models [19.519789922033034]
PhotoVerse is an innovative methodology that incorporates a dual-branch conditioning mechanism in both text and image domains.
After a single training phase, our approach enables generating high-quality images within only a few seconds.
arXiv Detail & Related papers (2023-09-11T19:59:43Z) - TaleCrafter: Interactive Story Visualization with Multiple Characters [49.14122401339003]
This paper proposes a system for generic interactive story visualization.
It is capable of handling multiple novel characters and supporting the editing of layout and local structure.
The system comprises four interconnected components: story-to-prompt generation (S2P), text-to-generation (T2L), controllable text-to-image generation (C-T2I) and image-to-video animation (I2V)
arXiv Detail & Related papers (2023-05-29T17:11:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.