Related papers: Visual Style Prompting with Swapping Self-Attention

Visual Style Prompting with Swapping Self-Attention

URL: http://arxiv.org/abs/2402.12974v2
Date: Wed, 21 Feb 2024 14:04:30 GMT
Title: Visual Style Prompting with Swapping Self-Attention
Authors: Jaeseok Jeong, Junho Kim, Yunjey Choi, Gayoung Lee, Youngjung Uh
Abstract summary: We propose a novel approach to produce a diverse range of images while maintaining specific style elements and nuances. During the denoising process, we keep the query from original features while swapping the key and value with those from reference features in the late self-attention layers. Our method demonstrates superiority over existing approaches, best reflecting the style of the references and ensuring that resulting images match the text prompts most accurately.
Score: 26.511518230332758
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In the evolving domain of text-to-image generation, diffusion models have emerged as powerful tools in content creation. Despite their remarkable capability, existing models still face challenges in achieving controlled generation with a consistent style, requiring costly fine-tuning or often inadequately transferring the visual elements due to content leakage. To address these challenges, we propose a novel approach, \ours, to produce a diverse range of images while maintaining specific style elements and nuances. During the denoising process, we keep the query from original features while swapping the key and value with those from reference features in the late self-attention layers. This approach allows for the visual style prompting without any fine-tuning, ensuring that generated images maintain a faithful style. Through extensive evaluation across various styles and text prompts, our method demonstrates superiority over existing approaches, best reflecting the style of the references and ensuring that resulting images match the text prompts most accurately. Our project page is available https://curryjung.github.io/VisualStylePrompt/.

Related papers

Autoregressive Styled Text Image Generation, but Make it Reliable [51.09340470015673]
This work is dedicated to developing strategies that reproduce the characteristics of a given writer, with promising results in terms of style fidelity and generalization achieved by the recently proposed Autoregressive Transformer paradigm for HTG.<n>In this work, we rethink the autoregressive by framing HTG as a multimodal prompt-conditioned generation task, tackling the content controllability issues by introducing special input tokens for better alignment with the visual ones.
arXiv Detail & Related papers (2025-10-27T11:54:23Z)
StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance [29.94258634899353]
We propose negative visual query guidance (NVQG) to reduce the transfer of unwanted contents.<n>NVQG employs negative score by intentionally content leakage scenarios that swap queries instead of key and values of self-attention layers from visual style prompts.<n>Our method demonstrates superiority over existing approaches, reflecting the style of the references, and ensuring that resulting images match the text prompts.
arXiv Detail & Related papers (2025-10-08T09:50:34Z)
LLM-Enabled Style and Content Regularization for Personalized Text-to-Image Generation [14.508665960053643]
The personalized text-to-image generation has rapidly advanced with the emergence of Stable Diffusion. Existing methods, which typically fine-tune models using embedded identifiers, often struggle with insufficient stylization and inaccurate image content. We propose style refinement and content preservation strategies.
arXiv Detail & Related papers (2025-04-19T04:08:42Z)
Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting [71.29100512700064]
We present T-Prompter, a training-free method for theme-specific image generation. T-Prompter integrates reference images into generative models, allowing users to seamlessly specify the target theme. Our approach enables consistent story generation, character design, realistic character generation, and style-guided image generation.
arXiv Detail & Related papers (2025-01-26T19:01:19Z)
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation [4.1177497612346]
Style transfer is an inventive process designed to create an image that maintains the essence of the original while embracing the visual style of another. We introduce InstantStyle-Plus, an approach that prioritizes the integrity of the original content while seamlessly integrating the target style.
arXiv Detail & Related papers (2024-06-30T18:05:33Z)
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation [5.364489068722223]
The concept of style is inherently underdetermined, encompassing a multitude of elements such as color, material, atmosphere, design, and structure. Inversion-based methods are prone to style degradation, often resulting in the loss of fine-grained details. adapter-based approaches frequently require meticulous weight tuning for each reference image to achieve a balance between style intensity and text controllability.
arXiv Detail & Related papers (2024-04-03T13:34:09Z)
Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images. By employing minimal attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models. Our method's evaluation across diverse styles and text prompts demonstrates high-quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task for stylizing'' text-to-image models, namely text-driven stylized image generation. We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network. Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z)
StyleAdapter: A Unified Stylized Image Generation Model [97.24936247688824]
StyleAdapter is a unified stylized image generation model capable of producing a variety of stylized images. It can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)
Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference. We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z)
ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. We propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information. We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z)
DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results. We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation [13.894251782142584]
Diffusion-based text-to-image generation models like GLIDE and DALLE-2 have gained wide success recently. We propose a novel style guidance method to support generating images using arbitrary style guided by a reference image.
arXiv Detail & Related papers (2022-11-14T20:52:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.