StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance
- URL: http://arxiv.org/abs/2510.06827v1
- Date: Wed, 08 Oct 2025 09:50:34 GMT
- Title: StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance
- Authors: Jaeseok Jeong, Junho Kim, Gayoung Lee, Yunjey Choi, Youngjung Uh
- Abstract summary: We propose negative visual query guidance (NVQG) to reduce the transfer of unwanted content. NVQG employs a negative score by intentionally simulating content-leakage scenarios that swap the queries, instead of the keys and values, of self-attention layers from visual style prompts. Our method demonstrates superiority over existing approaches, reflecting the style of the references and ensuring that the resulting images match the text prompts.
- Score: 29.94258634899353
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In the domain of text-to-image generation, diffusion models have emerged as powerful tools. Recently, studies on visual prompting, where images are used as prompts, have enabled more precise control over style and content. However, existing methods often suffer from content leakage, where undesired elements of the visual style prompt are transferred along with the intended style. To address this issue, we 1) extend classifier-free guidance (CFG) to utilize swapping self-attention and 2) propose negative visual query guidance (NVQG) to reduce the transfer of unwanted content. NVQG employs a negative score by intentionally simulating content-leakage scenarios that swap the queries, instead of the keys and values, of self-attention layers from visual style prompts. This simple yet effective method significantly reduces content leakage. Furthermore, we provide careful solutions for using a real image as a visual style prompt. Through extensive evaluation across various styles and text prompts, our method demonstrates superiority over existing approaches, reflecting the style of the references and ensuring that the resulting images match the text prompts. Our code is available at https://github.com/naver-ai/StyleKeeper.
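To make the proposed guidance concrete, the following is a minimal, self-contained sketch of how the extended-CFG terms described in the abstract might combine. The `denoise` function is a toy stand-in for a diffusion UNet whose self-attention layers can swap in style-reference features; the weights, mode names, and mixing rule are illustrative assumptions, not the paper's implementation.

```python
import torch

def denoise(x_t, t, cond, style, mode=None):
    # Toy stand-in for a diffusion UNet (t is unused here). `mode` selects
    # which self-attention features come from the visual style prompt:
    # "kv" swaps keys/values (style transfer), "q" swaps queries, which
    # intentionally induces content leakage.
    mix = {None: 0.0, "kv": 0.5, "q": -0.5}[mode]
    return 0.1 * x_t + cond.mean() + mix * style.mean()

def nvqg_step(x_t, t, text_emb, null_emb, style, w_style=7.5, w_nvqg=1.0):
    eps_uncond = denoise(x_t, t, null_emb, style, mode=None)
    # Positive branch: swapping keys/values injects the reference style.
    eps_style = denoise(x_t, t, text_emb, style, mode="kv")
    # Negative branch: swapping queries simulates content leakage; its
    # score is pushed away from (negative visual query guidance).
    eps_leak = denoise(x_t, t, text_emb, style, mode="q")
    return (eps_uncond
            + w_style * (eps_style - eps_uncond)
            - w_nvqg * (eps_leak - eps_uncond))

x_t = torch.randn(1, 4, 64, 64)
out = nvqg_step(x_t, 500, torch.randn(77, 768), torch.zeros(77, 768),
                torch.randn(1, 4, 64, 64))
print(out.shape)  # torch.Size([1, 4, 64, 64])
```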
Related papers
- Negative Token Merging: Image-based Adversarial Feature Guidance [114.65069052244088]
We introduce negative token merging (NegToMe) to perform adversarial guidance through images. NegToMe selectively pushes apart matching visual features between reference and generated images during the reverse diffusion process. It significantly enhances output diversity and reduces visual similarity to copyrighted content by 34.57%.
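A rough, self-contained sketch of the "push apart matching features" idea: each generated token is matched to its most similar reference token by cosine similarity and moved away from it. The matching rule and step size `alpha` are assumptions for illustration, not NegToMe's exact procedure.

```python
import torch
import torch.nn.functional as F

def neg_token_merge(gen_tokens, ref_tokens, alpha=0.1):
    # gen_tokens: (N, d) features of the image being generated
    # ref_tokens: (M, d) features of the reference image
    sim = F.normalize(gen_tokens, dim=-1) @ F.normalize(ref_tokens, dim=-1).T
    matched = ref_tokens[sim.argmax(dim=-1)]   # closest reference token, (N, d)
    # Push each generated token away from its best-matching reference token.
    return gen_tokens + alpha * (gen_tokens - matched)

out = neg_token_merge(torch.randn(256, 64), torch.randn(300, 64))
print(out.shape)  # torch.Size([256, 64])
```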
arXiv Detail & Related papers (2024-12-02T10:06:57Z)
- Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator [44.620847608977776]
Diptych Prompting is a novel zero-shot approach that reinterprets subject-driven image generation as an inpainting task with precise subject alignment. Our method supports not only subject-driven generation but also stylized image generation and subject-driven image editing.
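As an illustration of the diptych-as-inpainting idea, the sketch below composes a two-panel canvas (reference subject on the left, masked blank panel on the right) and inpaints the right panel; it uses a generic Stable Diffusion inpainting pipeline from diffusers as a stand-in, not the paper's actual model or alignment machinery.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

subject = Image.open("subject.png").convert("RGB").resize((512, 512))
diptych = Image.new("RGB", (1024, 512))
diptych.paste(subject, (0, 0))            # left panel: the reference subject
mask = Image.new("L", (1024, 512), 0)
mask.paste(255, (512, 0, 1024, 512))      # inpaint only the right panel

result = pipe(
    prompt="a diptych; left: a photo of a toy robot; "
           "right: the same toy robot surfing on a wave",
    image=diptych, mask_image=mask, width=1024, height=512,
).images[0]
result.crop((512, 0, 1024, 512)).save("generated.png")
```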
arXiv Detail & Related papers (2024-11-23T06:17:43Z)
- StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models [38.31347002106355]
StyleTokenizer is a zero-shot, style-controlled image generation method.
It aligns style representation with text representation using a style tokenizer.
This alignment effectively minimizes the impact of style conditioning on the effectiveness of text prompts.
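A hypothetical sketch of the alignment idea: a learned projector maps features of the single style image into the text-embedding space as one "style token" that is prepended to the prompt embeddings. Module names, shapes, and the single-token design are assumptions, not the paper's tokenizer.

```python
import torch
import torch.nn as nn

class StyleToken(nn.Module):
    def __init__(self, img_dim=1024, txt_dim=768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(img_dim, txt_dim), nn.GELU(), nn.Linear(txt_dim, txt_dim)
        )

    def forward(self, style_feat, prompt_emb):
        # style_feat: (B, img_dim) features of the single style instance
        # prompt_emb: (B, L, txt_dim) text-encoder output
        tok = self.proj(style_feat).unsqueeze(1)     # (B, 1, txt_dim)
        return torch.cat([tok, prompt_emb], dim=1)   # (B, L+1, txt_dim)

st = StyleToken()
print(st(torch.randn(2, 1024), torch.randn(2, 77, 768)).shape)
```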
arXiv Detail & Related papers (2024-09-04T09:01:21Z)
- FAGStyle: Feature Augmentation on Geodesic Surface for Zero-shot Text-guided Diffusion Image Style Transfer [2.3293561091456283]
The goal of image style transfer is to render an image guided by a style reference while maintaining the original content.
We introduce FAGStyle, a zero-shot text-guided diffusion image style transfer method.
Our approach enhances inter-patch information interaction by incorporating the Sliding Window Crop technique.
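For reference, a plain sliding-window crop helper of the kind the Sliding Window Crop technique operates on; the window size and stride here are arbitrary assumptions.

```python
import torch

def sliding_window_crops(img, win=224, stride=112):
    # img: (C, H, W) tensor; returns overlapping (C, win, win) patches
    _, h, w = img.shape
    return [img[:, y:y + win, x:x + win]
            for y in range(0, h - win + 1, stride)
            for x in range(0, w - win + 1, stride)]

patches = sliding_window_crops(torch.randn(3, 512, 512))
print(len(patches), patches[0].shape)  # 9 torch.Size([3, 224, 224])
```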
arXiv Detail & Related papers (2024-08-20T04:20:11Z)
- Visual Style Prompting with Swapping Self-Attention [26.511518230332758]
We propose a novel approach to produce a diverse range of images while maintaining specific style elements and nuances.
During the denoising process, we keep the query from original features while swapping the key and value with those from reference features in the late self-attention layers.
Our method demonstrates superiority over existing approaches, best reflecting the style of the references and ensuring that resulting images match the text prompts most accurately.
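The swap itself is simple to express; below is a toy self-attention layer where queries stay with the generation branch and keys/values come from the style reference, a minimal sketch rather than the actual UNet attention code.

```python
import torch
import torch.nn.functional as F

def swapped_self_attention(q_proj, k_proj, v_proj, gen_feats, ref_feats):
    # gen_feats, ref_feats: (B, N, d) features from the two branches
    q = q_proj(gen_feats)   # queries: keep the generated content
    k = k_proj(ref_feats)   # keys/values: inject the reference style
    v = v_proj(ref_feats)
    attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

d = 64
q_proj, k_proj, v_proj = (torch.nn.Linear(d, d) for _ in range(3))
out = swapped_self_attention(q_proj, k_proj, v_proj,
                             torch.randn(1, 256, d), torch.randn(1, 256, d))
print(out.shape)  # torch.Size([1, 256, 64])
```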
arXiv Detail & Related papers (2024-02-20T12:51:17Z)
- StyleAdapter: A Unified Stylized Image Generation Model [97.24936247688824]
StyleAdapter is a unified stylized image generation model capable of producing a variety of stylized images.
It can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)
- Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference.
We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
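A loose sketch of starting the reverse process from content-derived noise: the initial latent mixes the encoded content image with a learnable noise term. The mixing rule and ratio are assumptions for illustration only, not DiffStyler's formulation.

```python
import torch
import torch.nn as nn

class ContentNoise(nn.Module):
    def __init__(self, shape=(1, 4, 64, 64)):
        super().__init__()
        self.noise = nn.Parameter(torch.randn(shape))  # learnable noise

    def forward(self, content_latent, alpha=0.7):
        # Keep structure from the content latent, blend in learnable noise.
        return alpha * content_latent + (1 - alpha) * self.noise

x_T = ContentNoise()(torch.randn(1, 4, 64, 64))  # starting point for denoising
print(x_T.shape)  # torch.Size([1, 4, 64, 64])
```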
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
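As a sketch of the contrastive ingredient, here is a generic InfoNCE-style loss over style codes, where codes from the same style form positive pairs; this illustrates contrastive style-representation learning in general, not CAST's exact objective.

```python
import torch
import torch.nn.functional as F

def style_contrastive_loss(codes_a, codes_b, tau=0.07):
    # codes_a[i] and codes_b[i] encode the same style (positive pair);
    # every other pairing in the batch serves as a negative.
    a = F.normalize(codes_a, dim=-1)
    b = F.normalize(codes_b, dim=-1)
    logits = a @ b.T / tau                  # (B, B) similarity matrix
    labels = torch.arange(a.shape[0])       # positives on the diagonal
    return F.cross_entropy(logits, labels)

print(style_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128)))
```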
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
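A bare-bones version of the optimization scheme: ascend CLIP image-text similarity while an L2 term keeps the latent near its starting point. `generator` stands in for a pretrained StyleGAN and `clip_model` for a CLIP model with `encode_image`/`encode_text`; the loss weights and step count are assumptions.

```python
import torch

def clip_latent_optimize(generator, clip_model, w_init, text_tokens,
                         steps=50, lr=0.05, lam=0.1):
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    text_feat = clip_model.encode_text(text_tokens).detach()
    for _ in range(steps):
        img = generator(w)  # synthesize an image from the latent
        # (a real setup would resize/normalize img for the CLIP encoder)
        img_feat = clip_model.encode_image(img)
        sim = torch.cosine_similarity(img_feat, text_feat, dim=-1).mean()
        loss = -sim + lam * (w - w_init).pow(2).mean()  # CLIP loss + L2 prior
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach()
```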
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.