CleanStyle: Plug-and-Play Style Conditioning Purification for Text-to-Image Stylization
- URL: http://arxiv.org/abs/2602.20721v1
- Date: Tue, 24 Feb 2026 09:33:05 GMT
- Title: CleanStyle: Plug-and-Play Style Conditioning Purification for Text-to-Image Stylization
- Authors: Xiaoman Feng, Mingkun Lei, Yang Wang, Dingwen Fu, Chi Zhang,
- Abstract summary: CleanStyle is a plug-and-play framework that filters out content-related noise from the style embedding without retraining. CleanStyleSVD dynamically suppresses tail components using a time-aware exponential schedule. SS-CFG reuses the tail components to construct style-aware unconditional inputs.
- Score: 5.300721419484575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Style transfer in diffusion models enables controllable visual generation by injecting the style of a reference image. However, recent encoder-based methods, while efficient and tuning-free, often suffer from content leakage, where semantic elements from the style image undesirably appear in the output, impairing prompt fidelity and stylistic consistency. In this work, we introduce CleanStyle, a plug-and-play framework that filters out content-related noise from the style embedding without retraining. Motivated by empirical analysis, we observe that such leakage predominantly stems from the tail components of the style embedding, which are isolated via Singular Value Decomposition (SVD). To address this, we propose CleanStyleSVD (CS-SVD), which dynamically suppresses tail components using a time-aware exponential schedule, providing clean, style-preserving conditional embeddings throughout the denoising process. Furthermore, we present Style-Specific Classifier-Free Guidance (SS-CFG), which reuses the suppressed tail components to construct style-aware unconditional inputs. Unlike conventional methods that use generic negative embeddings (e.g., zero vectors), SS-CFG introduces targeted negative signals that reflect style-specific but prompt-irrelevant visual elements. This enables the model to effectively suppress these distracting patterns during generation, thereby improving prompt fidelity and enhancing the overall visual quality of stylized outputs. Our approach is lightweight, interpretable, and can be seamlessly integrated into existing encoder-based diffusion models without retraining. Extensive experiments demonstrate that CleanStyle substantially reduces content leakage, improves stylization quality and improves prompt alignment across a wide range of style references and prompts.
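The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the cut-off rank `k`, the decay `rate`, the direction of the schedule (tail weight shrinking as denoising proceeds), and all function names are assumptions made for illustration. Only the SVD split and the standard classifier-free guidance combination are taken as given.

```python
import numpy as np


def split_head_tail(style_emb, k):
    """Split a (tokens, dim) style embedding into head and tail parts via SVD.

    The head keeps the k largest singular components; the tail keeps the rest.
    The choice of k is an assumption, not a value from the paper.
    """
    u, s, vt = np.linalg.svd(style_emb, full_matrices=False)
    head = (u[:, :k] * s[:k]) @ vt[:k]
    tail = (u[:, k:] * s[k:]) @ vt[k:]
    return head, tail


def tail_weight(step, num_steps, rate=5.0):
    """Hypothetical exponential schedule for the tail weight.

    Decays from 1 at the first step to exp(-rate) at the last step.
    Both the rate and the direction of the decay are assumptions.
    """
    progress = step / max(num_steps - 1, 1)
    return float(np.exp(-rate * progress))


def purified_embeddings(style_emb, k, step, num_steps, rate=5.0):
    """Return (conditional, unconditional) embeddings for one denoising step.

    Conditional: head plus a down-weighted tail (CS-SVD-like suppression).
    Unconditional: the tail itself, reused as a negative signal (SS-CFG-like).
    """
    head, tail = split_head_tail(style_emb, k)
    cond = head + tail_weight(step, num_steps, rate) * tail
    uncond = tail
    return cond, uncond


def cfg_combine(eps_cond, eps_uncond, scale):
    """Standard classifier-free guidance combination of two noise predictions."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


# Toy usage with a random stand-in for an encoder's style embedding.
rng = np.random.default_rng(0)
emb = rng.standard_normal((16, 64))
cond, uncond = purified_embeddings(emb, k=4, step=25, num_steps=50)
```

In a real pipeline, `cond` and `uncond` would be passed to the denoiser as the style conditioning for the conditional and unconditional branches, and `cfg_combine` would merge the two resulting noise predictions.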
Related papers
- Structure-Level Disentangled Diffusion for Few-Shot Chinese Font Generation [18.601789249339014]
Few-shot Chinese font generation aims to synthesize new characters in a target style using only a handful of reference images.
Existing approaches achieve only feature-level disentanglement, allowing the generator to re-entangle these features.
We propose the Structure-Level Disentangled Diffusion Model (SLD-Font), which receives content and style information from two separate channels.
arXiv Detail & Related papers (2026-02-21T15:41:06Z)
- Sissi: Zero-shot Style-guided Image Synthesis via Semantic-style Integration [57.02757226679549]
We introduce a training-free framework that reformulates style-guided synthesis as an in-context learning task.
We propose a Dynamic Semantic-Style Integration (DSSI) mechanism that reweights attention between semantic and style visual tokens.
Experiments show that our approach achieves high-fidelity stylization with superior semantic-style balance and visual quality.
arXiv Detail & Related papers (2026-01-10T16:01:14Z)
- FantasyStyle: Controllable Stylized Distillation for 3D Gaussian Splatting [7.778588010132252]
We introduce FantasyStyle, a 3DGS-based style transfer framework, and the first to rely entirely on diffusion model distillation.
We enhance cross-view consistency by applying a 3D filter to multi-view noisy latent, selectively reducing low-frequency components to mitigate stylized prior conflicts.
Our method consistently outperforms state-of-the-art approaches, achieving higher stylization quality and visual realism across various scenes and styles.
arXiv Detail & Related papers (2025-08-11T16:11:08Z)
- Only-Style: Stylistic Consistency in Image Generation without Content Leakage [21.68241134664501]
Only-Style is a method designed to mitigate content leakage in a semantically coherent manner while preserving stylistic consistency.
Only-Style works by localizing content leakage during inference, allowing the adaptive tuning of a parameter that controls the style alignment process.
Our approach demonstrates a significant improvement over state-of-the-art methods through extensive evaluation across diverse instances.
arXiv Detail & Related papers (2025-06-11T16:33:09Z)
- UniVST: A Unified Framework for Training-free Localized Video Style Transfer [102.52552893495475]
This paper presents UniVST, a unified framework for localized video style transfer based on diffusion models.
It operates without the need for training, offering a distinct advantage over existing diffusion methods that transfer style across entire videos.
arXiv Detail & Related papers (2024-10-26T05:28:02Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control [43.96257216397601]
We propose a new plug-and-play solution for training-free personalization of diffusion models.
RB-Modulation is built on a novel optimal controller where a style descriptor encodes the desired attributes.
A cross-attention-based feature aggregation scheme allows RB-Modulation to decouple content and style from the reference image.
arXiv Detail & Related papers (2024-05-27T17:51:08Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation [5.364489068722223]
The concept of style is inherently underdetermined, encompassing a multitude of elements such as color, material, atmosphere, design, and structure.
Inversion-based methods are prone to style degradation, often resulting in the loss of fine-grained details.
Adapter-based approaches frequently require meticulous weight tuning for each reference image to achieve a balance between style intensity and text controllability.
arXiv Detail & Related papers (2024-04-03T13:34:09Z)
- HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image/video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic clues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
arXiv Detail & Related papers (2024-01-11T12:26:23Z)
- StyleAdapter: A Unified Stylized Image Generation Model [97.24936247688824]
StyleAdapter is a unified stylized image generation model capable of producing a variety of stylized images.
It can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)