Leveraging Diffusion Models for Stylization using Multiple Style Images
- URL: http://arxiv.org/abs/2508.12784v1
- Date: Mon, 18 Aug 2025 10:00:41 GMT
- Title: Leveraging Diffusion Models for Stylization using Multiple Style Images
- Authors: Dan Ruta, Abdelaziz Djelouah, Raphael Ortiz, Christopher Schroers,
- Abstract summary: We propose leveraging multiple style images which helps better represent style features and prevent content leaking from the style images. We employ clustering to distill a small representative set of attention features from the large number of attention values extracted from the style samples. The resulting method achieves state-of-the-art results for stylization.
- Score: 11.659032530565883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in latent diffusion models have enabled exciting progress in image style transfer. However, several key issues remain. For example, existing methods still struggle to accurately match styles. They are often limited in the number of style images that can be used. Furthermore, they tend to entangle content and style in undesired ways. To address this, we propose leveraging multiple style images which helps better represent style features and prevent content leaking from the style images. We design a method that leverages both image prompt adapters and statistical alignment of the features during the denoising process. With this, our approach is designed such that it can intervene both at the cross-attention and the self-attention layers of the denoising UNet. For the statistical alignment, we employ clustering to distill a small representative set of attention features from the large number of attention values extracted from the style samples. As demonstrated in our experimental section, the resulting method achieves state-of-the-art results for stylization.
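The clustering step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm is used, so plain k-means is assumed here purely for illustration; the function names (`sq_dist`, `distill_features`), the feature sizes, and the random vectors standing in for attention values are all hypothetical and not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distill_features(features, k, iters=10, seed=0):
    """Reduce a list of feature vectors to k centroids with k-means (assumed algorithm)."""
    rng = random.Random(seed)
    # Initialise the centroids from k distinct feature vectors.
    centroids = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        # Assignment step: attach each feature to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda j: sq_dist(f, centroids[j]))
            clusters[nearest].append(f)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical stand-in for attention values pooled from 3 style images,
# 32 tokens per image, 8 channels per token.
rng = random.Random(42)
style_values = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3 * 32)]

representatives = distill_features(style_values, k=4)
print(len(representatives), len(representatives[0]))  # 4 8
```

Pooling the values from all style images before clustering means the representative set has a fixed size regardless of how many style images are supplied, which is one plausible reading of "a small representative set" in the abstract.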
Related papers
- Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models [44.4106999443933]
We propose a masking-based method that efficiently decouples content and style from style-reference images. By simply masking specific elements in the style reference's image features, we uncover a critical yet under-explored principle.
arXiv Detail & Related papers (2025-02-11T11:17:39Z)
- Z-STAR+: A Zero-shot Style Transfer Method via Adjusting Style Distribution [24.88532732093652]
Style transfer presents a significant challenge, primarily centered on identifying an appropriate style representation. In contrast to existing approaches, we have discovered that latent features in vanilla diffusion models inherently contain natural style and content distributions. Our method adopts dual denoising paths to represent content and style references in latent space, subsequently guiding the content image denoising process with style latent codes.
arXiv Detail & Related papers (2024-11-28T15:56:17Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods [2.468658581089448]
We propose a novel framework called D2Styler (Discrete Diffusion Styler).
Our method uses Adaptive Instance Normalization (AdaIN) features as a context guide for the reverse diffusion process.
Experimental results demonstrate that D2Styler produces high-quality style-transferred images.
arXiv Detail & Related papers (2024-08-07T05:47:06Z)
- Ada-adapter:Fast Few-shot Style Personlization of Diffusion Model with Pre-trained Image Encoder [57.574544285878794]
Ada-Adapter is a novel framework for few-shot style personalization of diffusion models.
Our method enables efficient zero-shot style transfer utilizing a single reference image.
We demonstrate the effectiveness of our approach on various artistic styles, including flat art, 3D rendering, and logo design.
arXiv Detail & Related papers (2024-07-08T02:00:17Z)
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation [5.364489068722223]
The concept of style is inherently underdetermined, encompassing a multitude of elements such as color, material, atmosphere, design, and structure.
Inversion-based methods are prone to style degradation, often resulting in the loss of fine-grained details.
Adapter-based approaches frequently require meticulous weight tuning for each reference image to achieve a balance between style intensity and text controllability.
arXiv Detail & Related papers (2024-04-03T13:34:09Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models.
Evaluation of our method across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.