FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models
- URL: http://arxiv.org/abs/2401.15636v2
- Date: Thu, 18 Jul 2024 10:41:29 GMT
- Title: FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models
- Authors: Feihong He, Gang Li, Mengyuan Zhang, Leilei Yan, Lingyu Si, Fanzhang Li, Li Shen
- Abstract summary: FreeStyle is an innovative style transfer method built upon a pre-trained large diffusion model.
Our method performs style transfer using only a text description of the desired style.
Compared with state-of-the-art methods that require training, FreeStyle reduces the computational burden by thousands of optimization iterations.
- Score: 10.612058783373012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid development of generative diffusion models has significantly advanced the field of style transfer. However, most current style transfer methods based on diffusion models typically involve a slow iterative optimization process, e.g., model fine-tuning and textual inversion of style concepts. In this paper, we introduce FreeStyle, an innovative style transfer method built upon a pre-trained large diffusion model, requiring no further optimization. Moreover, our method enables style transfer through only a text description of the desired style, eliminating the need for style images. Specifically, we propose a dual-stream encoder and single-stream decoder architecture, replacing the conventional U-Net in diffusion models. In the dual-stream encoder, two distinct branches take the content image and the style text prompt as inputs, achieving content-style decoupling. In the decoder, we further modulate features from the dual streams based on the given content image and the corresponding style text prompt for precise style transfer. Our experimental results demonstrate high-quality synthesis and fidelity of our method across various content images and style text prompts. Compared with state-of-the-art methods that require training, our FreeStyle approach notably reduces the computational burden by thousands of optimization iterations, while achieving comparable or superior performance across multiple evaluation metrics, including CLIP Aesthetic Score, CLIP Score, and Preference. We have released the code anonymously at https://anonymous.4open.science/r/FreeStyleAnonymous-0F9B
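To make the dual-stream idea concrete, the following is a minimal sketch of how per-resolution features from a content-image branch and a style-text branch might be fused before each decoder block. The helper name modulate_features, the scaled-addition fusion rule, and the factors b and s are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def modulate_features(content_feats, style_feats, b=1.5, s=0.5):
    """Fuse matching feature pyramids from the two encoder streams.

    Assumption for illustration: the content stream contributes structure
    and the text-conditioned style stream contributes appearance, combined
    by scaled addition; b and s are made-up hyperparameters.
    """
    return [b * f_c + s * f_s for f_c, f_s in zip(content_feats, style_feats)]

# Toy usage with a two-level feature pyramid.
content_feats = [torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16)]
style_feats = [torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16)]
decoder_inputs = modulate_features(content_feats, style_feats)
```

In practice, each fused map would feed the corresponding single-stream decoder block; the point of the sketch is only that content and style are encoded separately and recombined at decode time.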
Related papers
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models (a toy sketch of this sharing appears after the related-papers list).
Our method's evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter [74.68550659331405]
StyleCrafter is a generic method that enhances pre-trained T2V models with a style control adapter.
To promote content-style disentanglement, we remove style descriptions from the text prompt and extract style information solely from the reference image.
StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images.
arXiv Detail & Related papers (2023-12-01T03:53:21Z)
- StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation [97.24936247688824]
This paper presents a LoRA-free method for stylized image generation that takes a text prompt and style reference images as inputs.
StyleAdapter can generate high-quality images that match the content of the prompts and adopt the style of the references in a single pass.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion architecture that controls the balance between the content and style of the diffused results.
We base the reverse denoising process on learnable noise derived from the content image, enabling the stylization results to better preserve the structural information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
- Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation [13.894251782142584]
Diffusion-based text-to-image generation models like GLIDE and DALL-E 2 have recently achieved wide success.
We propose a novel style guidance method that supports generating images in an arbitrary style guided by a reference image.
arXiv Detail & Related papers (2022-11-14T20:52:57Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
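As referenced in the StyleAligned entry above, here is a hypothetical sketch of attention sharing during diffusion: each image in a batch attends over its own keys and values concatenated with those of a reference, so style statistics propagate across the batch. The function name and tensor shapes are assumptions for illustration, not StyleAligned's actual implementation.

```python
import torch

def shared_attention(q, k, v, k_ref, v_ref):
    """One attention step with reference sharing (illustrative only).

    q, k, v: (B, N, D) projections for the generated images.
    k_ref, v_ref: (1, N, D) projections from the reference image.
    """
    B = q.shape[0]
    # Every sample also attends to the reference image's tokens.
    k_all = torch.cat([k, k_ref.expand(B, -1, -1)], dim=1)
    v_all = torch.cat([v, v_ref.expand(B, -1, -1)], dim=1)
    scores = q @ k_all.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v_all

# Toy usage: a batch of 4 images sharing one reference.
q = k = v = torch.randn(4, 16, 32)
k_ref = v_ref = torch.randn(1, 16, 32)
out = shared_attention(q, k, v, k_ref, v_ref)  # -> (4, 16, 32)
```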