DiffStyler: Controllable Dual Diffusion for Text-Driven Image
Stylization
- URL: http://arxiv.org/abs/2211.10682v2
- Date: Mon, 18 Dec 2023 05:08:20 GMT
- Title: DiffStyler: Controllable Dual Diffusion for Text-Driven Image
Stylization
- Authors: Nisha Huang, Yuxin Zhang, Fan Tang, Chongyang Ma, Haibin Huang, Yong
Zhang, Weiming Dong, Changsheng Xu
- Abstract summary: DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
- Score: 66.42741426640633
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the impressive results of arbitrary image-guided style transfer
methods, text-driven image stylization has recently been proposed for
transferring a natural image into a stylized one according to textual
descriptions of the target style provided by the user. Unlike the previous
image-to-image transfer approaches, the text-guided stylization process provides
users with a more precise and intuitive way to express the desired style.
However, the huge discrepancy between cross-modal inputs/outputs makes it
challenging to conduct text-driven image stylization in a typical feed-forward
CNN pipeline. In this paper, we present DiffStyler, a dual diffusion processing
architecture to control the balance between the content and style of the
diffused results. The cross-modal style information can be easily integrated as
guidance during the diffusion process step-by-step. Furthermore, we propose a
content image-based learnable noise on which the reverse denoising process is
based, enabling the stylization results to better preserve the structure
information of the content image. We validate that the proposed DiffStyler outperforms
baseline methods through extensive qualitative and quantitative
experiments. Code is available at
\url{https://github.com/haha-lisa/Diffstyler}.
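To make the two core ideas in the abstract concrete, the sketch below illustrates (i) starting the reverse process from noise derived from the content image plus a learnable residual, and (ii) blending two denoisers step by step to balance content and style. This is a hypothetical toy illustration, not the authors' implementation: the networks, the noise schedule, and the blending weight `lam` are placeholders, and text conditioning is abstracted into the style denoiser.

```python
# Minimal sketch (not the authors' code): dual-denoiser reverse diffusion that
# starts from content-derived noise and blends two noise predictions per step.
import torch

torch.manual_seed(0)
T = 50                                   # number of reverse steps (toy value)
betas = torch.linspace(1e-4, 0.02, T)    # toy linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# Placeholder "denoisers": in the paper these would be text-conditioned diffusion
# models, one biased toward content preservation and one toward the target style.
content_net = lambda x, t: 0.1 * x            # hypothetical epsilon-predictor
style_net = lambda x, t: 0.1 * torch.tanh(x)  # hypothetical epsilon-predictor

content = torch.rand(1, 3, 64, 64)       # stand-in for the content image

# (i) Content-based learnable noise: start the reverse process from the content
# image diffused to the last step, plus a small learnable residual
# (optimization of the residual is omitted in this sketch).
residual = torch.zeros_like(content, requires_grad=True)
eps = torch.randn_like(content)
x = alphas_bar[-1].sqrt() * content + (1 - alphas_bar[-1]).sqrt() * eps + residual

# (ii) Reverse process: blend the two epsilon predictions with a weight `lam`
# that controls the content/style balance, then take a DDIM-style (eta=0) step.
lam = 0.6                                # 1.0 = all content, 0.0 = all style
for t in reversed(range(T)):
    eps_hat = lam * content_net(x, t) + (1 - lam) * style_net(x, t)
    a_t = alphas_bar[t]
    a_prev = alphas_bar[t - 1] if t > 0 else torch.tensor(1.0)
    x0_hat = (x - (1 - a_t).sqrt() * eps_hat) / a_t.sqrt()  # predicted clean image
    x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps_hat

print("stylized output shape:", x.shape)
```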
Related papers
- D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods [2.468658581089448]
We propose a novel framework called D$2$Styler (Discrete Diffusion Styler).
Our method uses Adaptive Instance Normalization (AdaIN) features as a context guide for the reverse diffusion process.
Experimental results demonstrate that D$2$Styler produces high-quality style-transferred images.
arXiv Detail & Related papers (2024-08-07T05:47:06Z)
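The D$2$Styler entry above relies on Adaptive Instance Normalization (AdaIN), a standard operation that re-normalizes content features to match the channel-wise statistics of style features. Below is a minimal, self-contained AdaIN sketch in PyTorch; how D$2$Styler feeds these features into its discrete diffusion process is not shown and is beyond this generic illustration.

```python
# Minimal AdaIN (Huang & Belongie, 2017): align channel-wise statistics of the
# content features to those of the style features. Generic sketch, not D2Styler's code.
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5):
    # content_feat, style_feat: (N, C, H, W) feature maps from an encoder
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    # Normalize away the content statistics, then re-scale/shift with style statistics.
    return s_std * (content_feat - c_mean) / c_std + s_mean

# Toy usage with random feature maps standing in for encoder outputs.
c = torch.randn(1, 64, 32, 32)
s = torch.randn(1, 64, 32, 32)
print(adain(c, s).shape)  # torch.Size([1, 64, 32, 32])
```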
- Artist: Aesthetically Controllable Text-Driven Stylization without Training [19.5597806965592]
We introduce Artist, a training-free approach that aesthetically controls the content and style generation of a pretrained diffusion model for text-driven stylization.
Our key insight is to disentangle the denoising of content and style into separate diffusion processes while sharing information between them.
Our method excels at achieving aesthetic-level stylization requirements, preserving intricate details in the content image and aligning well with the style prompt.
arXiv Detail & Related papers (2024-07-22T17:58:05Z)
- StyleMamba: State Space Model for Efficient Text-driven Image Style Transfer [9.010012117838725]
StyleMamba is an efficient image style transfer framework that translates text prompts into corresponding visual styles.
Existing text-guided stylization methods require hundreds of training iterations and consume substantial computing resources.
arXiv Detail & Related papers (2024-05-08T12:57:53Z)
- FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models [11.401299303276016]
We introduce FreeStyle, an innovative style transfer method built upon a pre-trained large diffusion model.
Our method enables style transfer only through a text description of the desired style, eliminating the necessity of style images.
Our experimental results demonstrate high-quality synthesis and fidelity of our method across various content images and style text prompts.
arXiv Detail & Related papers (2024-01-28T12:00:31Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
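The StyleAligned entry above hinges on sharing attention across images during diffusion. One common way to realize this, sketched below under my own assumptions rather than the paper's exact formulation, is to let every image in a batch also attend to the keys and values of a reference image inside self-attention, so that stylistic statistics propagate from the reference to the rest of the batch.

```python
# Minimal sketch (my own simplification, not StyleAligned's exact method):
# "shared attention" where every image in the batch also attends to the keys
# and values of a reference image (index 0), encouraging a shared style.
import torch

def shared_self_attention(q, k, v, ref_index=0):
    # q, k, v: (B, L, D) token features for B images in one self-attention layer
    B, L, D = q.shape
    k_ref = k[ref_index:ref_index + 1].expand(B, L, D)  # reference keys for all images
    v_ref = v[ref_index:ref_index + 1].expand(B, L, D)  # reference values for all images
    k_cat = torch.cat([k, k_ref], dim=1)                 # (B, 2L, D)
    v_cat = torch.cat([v, v_ref], dim=1)                 # (B, 2L, D)
    attn = torch.softmax(q @ k_cat.transpose(1, 2) / D ** 0.5, dim=-1)  # (B, L, 2L)
    return attn @ v_cat                                  # (B, L, D)

# Toy usage: 4 "images" of 16 tokens with 8-dim features.
q = torch.randn(4, 16, 8); k = torch.randn(4, 16, 8); v = torch.randn(4, 16, 8)
print(shared_self_attention(q, k, v).shape)  # torch.Size([4, 16, 8])
```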
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
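The UCAST entry above mentions an adaptive contrastive scheme with an input-dependent temperature. The snippet below is a generic InfoNCE-style contrastive loss in which the temperature is predicted per sample instead of being a fixed constant; it only illustrates the idea and does not reproduce UCAST's actual losses, and the module and head names are hypothetical.

```python
# Generic sketch: InfoNCE-style contrastive loss whose temperature is predicted
# per sample (input-dependent) rather than fixed. Not UCAST's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTempContrastive(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # Hypothetical small head that maps an embedding to a positive temperature.
        self.temp_head = nn.Sequential(nn.Linear(dim, 1), nn.Softplus())

    def forward(self, anchors, positives):
        # anchors, positives: (N, dim) embeddings; positives[i] is the positive
        # for anchors[i], and all other rows in the batch act as negatives.
        anchors = F.normalize(anchors, dim=-1)
        positives = F.normalize(positives, dim=-1)
        tau = self.temp_head(anchors) + 1e-2          # (N, 1), input-dependent
        logits = anchors @ positives.t() / tau        # row i scaled by its own tau_i
        targets = torch.arange(anchors.size(0), device=anchors.device)
        return F.cross_entropy(logits, targets)

# Toy usage with random embeddings standing in for style features.
loss_fn = AdaptiveTempContrastive(dim=128)
a, p = torch.randn(8, 128), torch.randn(8, 128)
print(loss_fn(a, p).item())
```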
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
- Language-Driven Image Style Transfer [72.36790598245096]
We introduce a new task, language-driven image style transfer (LDIST), to manipulate the style of a content image, guided by a text.
The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
Experiments show that our CLVA is effective and achieves superb transfer results on LDIST.
arXiv Detail & Related papers (2021-06-01T01:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.