StyleMamba : State Space Model for Efficient Text-driven Image Style Transfer
- URL: http://arxiv.org/abs/2405.05027v1
- Date: Wed, 8 May 2024 12:57:53 GMT
- Title: StyleMamba : State Space Model for Efficient Text-driven Image Style Transfer
- Authors: Zijia Wang, Zhi-Song Liu,
- Abstract summary: StyleMamba is an efficient image style transfer framework that translates text prompts into corresponding visual styles.
Existing text-guided stylization requires hundreds of training iterations and takes a lot of computing resources.
- Score: 9.010012117838725
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present StyleMamba, an efficient image style transfer framework that translates text prompts into corresponding visual styles while preserving the content integrity of the original images. Existing text-guided stylization requires hundreds of training iterations and takes a lot of computing resources. To speed up the process, we propose a conditional State Space Model for Efficient Text-driven Image Style Transfer, dubbed StyleMamba, that sequentially aligns the image features to the target text prompts. To enhance the local and global style consistency between text and image, we propose masked and second-order directional losses to optimize the stylization direction to significantly reduce the training iterations by 5 times and the inference time by 3 times. Extensive experiments and qualitative evaluation confirm the robust and superior stylization performance of our methods compared to the existing baselines.
Related papers
- Dynamic Prompt Optimizing for Text-to-Image Generation [63.775458908172176]
We introduce the textbfPrompt textbfAuto-textbfEditing (PAE) method to improve text-to-image generative models.
We employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, leading to the dynamic fine-control prompts.
arXiv Detail & Related papers (2024-04-05T13:44:39Z) - Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high-quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z) - ControlStyle: Text-Driven Stylized Image Generation Using Diffusion
Priors [105.37795139586075]
We propose a new task for stylizing'' text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z) - ITstyler: Image-optimized Text-based Style Transfer [25.60521982742093]
We present a text-based style transfer method that does not require optimization at the inference stage.
Specifically, we convert text input to the style space of the pre-trained VGG network to realize a more effective style swap.
Our method can transfer arbitrary new styles of text input in real-time and synthesize high-quality artistic images.
arXiv Detail & Related papers (2023-01-26T03:08:43Z) - DiffStyler: Controllable Dual Diffusion for Text-Driven Image
Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z) - Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z) - CLIPstyler: Image Style Transfer with a Single Text Condition [34.24876359759408]
Existing neural style transfer methods require reference style images to transfer texture information of style images to content images.
We propose a new framework that enables a style transfer without' a style image, but only with a text description of the desired style.
arXiv Detail & Related papers (2021-12-01T09:48:53Z) - STALP: Style Transfer with Auxiliary Limited Pairing [36.23393954839379]
We present an approach to example-based stylization of images that uses a single pair of a source image and its stylized counterpart.
We demonstrate how to train an image translation network that can perform real-time semantically meaningful style transfer to a set of target images.
arXiv Detail & Related papers (2021-10-20T11:38:41Z) - Language-Driven Image Style Transfer [72.36790598245096]
We introduce a new task -- language-driven image style transfer (textttLDIST) -- to manipulate the style of a content image, guided by a text.
The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
Experiments show that our CLVA is effective and achieves superb transferred results on textttLDIST.
arXiv Detail & Related papers (2021-06-01T01:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.