ITstyler: Image-optimized Text-based Style Transfer
- URL: http://arxiv.org/abs/2301.10916v1
- Date: Thu, 26 Jan 2023 03:08:43 GMT
- Title: ITstyler: Image-optimized Text-based Style Transfer
- Authors: Yunpeng Bai, Jiayue Liu, Chao Dong, Chun Yuan
- Abstract summary: We present a text-based style transfer method that does not require optimization at the inference stage.
Specifically, we convert text input to the style space of the pre-trained VGG network to realize a more effective style swap.
Our method can transfer arbitrary new styles of text input in real-time and synthesize high-quality artistic images.
- Score: 25.60521982742093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-based style transfer is a newly-emerging research topic that uses text
information instead of a style image to guide the transfer process, significantly
extending the application scenarios of style transfer. However, previous methods
require extra time for optimization or text-image paired data, limiting their
effectiveness. In this work, we achieve a data-efficient text-based
style transfer method that does not require optimization at the inference
stage. Specifically, we convert text input to the style space of the
pre-trained VGG network to realize a more effective style swap. We also
leverage CLIP's multi-modal embedding space to learn the text-to-style mapping
with the image dataset only. Our method can transfer arbitrary new styles of
text input in real-time and synthesize high-quality artistic images.
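The pipeline the abstract describes (a text embedding mapped into VGG style statistics, followed by a style swap on content features) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the linear mapper `text_to_style`, the weight matrices `W_mean`/`W_std`, and the use of AdaIN-style per-channel statistics as the "style space" are all assumptions made for clarity; the paper trains its text-to-style mapping on an image dataset via CLIP's shared embedding space.

```python
import numpy as np

def adain(content_feat, style_mean, style_std, eps=1e-5):
    """Adaptive-instance-normalization-style swap: re-normalize a (C, H, W)
    content feature map to target per-channel mean/std predicted from text."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    normalized = (content_feat - c_mean) / (c_std + eps)
    return normalized * style_std[:, None, None] + style_mean[:, None, None]

def text_to_style(clip_text_emb, W_mean, W_std):
    """Hypothetical linear mapper from a CLIP text embedding (D,) to
    per-channel VGG style statistics (C,); stds are clamped non-negative."""
    style_mean = W_mean @ clip_text_emb
    style_std = np.abs(W_std @ clip_text_emb)
    return style_mean, style_std

# Toy dimensions: D=512 CLIP embedding, C=64 VGG channels, 8x8 feature map.
rng = np.random.default_rng(0)
emb = rng.standard_normal(512)                     # stand-in for a CLIP text embedding
W_mean = rng.standard_normal((64, 512))            # untrained stand-in weights
W_std = rng.standard_normal((64, 512))
content = rng.standard_normal((64, 8, 8))          # stand-in for a VGG content feature map
mu, sigma = text_to_style(emb, W_mean, W_std)
stylized = adain(content, mu, sigma)
```

Because the swap only re-targets channel statistics, inference is a single forward pass with no per-style optimization, which is what makes the real-time claim plausible.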
Related papers
- Bridging Text and Image for Artist Style Transfer via Contrastive Learning [21.962361974579036]
We propose Contrastive Learning for Artistic Style Transfer (CLAST) to enable arbitrary style transfer.
We introduce a supervised contrastive training strategy to effectively extract style descriptions from the image-text model.
We also propose a novel and efficient adaLN-based state space model that explores style-content fusion.
arXiv Detail & Related papers (2024-10-12T15:27:57Z) - Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace [52.24866347353916]
We propose an efficient method to explore the target embedding in a textual subspace.
We also propose an efficient selection strategy for determining the basis of the textual subspace.
Our method opens the door to more efficient representation learning for personalized text-to-image generation.
arXiv Detail & Related papers (2024-06-30T06:41:21Z) - LEAST: "Local" text-conditioned image style transfer [2.47996065798589]
Text-conditioned style transfer enables users to communicate their desired artistic styles through text descriptions.
We evaluate the text-conditioned image editing and style transfer techniques on their fine-grained understanding of user prompts for precise "local" style transfer.
arXiv Detail & Related papers (2024-05-25T19:06:17Z) - StyleMamba : State Space Model for Efficient Text-driven Image Style Transfer [9.010012117838725]
StyleMamba is an efficient image style transfer framework that translates text prompts into corresponding visual styles.
Existing text-guided stylization methods require hundreds of training iterations and substantial computing resources.
arXiv Detail & Related papers (2024-05-08T12:57:53Z) - ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z) - Towards Arbitrary Text-driven Image Manipulation via Space Alignment [49.3370305074319]
We propose a new Text-driven image Manipulation framework via Space Alignment (TMSA).
TMSA aims to align the same semantic regions in CLIP and StyleGAN spaces.
The framework can support arbitrary image editing mode without additional cost.
arXiv Detail & Related papers (2023-01-25T16:20:01Z) - DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z) - CLIPstyler: Image Style Transfer with a Single Text Condition [34.24876359759408]
Existing neural style transfer methods require reference style images to transfer texture information of style images to content images.
We propose a new framework that enables style transfer "without" a style image, but only with a text description of the desired style.
arXiv Detail & Related papers (2021-12-01T09:48:53Z) - RewriteNet: Realistic Scene Text Image Generation via Editing Text in
Real-world Image [17.715320405808935]
Scene text editing (STE) is a challenging task due to a complex intervention between text and style.
We propose a novel representational learning-based STE model, referred to as RewriteNet.
Our experiments demonstrate that RewriteNet achieves better quantitative and qualitative performance than other comparisons.
arXiv Detail & Related papers (2021-07-23T06:32:58Z) - Language-Driven Image Style Transfer [72.36790598245096]
We introduce a new task -- language-driven image style transfer (LDIST) -- to manipulate the style of a content image, guided by a text.
The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
Experiments show that our CLVA is effective and achieves superb transferred results on LDIST.
arXiv Detail & Related papers (2021-06-01T01:58:50Z) - StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.