StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized
Tokenizer of a Large-Scale Generative Model
- URL: http://arxiv.org/abs/2303.09268v2
- Date: Mon, 9 Oct 2023 15:17:25 GMT
- Authors: Zipeng Xu, Enver Sangineto, Nicu Sebe
- Abstract summary: We propose StylerDALLE, a style transfer method that uses natural language to describe abstract art styles.
Specifically, we formulate the language-guided style transfer task as a non-autoregressive token sequence translation.
To incorporate style information, we propose a Reinforcement Learning strategy with CLIP-based language supervision.
- Score: 64.26721402514957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the progress made in the style transfer task, most previous
work focuses on transferring only relatively simple features like color or texture,
while missing more abstract concepts such as overall art expression or
painter-specific traits. However, these abstract semantics can be captured by
models like DALL-E or CLIP, which have been trained using huge datasets of
images and textual documents. In this paper, we propose StylerDALLE, a style
transfer method that exploits both of these models and uses natural language to
describe abstract art styles. Specifically, we formulate the language-guided
style transfer task as a non-autoregressive token sequence translation, i.e.,
from input content image to output stylized image, in the discrete latent space
of a large-scale pretrained vector-quantized tokenizer, e.g., the discrete
variational auto-encoder (dVAE) of DALL-E. To incorporate style information, we
propose a Reinforcement Learning strategy with CLIP-based language supervision
that ensures stylization and content preservation simultaneously. Experimental
results demonstrate the superiority of our method, which can effectively
transfer art styles using language instructions at different granularities.
Code is available at https://github.com/zipengxuc/StylerDALLE.
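To make the formulation above concrete, here is a minimal, hypothetical sketch of the non-autoregressive translation step: the content image is first encoded by the pretrained dVAE into a 32x32 grid of discrete codebook indices (DALL-E's dVAE uses a codebook of 8192 codes), and a translator predicts the stylized token at every position in a single forward pass rather than left to right. This is not the authors' implementation; the architecture and hyperparameters below are illustrative assumptions (PyTorch).

```python
import torch
import torch.nn as nn

class NonAutoregressiveTranslator(nn.Module):
    """Hypothetical content-token -> stylized-token translator.

    Every output position is predicted in one forward pass; there is no
    left-to-right decoding over the token sequence.
    """
    def __init__(self, vocab_size=8192, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, 32 * 32, d_model))  # 32x32 token grid
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):          # (batch, 1024) int64 codebook indices
        h = self.embed(token_ids) + self.pos
        return self.head(self.encoder(h))  # (batch, 1024, vocab_size) logits

# One-shot translation of a token grid; in the full pipeline the input would
# come from the dVAE encoder and the output would be decoded back to pixels.
model = NonAutoregressiveTranslator()
content_tokens = torch.randint(0, 8192, (1, 32 * 32))
stylized_tokens = model(content_tokens).argmax(dim=-1)
```

The CLIP-based supervision can likewise be sketched as a reward that scores a stylized output both for matching the style instruction and for staying close to the content image. The snippet below uses the Hugging Face transformers CLIP API; the additive reward form and the trade-off weight `alpha` are assumptions for illustration, not the paper's exact objective.

```python
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(image):
    """Embed a PIL image into CLIP's joint image-text space (L2-normalized)."""
    inputs = processor(images=image, return_tensors="pt")
    feats = clip.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def embed_text(text):
    """Embed a style instruction, e.g. 'in the style of Van Gogh'."""
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    feats = clip.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def style_reward(content_img, stylized_img, style_text, alpha=0.5):
    """Hypothetical reward: stylization term + content-preservation term.

    The first term pushes the output toward the style description; the
    second keeps it close to the input image. `alpha` is an assumed
    trade-off weight, not a value from the paper.
    """
    e_out = embed_image(stylized_img)
    stylization = (e_out * embed_text(style_text)).sum(-1)
    preservation = (e_out * embed_image(content_img)).sum(-1)
    return alpha * stylization + (1 - alpha) * preservation
```

In an RL loop, such a reward would be computed on images decoded from the translator's sampled tokens and used to update the translator's policy.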
Related papers
- Visual Captioning at Will: Describing Images and Videos Guided by a Few
Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference.
We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z)
- ALADIN-NST: Self-supervised disentangled representation learning of
artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
- StoryTrans: Non-Parallel Story Author-Style Transfer with Discourse
Representations and Content Enhancing [73.81778485157234]
Compared to sentences, long texts usually involve more complicated author linguistic preferences, such as discourse structures.
We formulate the task of non-parallel story author-style transfer, which requires transferring an input story into a specified author style.
We use an additional training objective to disentangle stylistic features from the learned discourse representation to prevent the model from degenerating to an auto-encoder.
arXiv Detail & Related papers (2022-08-29T08:47:49Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method based on contrastive learning; a generic sketch of such a contrastive style loss follows this list.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
- StyleBabel: Artistic Style Tagging and Captioning [38.792350870518504]
We present StyleBabel, a unique open access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks.
arXiv Detail & Related papers (2022-03-10T12:15:55Z)
- Language-Driven Image Style Transfer [72.36790598245096]
We introduce a new task -- language-driven image style transfer (LDIST) -- to manipulate the style of a content image, guided by a text.
The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
Experiments show that our CLVA is effective and achieves superb transferred results on LDIST.
arXiv Detail & Related papers (2021-06-01T01:58:50Z)
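As an aside on the contrastive approach in the CAST entry above: contrastive style representation learning typically pulls together embeddings of two crops of the same artwork and pushes apart embeddings from different artworks. A minimal, generic InfoNCE-style sketch follows; it is not CAST's actual loss, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_style_loss(anchor, positive, temperature=0.07):
    """Generic InfoNCE loss for style representations.

    anchor, positive: (batch, dim) embeddings of two crops of the same
    artwork; the other batch items act as negatives (different styles).
    `temperature` is an assumed hyperparameter.
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(anchor.size(0))        # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

# Usage with any style encoder that outputs (batch, dim) features:
loss = contrastive_style_loss(torch.randn(8, 128), torch.randn(8, 128))
```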