Soulstyler: Using Large Language Model to Guide Image Style Transfer for
Target Object
- URL: http://arxiv.org/abs/2311.13562v2
- Date: Wed, 29 Nov 2023 15:24:35 GMT
- Title: Soulstyler: Using Large Language Model to Guide Image Style Transfer for
Target Object
- Authors: Junhao Chen, Peng Rong, Jingbo Sun, Chao Li, Xiang Li, Hongwu Lv
- Abstract summary: "Soulstyler" allows users to guide the stylization of specific objects in an image through simple textual descriptions.
We introduce a large language model to parse the text and identify stylization goals and specific styles.
We also introduce a novel localized text-image block matching loss that ensures that style transfer is performed only on specified target objects.
- Score: 9.759321877363258
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image style transfer occupies an important place in both computer graphics
and computer vision. However, most current methods require reference to
stylized images and cannot individually stylize specific objects. To overcome
this limitation, we propose the "Soulstyler" framework, which allows users to
guide the stylization of specific objects in an image through simple textual
descriptions. We introduce a large language model to parse the text and
identify stylization goals and specific styles. Combined with a CLIP-based
semantic visual embedding encoder, the model understands and matches text and
image content. We also introduce a novel localized text-image block matching
loss that ensures that style transfer is performed only on specified target
objects, while non-target regions remain in their original style. Experimental
results demonstrate that our model is able to accurately perform style transfer
on target objects according to textual descriptions without affecting the style
of background regions. Our code will be available at
https://github.com/yisuanwang/Soulstyler.
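The listing does not spell out the localized text-image block matching loss; below is a minimal sketch of how a CLIP-based, mask-restricted directional patch loss of this kind could look. All names, patch sizes, and thresholds are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a localized text-image block (patch) matching loss:
# a CLIPstyler-style directional patch loss evaluated only on crops that fall
# inside a binary mask of the target object, so background regions receive
# no style gradient. Names, sizes, and thresholds are illustrative assumptions.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)


def encode_text(prompt: str) -> torch.Tensor:
    tokens = clip.tokenize([prompt]).to(device)
    return F.normalize(clip_model.encode_text(tokens).float(), dim=-1)


def encode_crop(crop: torch.Tensor) -> torch.Tensor:
    crop = F.interpolate(crop, size=224, mode="bicubic", align_corners=False)
    crop = (crop - CLIP_MEAN) / CLIP_STD
    return F.normalize(clip_model.encode_image(crop).float(), dim=-1)


def localized_block_matching_loss(stylized, content, mask,
                                  style_text, source_text="a photo",
                                  patch=96, n_patches=16, overlap=0.7,
                                  max_tries=200):
    """Directional CLIP loss over random crops lying inside the object mask.

    stylized, content: (1, 3, H, W) images in [0, 1]
    mask: (1, 1, H, W) float mask, 1 on the target object, 0 elsewhere
    """
    _, _, H, W = stylized.shape
    text_dir = F.normalize(encode_text(style_text) - encode_text(source_text), dim=-1)
    losses, tries = [], 0
    while len(losses) < n_patches and tries < max_tries:
        tries += 1
        y = torch.randint(0, H - patch + 1, (1,)).item()
        x = torch.randint(0, W - patch + 1, (1,)).item()
        # Skip crops that do not mostly cover the target object.
        if mask[..., y:y + patch, x:x + patch].mean() < overlap:
            continue
        img_dir = (encode_crop(stylized[..., y:y + patch, x:x + patch])
                   - encode_crop(content[..., y:y + patch, x:x + patch]))
        img_dir = F.normalize(img_dir, dim=-1)
        # 1 - cosine similarity between the image change and the text direction.
        losses.append(1.0 - (img_dir * text_dir).sum(dim=-1))
    return torch.cat(losses).mean() if losses else stylized.sum() * 0.0
```

In the full method this term would presumably be combined with a content-preservation loss and a constraint keeping the unmasked background close to the original image, as the abstract describes; the sketch only illustrates how restricting the patch sampling to the target mask confines the style gradient to the specified object.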
Related papers
- MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP [0.0]
Style transfer driven by text prompts has paved a new path for creatively stylizing images without collecting an actual style image.
We propose a new method Multi-Object Segmented Arbitrary Stylization Using CLIP (MOSAIC) that can apply styles to different objects in the image based on the context extracted from the input prompt.
Our method extends to arbitrary objects and styles and produces high-quality images compared with current state-of-the-art methods.
arXiv Detail & Related papers (2023-09-24T18:24:55Z) - Visual Captioning at Will: Describing Images and Videos Guided by a Few
Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference.
We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z) - Sem-CS: Semantic CLIPStyler for Text-Based Image Style Transfer [4.588028371034406]
We propose Semantic CLIPStyler (Sem-CS) that performs semantic style transfer.
Sem-CS first segments the content image into salient and non-salient objects and then transfers artistic style based on a given style text description.
Our empirical results, including DISTS, NIMA and user study scores, show that our proposed framework yields superior qualitative and quantitative performance.
arXiv Detail & Related papers (2023-07-12T05:59:42Z) - Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate [58.83278629019384]
Style transfer aims to render the style of a given style-reference image onto another image that serves as the content reference.
Existing approaches either apply the holistic style of the style image in a global manner, or migrate local colors and textures of the style image to the content counterparts in a pre-defined way.
We propose Any-to-Any Style Transfer, which enables users to interactively select styles of regions in the style image and apply them to the prescribed content regions.
arXiv Detail & Related papers (2023-04-19T15:15:36Z) - StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized
Tokenizer of a Large-Scale Generative Model [64.26721402514957]
We propose StylerDALLE, a style transfer method that uses natural language to describe abstract art styles.
Specifically, we formulate the language-guided style transfer task as a non-autoregressive token sequence translation.
To incorporate style information, we propose a Reinforcement Learning strategy with CLIP-based language supervision.
arXiv Detail & Related papers (2023-03-16T12:44:44Z) - SHUNIT: Style Harmonization for Unpaired Image-to-Image Translation [14.485088590863327]
We present Style Harmonization for unpaired I2I translation (SHUNIT).
Our SHUNIT generates a new style by harmonizing the target domain style retrieved from a class memory and an original source image style.
We validate our method with extensive experiments and achieve state-of-the-art performance on the latest benchmark sets.
arXiv Detail & Related papers (2023-01-11T19:24:03Z) - DSI2I: Dense Style for Unpaired Image-to-Image Translation [70.93865212275412]
Unpaired exemplar-based image-to-image (UEI2I) translation aims to translate a source image to a target image domain with the style of a target image exemplar.
We propose to represent style as a dense feature map, allowing for a finer-grained transfer to the source image without requiring any external semantic information.
Our results show that the translations produced by our approach are more diverse, preserve the source content better, and are closer to the exemplars when compared to the state-of-the-art methods.
arXiv Detail & Related papers (2022-12-26T18:45:25Z) - Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style-representation-learning and style-transfer method based on contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z) - Language-Driven Image Style Transfer [72.36790598245096]
We introduce a new task -- language-driven image style transfer (LDIST) -- to manipulate the style of a content image, guided by a text.
The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
Experiments show that our CLVA is effective and achieves superb transferred results on LDIST.
arXiv Detail & Related papers (2021-06-01T01:58:50Z) - DeepObjStyle: Deep Object-based Photo Style Transfer [31.75300124593133]
One of the major challenges of style transfer is providing appropriate image-feature supervision between the output image and the input (style and content) images.
We propose an object-based style transfer approach, called DeepObjStyle, for style supervision in a training-data-independent framework.
arXiv Detail & Related papers (2020-12-11T17:02:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.