MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP
- URL: http://arxiv.org/abs/2309.13716v1
- Date: Sun, 24 Sep 2023 18:24:55 GMT
- Title: MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP
- Authors: Prajwal Ganugula, Y S S S Santosh Kumar, N K Sagar Reddy, Prabhath
Chellingi, Avinash Thakur, Neeraj Kasera, C Shyam Anand
- Abstract summary: Style transfer driven by text prompts has paved a new path for creatively stylizing images without collecting an actual style image.
We propose Multi-Object Segmented Arbitrary Stylization Using CLIP (MOSAIC), a new method that applies styles to different objects in an image based on the context extracted from the input prompt.
Our method extends to arbitrary objects and styles and produces higher-quality images than current state-of-the-art methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Style transfer driven by text prompts has paved a new path for creatively
stylizing images without collecting an actual style image. Despite promising results,
text-driven stylization gives the user little control over the outcome. A user who wants
to create an artistic image needs fine-grained, individual control over the stylization
of the various entities in the content image, which current state-of-the-art approaches
do not provide. Diffusion-based style transfer methods suffer from the same issue,
because regional control over the stylized output is ineffective. To address this
problem, we propose Multi-Object Segmented Arbitrary Stylization Using CLIP (MOSAIC), a
new method that applies styles to different objects in the image based on the context
extracted from the input prompt. Text-based segmentation and stylization modules, both
built on vision transformer architectures, are used to segment and stylize the objects.
Our method extends to arbitrary objects and styles and produces higher-quality images
than current state-of-the-art methods. To our knowledge, this is the first attempt to
perform text-guided, arbitrary object-wise stylization. We demonstrate the effectiveness
of our approach through qualitative and quantitative analysis, showing that it generates
visually appealing stylized images with enhanced control over stylization and the
ability to generalize to unseen object classes.
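The abstract outlines a two-stage pipeline: a text-driven segmentation module isolates the objects named in the prompt, and a CLIP-guided stylization module applies each requested style only inside the corresponding mask. The sketch below is a minimal, hedged illustration of that idea; it relies on the public openai/CLIP API for the text and image encoders, while the `segmenter` and `stylizer` modules are hypothetical placeholders, not the paper's actual vision-transformer components.

```python
# Minimal sketch of text-guided, object-wise stylization in the spirit of MOSAIC.
# The CLIP calls follow the public openai/CLIP package; `segmenter` and `stylizer`
# are hypothetical stand-ins, NOT the authors' modules.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

def image_embed(img):
    """Embed a (B, 3, H, W) image batch in [0, 1] with CLIP's image encoder."""
    img = F.interpolate(img, size=(224, 224), mode="bilinear", align_corners=False)
    feats = clip_model.encode_image(img)
    return feats / feats.norm(dim=-1, keepdim=True)

def text_embed(prompt):
    feats = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    return feats / feats.norm(dim=-1, keepdim=True)

def stylize_objects(content, object_styles, segmenter, stylizer,
                    steps=200, lr=1e-4, content_weight=1.0):
    """content: (1, 3, H, W) image; object_styles: e.g. {"dog": "fire style"}.
    segmenter(content, name) -> soft mask (1, 1, H, W); stylizer: trainable net."""
    opt = torch.optim.Adam(stylizer.parameters(), lr=lr)
    masks = {name: segmenter(content, name).detach() for name in object_styles}
    targets = {name: text_embed(style).detach()
               for name, style in object_styles.items()}

    for _ in range(steps):
        stylized = stylizer(content)
        loss = content_weight * F.l1_loss(stylized, content)  # keep global structure
        for name, mask in masks.items():
            # Composite so CLIP only "sees" the stylization inside this object's mask.
            region = stylized * mask + content * (1.0 - mask)
            loss = loss + (1.0 - (image_embed(region) @ targets[name].T).mean())
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Final composite: each object takes its own stylized pixels, background unchanged.
    with torch.no_grad():
        stylized = stylizer(content)
        out = content.clone()
        for mask in masks.values():
            out = stylized * mask + out * (1.0 - mask)
    return out
```

The per-object CLIP term is restricted to the masked region by compositing the stylized pixels onto the untouched content before encoding, which is one simple way to keep each style from leaking onto other objects; the paper's actual losses and modules may differ.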
Related papers
- StyleBrush: Style Extraction and Transfer from a Single Image [19.652575295703485]
Stylization for visual content aims to add specific style patterns at the pixel level while preserving the original structural features.
We propose StyleBrush, a method that accurately captures styles from a reference image and "brushes" the extracted style onto other input visual content.
arXiv Detail & Related papers (2024-08-18T14:27:20Z)
- Magic Insert: Style-Aware Drag-and-Drop [28.101564123298882]
We present Magic Insert, a method for dragging-and-dropping subjects from a user-provided image into a target image of a different style.
For style-aware personalization, our method first fine-tunes a pretrained text-to-image diffusion model using LoRA and learned text tokens on the subject image.
For object insertion, we use Bootstrapped Domain Adaption to adapt a domain-specific photorealistic object insertion model to the domain of diverse artistic styles.
arXiv Detail & Related papers (2024-07-02T17:59:50Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal "attention sharing" during the diffusion process, our method maintains style consistency across images within T2I models.
Evaluation across diverse styles and text prompts demonstrates high quality and fidelity (a minimal sketch of this attention-sharing idea appears after the related papers list).
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- Soulstyler: Using Large Language Model to Guide Image Style Transfer for Target Object [9.759321877363258]
"Soulstyler" allows users to guide the stylization of specific objects in an image through simple textual descriptions.
We introduce a large language model to parse the text and identify stylization goals and specific styles.
We also introduce a novel localized text-image block matching loss that ensures that style transfer is performed only on specified target objects.
arXiv Detail & Related papers (2023-11-22T18:15:43Z)
- Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference.
We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z)
- Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate [58.83278629019384]
Style transfer aims to render the style of a given reference image onto another given image that provides the content.
Existing approaches either apply the holistic style of the style image in a global manner, or migrate local colors and textures of the style image to the content counterparts in a pre-defined way.
We propose Any-to-Any Style Transfer, which enables users to interactively select styles of regions in the style image and apply them to the prescribed content regions.
arXiv Detail & Related papers (2023-04-19T15:15:36Z)
- StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model [64.26721402514957]
We propose StylerDALLE, a style transfer method that uses natural language to describe abstract art styles.
Specifically, we formulate the language-guided style transfer task as a non-autoregressive token sequence translation.
To incorporate style information, we propose a Reinforcement Learning strategy with CLIP-based language supervision.
arXiv Detail & Related papers (2023-03-16T12:44:44Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a learnable noise derived from the content image, on which the reverse denoising process is based, enabling the stylization results to better preserve the structural information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
- A Fast Text-Driven Approach for Generating Artistic Content [11.295288894403754]
We propose a complete framework that generates visual art.
We implement an improved version that can generate a wide range of results with varying degrees of detail, style and structure.
To further enhance the results, we insert an artistic super-resolution module in the generative pipeline.
arXiv Detail & Related papers (2022-06-22T14:34:59Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
- StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation [10.357474047610172]
We present an approach for generating styled drawings for a given text description where a user can specify a desired drawing style.
Inspired by a theory in art that style and content are generally inseparable during the creative process, we propose a coupled approach, known here as StyleCLIPDraw.
Based on human evaluation, the styles of images generated by StyleCLIPDraw are strongly preferred to those by the sequential approach.
arXiv Detail & Related papers (2022-02-24T21:03:51Z)
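StyleAligned, listed above, maintains style consistency by letting every image in a batch attend to the tokens of a reference image during diffusion. Below is a minimal, hedged sketch of that attention-sharing idea in plain PyTorch; the tensor layout and the single attention call are assumptions for illustration, not the authors' diffusion-pipeline implementation.

```python
# Hedged sketch of "shared attention" in the spirit of StyleAligned: each image in a
# batch attends to its own keys/values plus those of a reference image, nudging all
# generations toward a consistent style. Shapes assumed: (batch, heads, tokens, dim).
import torch
import torch.nn.functional as F

def shared_self_attention(q, k, v, ref_index=0):
    b = q.shape[0]
    # Broadcast the reference image's keys/values to every batch element ...
    ref_k = k[ref_index:ref_index + 1].expand(b, -1, -1, -1)
    ref_v = v[ref_index:ref_index + 1].expand(b, -1, -1, -1)
    # ... and append them along the token axis so each image can mix in that style.
    k_shared = torch.cat([k, ref_k], dim=2)
    v_shared = torch.cat([v, ref_v], dim=2)
    return F.scaled_dot_product_attention(q, k_shared, v_shared)

# Example shapes inside a hypothetical self-attention layer of a diffusion U-Net:
# q = k = v = torch.randn(4, 8, 256, 64)
# out = shared_self_attention(q, k, v)   # -> (4, 8, 256, 64)
```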