Related papers: StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation

StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation

URL: http://arxiv.org/abs/2309.01770v1
Date: Mon, 4 Sep 2023 19:16:46 GMT
Title: StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation
Authors: Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, and Ping Luo
Abstract summary: This paper presents a LoRA-free method for stylized image generation that takes a text prompt and style reference images as inputs. StyleAdapter can generate high-quality images that match the content of the prompts and adopt the style of the references in a single pass.
Score: 97.24936247688824
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents a LoRA-free method for stylized image generation that takes a text prompt and style reference images as inputs and produces an output image in a single pass. Unlike existing methods that rely on training a separate LoRA for each style, our method can adapt to various styles with a unified model. However, this poses two challenges: 1) the prompt loses controllability over the generated content, and 2) the output image inherits both the semantic and style features of the style reference image, compromising its content fidelity. To address these challenges, we introduce StyleAdapter, a model that comprises two components: a two-path cross-attention module (TPCA) and three decoupling strategies. These components enable our model to process the prompt and style reference features separately and reduce the strong coupling between the semantic and style information in the style references. StyleAdapter can generate high-quality images that match the content of the prompts and adopt the style of the references (even for unseen styles) in a single pass, which is more flexible and efficient than previous methods. Experiments have been conducted to demonstrate the superiority of our method over previous works.

Related papers

WikiStyle+: A Multimodal Approach to Content-Style Representation Disentanglement for Artistic Image Stylization [0.0]
Artistic image stylization aims to render the content provided by text or image with the target style. Current methods for content and style disentanglement rely on image supervision. This paper proposes a multimodal approach to content-style disentanglement for artistic image stylization.
arXiv Detail & Related papers (2024-12-19T03:42:58Z)
StyleBrush: Style Extraction and Transfer from a Single Image [19.652575295703485]
Stylization for visual content aims to add specific style patterns at the pixel level while preserving the original structural features. We propose StyleBrush, a method that accurately captures styles from a reference image and brushes'' the extracted style onto other input visual content.
arXiv Detail & Related papers (2024-08-18T14:27:20Z)
ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images. We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
Visual Style Prompting with Swapping Self-Attention [26.511518230332758]
We propose a novel approach to produce a diverse range of images while maintaining specific style elements and nuances. During the denoising process, we keep the query from original features while swapping the key and value with those from reference features in the late self-attention layers. Our method demonstrates superiority over existing approaches, best reflecting the style of the references and ensuring that resulting images match the text prompts most accurately.
arXiv Detail & Related papers (2024-02-20T12:51:17Z)
Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images. By employing minimal attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models. Our method's evaluation across diverse styles and text prompts demonstrates high-quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter [78.75422651890776]
StyleCrafter is a generic method that enhances pre-trained T2V models with a style control adapter. To promote content-style disentanglement, we remove style descriptions from the text prompt and extract style information solely from the reference image. StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images.
arXiv Detail & Related papers (2023-12-01T03:53:21Z)
Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference. We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z)
DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results. We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation [10.357474047610172]
We present an approach for generating styled drawings for a given text description where a user can specify a desired drawing style. Inspired by a theory in art that style and content are generally inseparable during the creative process, we propose a coupled approach, known here as StyleCLIPDraw. Based on human evaluation, the styles of images generated by StyleCLIPDraw are strongly preferred to those by the sequential approach.
arXiv Detail & Related papers (2022-02-24T21:03:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.