ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model
- URL: http://arxiv.org/abs/2405.15287v2
- Date: Mon, 18 Nov 2024 09:35:46 GMT
- Title: ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model
- Authors: Chengming Xu, Kai Hu, Qilin Wang, Donghao Luo, Jiangning Zhang, Xiaobin Hu, Yanwei Fu, Chengjie Wang
- Abstract summary: Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
- Score: 73.95608242322949
- License:
- Abstract: Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images. In this paper, we present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion (SD) to address challenges such as misinterpreted styles and inconsistent semantics. Our approach introduces two innovative modules: the mixed style descriptor and the dynamic attention adapter. The mixed style descriptor enhances SD by combining content-aware and frequency-disentangled embeddings from CLIP with additional sources that capture global statistics and textual information, thus providing a richer blend of style-related and semantic-related knowledge. To achieve a better balance between adapter capacity and semantic control, the dynamic attention adapter is integrated into the diffusion UNet, dynamically calculating adaptation weights based on the style descriptors. Additionally, we introduce two objective functions to optimize the model alongside the denoising loss, further enhancing semantic and style consistency. Extensive experiments demonstrate the superiority of ArtWeaver over existing methods, producing images with diverse target styles while maintaining the semantic integrity of the text prompts.
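The abstract describes the dynamic attention adapter only at a high level, so the following is a minimal sketch of the general idea under stated assumptions: a style-conditioned cross-attention branch added to a UNet block, whose contribution is gated by an adaptation weight predicted from the style descriptor itself. The module name, tensor shapes, and gating formulation are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a "dynamic attention adapter" in the spirit of the abstract.
# All module names, shapes, and the gating formulation are assumptions for
# illustration; the paper's exact design is not specified in this summary.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicAttentionAdapter(nn.Module):
    """Adds a style-conditioned attention branch to a cross-attention layer,
    with the adapter strength predicted from the style descriptor."""

    def __init__(self, query_dim: int, style_dim: int, heads: int = 8):
        super().__init__()
        assert query_dim % heads == 0, "query_dim must be divisible by heads"
        self.heads = heads
        # Extra key/value projections fed by the style descriptor tokens.
        self.to_k_style = nn.Linear(style_dim, query_dim, bias=False)
        self.to_v_style = nn.Linear(style_dim, query_dim, bias=False)
        # Small network mapping the pooled style descriptor to a weight in [0, 1].
        self.weight_net = nn.Sequential(
            nn.Linear(style_dim, style_dim // 4),
            nn.SiLU(),
            nn.Linear(style_dim // 4, 1),
            nn.Sigmoid(),
        )

    def forward(self, query, base_attn_out, style_tokens):
        # query:         (B, N, query_dim) hidden states of the UNet block
        # base_attn_out: (B, N, query_dim) output of the frozen text cross-attention
        # style_tokens:  (B, S, style_dim) mixed style descriptor tokens
        b, n, d = query.shape
        h = self.heads

        k = self.to_k_style(style_tokens)  # (B, S, D)
        v = self.to_v_style(style_tokens)  # (B, S, D)

        def split_heads(x):
            return x.view(b, -1, h, d // h).transpose(1, 2)  # (B, H, *, D/H)

        q, k, v = map(split_heads, (query, k, v))
        style_attn = F.scaled_dot_product_attention(q, k, v)      # (B, H, N, D/H)
        style_attn = style_attn.transpose(1, 2).reshape(b, n, d)  # (B, N, D)

        # Dynamic adaptation weight computed from the pooled style descriptor.
        w = self.weight_net(style_tokens.mean(dim=1)).unsqueeze(1)  # (B, 1, 1)
        return base_attn_out + w * style_attn
```

Predicting the gate from the style descriptor, rather than using a fixed scale, is one plausible way to balance adapter capacity against semantic control, since the strength of the style injection can then vary with the reference style.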
Related papers
- DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer [13.588643982359413]
Style transfer aims to fuse the artistic representation of a style image with the structural information of a content image.
Existing methods train specific networks or utilize pre-trained models to learn content and style features.
We propose a novel and training-free approach for style transfer, combining textual embedding with spatial features.
arXiv Detail & Related papers (2024-10-19T06:42:43Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models [35.732715025002705]
StyleInject is a specialized fine-tuning approach tailored for text-to-image models.
It adapts to varying styles by adjusting the variance of visual features based on the characteristics of the input signal.
It proves particularly effective in learning from and enhancing a range of advanced, community-fine-tuned generative models.
arXiv Detail & Related papers (2024-01-25T04:53:03Z)
- HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image or video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic clues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
arXiv Detail & Related papers (2024-01-11T12:26:23Z)
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z)
- StyleAdapter: A Unified Stylized Image Generation Model [97.24936247688824]
StyleAdapter is a unified stylized image generation model capable of producing a variety of stylized images.
It can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion architecture that controls the balance between content and style in the diffused results.
We propose a learnable noise derived from the content image as the basis of the reverse denoising process, enabling the stylization results to better preserve the structural information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.