ControlStyle: Text-Driven Stylized Image Generation Using Diffusion
Priors
- URL: http://arxiv.org/abs/2311.05463v1
- Date: Thu, 9 Nov 2023 15:50:52 GMT
- Title: ControlStyle: Text-Driven Stylized Image Generation Using Diffusion
Priors
- Authors: Jingwen Chen and Yingwei Pan and Ting Yao and Tao Mei
- Abstract summary: We propose a new task for stylizing'' text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
- Score: 105.37795139586075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the multimedia community has witnessed the rise of diffusion models
trained on large-scale multi-modal data for visual content creation,
particularly in the field of text-to-image generation. In this paper, we
propose a new task for ``stylizing'' text-to-image models, namely text-driven
stylized image generation, that further enhances editability in content
creation. Given an input text prompt and a style image, this task aims to
produce stylized images that are both semantically relevant to the text prompt
and aligned with the style image in style. To achieve this, we present a new
diffusion model (ControlStyle) by upgrading a pre-trained text-to-image model
with a trainable modulation network that enables conditioning on both text
prompts and style images. Moreover, diffusion style and content regularizations
are simultaneously introduced to facilitate the learning of this modulation
network with these diffusion priors, pursuing high-quality stylized
text-to-image generation. Extensive experiments demonstrate the effectiveness
of our ControlStyle in producing more visually pleasing and artistic results,
surpassing a simple combination of text-to-image model and conventional style
transfer techniques.
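The abstract's recipe (a frozen text-to-image diffusion prior, a trainable modulation network conditioned on the style image, and joint denoising, style, and content objectives) can be pictured with a minimal PyTorch-style sketch. This is an illustrative assumption of how such training could be wired, not the authors' implementation: the toy denoiser, the Gram-matrix style loss, the feature-matching content loss, and the loss weights are all placeholders.

```python
# Minimal sketch of a frozen diffusion prior plus a trainable modulation branch.
# All module names, losses, and weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    """Stand-in for a frozen, pre-trained text-conditioned denoising UNet."""
    def __init__(self, ch=64, text_dim=32):
        super().__init__()
        self.inp = nn.Conv2d(3, ch, 3, padding=1)
        self.text_proj = nn.Linear(text_dim, ch)
        self.out = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x_t, text_emb, residual=None):
        h = F.silu(self.inp(x_t)) + self.text_proj(text_emb)[:, :, None, None]
        if residual is not None:      # features injected by the modulation network
            h = h + residual
        return self.out(h)            # predicted noise

class ModulationNet(nn.Module):
    """Trainable branch that maps the style image to modulation features."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, img):
        return self.enc(img)

def gram(feat):
    """Gram matrix of a feature map, a common proxy for style statistics."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

denoiser, modnet = TinyDenoiser(), ModulationNet()
for p in denoiser.parameters():       # the diffusion prior stays frozen
    p.requires_grad_(False)
opt = torch.optim.AdamW(modnet.parameters(), lr=1e-4)

# One toy training step with random stand-in data.
x0 = torch.rand(4, 3, 64, 64)         # target images (placeholder)
style = torch.rand(4, 3, 64, 64)      # style reference images (placeholder)
text = torch.randn(4, 32)             # text embeddings (placeholder)
t = torch.randint(0, 1000, (4,)).float()
alpha_bar = 1.0 - t.view(-1, 1, 1, 1) / 1000.0   # toy noise schedule
noise = torch.randn_like(x0)
x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise

eps_pred = denoiser(x_t, text, residual=modnet(style))
loss_denoise = F.mse_loss(eps_pred, noise)
with torch.no_grad():                 # content prediction of the frozen prior
    eps_plain = denoiser(x_t, text)
loss_content = F.mse_loss(eps_pred, eps_plain)   # stay close to the prior
loss_style = F.mse_loss(gram(modnet(x_t)), gram(modnet(style)))

loss = loss_denoise + 0.1 * loss_style + 0.1 * loss_content   # weights are guesses
opt.zero_grad()
loss.backward()
opt.step()
```

In the actual method the prior would be a full Stable-Diffusion-scale UNet and the regularizations would operate on diffusion features; the sketch only fixes the overall control flow: frozen prior, trainable modulation branch, and a weighted sum of denoising, style, and content terms.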
Related papers
- ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models [52.23899502520261]
We introduce a new framework named ARTIST to focus on the learning of text structures.
We finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model.
Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15% in various metrics.
arXiv Detail & Related papers (2024-06-17T19:31:24Z)
- FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation [38.730628018627975]
This research aims to tackle the generation of text effects for multilingual fonts.
We introduce a novel shape-adaptive diffusion model capable of interpreting the given shape.
We also present a training-free, shape-adaptive effect transfer method for transferring textures from a generated reference letter to others.
arXiv Detail & Related papers (2024-06-12T16:43:47Z)
- StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models [42.45078883553856]
Stylized Text-to-Image Generation (STIG) aims to generate images based on text prompts and style reference images.
In this paper, we propose a novel framework dubbed StyleMaster for this task by leveraging pretrained Stable Diffusion.
Two objective functions are introduced to optimize the model together with the denoising loss, further enhancing semantic and style consistency.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- CustomText: Customized Textual Image Generation using Diffusion Models [13.239661107392324]
Textual image generation spans diverse fields like advertising, education, product packaging, social media, information visualization, and branding.
Despite recent strides in language-guided image synthesis using diffusion models, current models excel in image generation but struggle with accurate text rendering and offer limited control over font attributes.
In this paper, we aim to enhance the synthesis of high-quality images with precise text customization, thereby contributing to the advancement of image generation models.
arXiv Detail & Related papers (2024-05-21T06:43:03Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal `attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach can make text-to-image diffusion models easier to use with better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z)
- GlyphDiffusion: Text Generation as Image Generation [100.98428068214736]
We propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image generation.
Our key idea is to render the target text as a glyph image containing visual language content.
Our model also makes significant improvements compared to recent diffusion models.
arXiv Detail & Related papers (2023-04-25T02:14:44Z)
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation [10.39028769374367]
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation.
Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
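As a rough illustration of the "content image-based learnable noise" idea in the DiffStyler entry above: the reverse denoising process could be started from a diffused content image mixed with a learnable noise map, so the structure of the content image survives stylization. The class name, tensor shape, and schedule value below are assumptions for illustration, not DiffStyler's actual code.

```python
# Hedged sketch: initialize reverse denoising from a noised content image
# with a learnable noise term. Names and the schedule value are assumptions.
import torch
import torch.nn as nn

class LearnableContentNoise(nn.Module):
    """Hypothetical helper: a learnable noise map anchored to a content image."""
    def __init__(self, shape=(3, 64, 64)):
        super().__init__()
        self.noise = nn.Parameter(torch.randn(shape))   # learnable noise term

    def forward(self, content_img, alpha_bar_T):
        # Diffuse the content image to the step where reverse denoising starts,
        # so the starting point already carries the content structure.
        a = torch.as_tensor(alpha_bar_T)
        return a.sqrt() * content_img + (1.0 - a).sqrt() * self.noise

content = torch.rand(1, 3, 64, 64)                        # placeholder content image
x_T = LearnableContentNoise()(content, alpha_bar_T=0.05)  # starting point for denoising
```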