Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation
- URL: http://arxiv.org/abs/2211.07751v1
- Date: Mon, 14 Nov 2022 20:52:57 GMT
- Title: Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation
- Authors: Zhihong Pan, Xin Zhou, Hao Tian
- Abstract summary: Diffusion-based text-to-image generation models like GLIDE and DALLE-2 have gained wide success recently.
We propose a novel style guidance method that supports generating images in an arbitrary style specified by a reference image.
- Score: 13.894251782142584
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion-based text-to-image generation models like GLIDE and DALLE-2 have
gained wide success recently for their superior performance in turning complex
text inputs into images of high quality and wide diversity. In particular, they
have proven to be very powerful in creating graphic arts of various formats and
styles. Although current models support specifying style formats such as oil painting or pencil drawing, fine-grained style features like color distributions and brush strokes are hard to specify, as they are sampled at random from a conditional distribution based on the given text input. Here we propose a novel style guidance method that supports generating images in an arbitrary style specified by a reference image. The method does not require a separate style transfer model to produce the desired style, and it preserves the quality of the generated content as controlled by the text input.
Additionally, the guidance method can be applied without a style reference,
denoted as self style guidance, to generate images of more diverse styles.
Comprehensive experiments show that the proposed method remains robust and
effective in a wide range of conditions, including diverse graphic art forms,
image content types and diffusion models.
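This page carries no code, so below is a minimal sketch of how guidance from a style reference image could be folded into diffusion sampling, in the spirit of classifier guidance. The eps_model and style_encoder interfaces, the Gram-matrix style distance, and the guidance scale are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch (PyTorch) of reference-image style guidance folded into one
# reverse-diffusion step, in the spirit of classifier guidance. The noise-model
# interface, the Gram-matrix style distance, and the guidance scale are
# illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F


def gram_matrix(feats: torch.Tensor) -> torch.Tensor:
    """Channel-correlation statistics, a common style descriptor."""
    b, c, h, w = feats.shape
    f = feats.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


def style_guided_eps(eps_model, style_encoder, x_t, t, text_emb,
                     ref_image, guidance_scale=5.0):
    """Return a noise prediction nudged toward the reference image's style."""
    x_t = x_t.detach().requires_grad_(True)

    # Standard text-conditioned noise prediction (hypothetical model interface).
    eps = eps_model(x_t, t, text_emb)

    # Style distance between the current noisy sample and the reference image,
    # measured on features from an assumed style encoder.
    g_cur = gram_matrix(style_encoder(x_t))
    g_ref = gram_matrix(style_encoder(ref_image)).detach()
    style_loss = F.mse_loss(g_cur, g_ref)

    # Because the sampler subtracts the predicted noise from x_t, adding the
    # style-loss gradient to eps moves the next sample toward lower style loss,
    # i.e. toward the reference style, without a separate style transfer model.
    grad = torch.autograd.grad(style_loss, x_t)[0]
    return (eps + guidance_scale * grad).detach()
```

The self style guidance variant mentioned in the abstract, which works without a reference image, is not sketched here; the example only illustrates how a style term can steer sampling without a separate style transfer model.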
Related papers
- FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning [45.696909070215476]
FontDiffuser is a diffusion-based image-to-image one-shot font generation method.
It consistently excels on complex characters and large style changes compared to previous methods.
arXiv Detail & Related papers (2023-12-19T13:23:20Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal attention sharing during the diffusion process, our method maintains style consistency across images within T2I models.
Evaluation of our method across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter [78.75422651890776]
StyleCrafter is a generic method that enhances pre-trained T2V models with a style control adapter.
To promote content-style disentanglement, we remove style descriptions from the text prompt and extract style information solely from the reference image.
StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images.
arXiv Detail & Related papers (2023-12-01T03:53:21Z)
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task for stylizing text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) by upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z)
- ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models.
We propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information.
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture that controls the balance between the content and style of the diffused results.
We propose learnable noise derived from the content image, on which the reverse denoising process is based, enabling the stylization results to better preserve the structural information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
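The CAST entry above mentions learning style representations via contrastive learning; below is a minimal, generic sketch of such an objective, where style codes of two views of the same artwork are treated as positives and all other artworks in the batch as negatives. The style projector and the view construction are illustrative assumptions, not CAST's actual components.

```python
# A minimal, generic sketch (PyTorch) of a contrastive objective over style
# codes, loosely in the spirit of the CAST entry above. The style projector
# and the way positive pairs are built are illustrative assumptions.
import torch
import torch.nn.functional as F


def contrastive_style_loss(codes_a: torch.Tensor, codes_b: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """codes_a[i] and codes_b[i] are style codes of two views of the same artwork."""
    a = F.normalize(codes_a, dim=-1)
    b = F.normalize(codes_b, dim=-1)
    logits = a @ b.t() / temperature                    # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)


# Hypothetical usage with an assumed style projector and two augmented views:
# codes_a = style_projector(augment(images))
# codes_b = style_projector(augment(images))
# loss = contrastive_style_loss(codes_a, codes_b)
```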