DiffArtist: Towards Aesthetic-Aligned Diffusion Model Control for Training-free Text-Driven Stylization
- URL: http://arxiv.org/abs/2407.15842v2
- Date: Sun, 22 Dec 2024 10:03:12 GMT
- Title: DiffArtist: Towards Aesthetic-Aligned Diffusion Model Control for Training-free Text-Driven Stylization
- Authors: Ruixiang Jiang, Changwen Chen
- Abstract summary: Diffusion models entangle content and style generation during the denoising process.
DiffArtist is the first approach that enables aesthetic-aligned control of content and style during the entire diffusion process.
- Score: 19.5597806965592
- License:
- Abstract: Diffusion models entangle content and style generation during the denoising process, leading to undesired content modification or insufficient style strength when directly applied to stylization tasks. Existing methods struggle to effectively control the diffusion model to meet the aesthetic-level requirements for stylization. In this paper, we introduce DiffArtist, the first approach that enables aesthetic-aligned control of content and style during the entire diffusion process, without additional training. Our key insight is to design disentangled representations for content and style in the noise space. By sharing features between content and style representations, we enable fine-grained control of structural and appearance-level style strength without compromising visual appeal. We further propose Vision-Language Model (VLM)-based evaluation metrics for stylization, which align better with human preferences. Extensive experiments demonstrate that DiffArtist outperforms existing methods in alignment with human preferences and offers enhanced controllability. Project homepage: https://DiffusionArtist.github.io
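The abstract's central mechanism, running disentangled content and style representations through parallel denoising with shared features and separate structure/appearance controls, can be pictured with a toy sketch. Everything below (the stand-in `ToyDenoiser`, the `structure_strength`/`appearance_strength` blending rules, and the DDIM-style update) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Stand-in noise predictor for illustration only; DiffArtist itself controls a
# pretrained text-conditioned diffusion model rather than training anything new.
class ToyDenoiser(nn.Module):
    def __init__(self, channels: int = 4, cond_dim: int = 8):
        super().__init__()
        self.net = nn.Conv2d(channels + cond_dim, channels, kernel_size=3, padding=1)

    def forward(self, x_t: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x_t.shape
        cond_map = cond[:, :, None, None].expand(b, cond.size(1), h, w)
        return self.net(torch.cat([x_t, cond_map], dim=1))

@torch.no_grad()
def ddim_step(x_t, eps, a_t, a_prev):
    # Deterministic DDIM update: estimate x0 from the noise prediction,
    # then re-noise it to the previous timestep.
    x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps

@torch.no_grad()
def stylize(denoiser, x_T, content_cond, style_cond, alphas_cumprod,
            structure_strength=0.6, appearance_strength=0.8):
    """Run a content branch and a style branch in parallel; two knobs blend them
    at the noise-prediction and latent level (illustrative rule, not the paper's)."""
    x_c, x_s = x_T.clone(), x_T.clone()
    for t in reversed(range(len(alphas_cumprod))):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps_c = denoiser(x_c, content_cond)   # structure-preserving prediction
        eps_s = denoiser(x_s, style_cond)     # style-driven prediction
        # Appearance knob: how strongly the style branch's prediction drives x_s.
        eps_mix = appearance_strength * eps_s + (1 - appearance_strength) * eps_c
        x_c = ddim_step(x_c, eps_c, a_t, a_prev)
        x_s = ddim_step(x_s, eps_mix, a_t, a_prev)
        # Structure knob: softly anchor the stylized latent to the content branch.
        x_s = structure_strength * x_c + (1 - structure_strength) * x_s
    return x_s

# Toy usage: 4-channel 32x32 "latents", 10 denoising steps, random conditions.
denoiser = ToyDenoiser()
alphas = torch.linspace(0.99, 0.01, 10)
result = stylize(denoiser, torch.randn(1, 4, 32, 32),
                 torch.randn(1, 8), torch.randn(1, 8), alphas)
```

In the paper the sharing reportedly happens on features inside the network rather than on blended noise predictions; the sketch only conveys how two separate strengths can be exposed without any retraining.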
Related papers
- Content-style disentangled representation for controllable artistic image stylization and generation [0.0]
Controllable artistic image stylization and generation aims to render the content provided by text or image with the learned artistic style.
This paper proposes a content-style representation disentangling method for controllable artistic image stylization and generation.
arXiv Detail & Related papers (2024-12-19T03:42:58Z)
- Z-STAR+: A Zero-shot Style Transfer Method via Adjusting Style Distribution [24.88532732093652]
Style transfer presents a significant challenge, primarily centered on identifying an appropriate style representation.
In contrast to existing approaches, we have discovered that latent features in vanilla diffusion models inherently contain natural style and content distributions.
Our method adopts dual denoising paths to represent content and style references in latent space, subsequently guiding the content image denoising process with style latent codes.
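A common way to realize "guiding the content denoising process with style latent codes" is to let the content path's attention queries attend to keys and values drawn from the style path. The function below is a generic sketch of such cross-image attention; the names, shapes, and injection point are assumptions rather than Z-STAR+'s released design.

```python
import torch

def cross_image_attention(q_content: torch.Tensor,
                          k_style: torch.Tensor,
                          v_style: torch.Tensor) -> torch.Tensor:
    """Generic cross-image attention: queries come from the content denoising path,
    keys/values from the style denoising path, so style statistics flow into the
    content branch. Inputs are (B, N, D) token features already projected by an
    attention layer."""
    scale = q_content.size(-1) ** -0.5
    attn = torch.softmax(q_content @ k_style.transpose(-2, -1) * scale, dim=-1)
    return attn @ v_style

# Toy usage: 2 images, 64 spatial tokens, 40-dim heads.
q = torch.randn(2, 64, 40)
k, v = torch.randn(2, 64, 40), torch.randn(2, 64, 40)
out = cross_image_attention(q, k, v)   # (2, 64, 40)
```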
arXiv Detail & Related papers (2024-11-28T15:56:17Z)
- DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer [13.588643982359413]
Style transfer aims to fuse the artistic representation of a style image with the structural information of a content image.
Existing methods train specific networks or utilize pre-trained models to learn content and style features.
We propose a novel and training-free approach for style transfer, combining textual embedding with spatial features.
arXiv Detail & Related papers (2024-10-19T06:42:43Z)
- SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model [64.28263381647628]
Talking Head Generation (THG) is an important task with broad application prospects in various fields such as digital humans, film production, and virtual reality.
We propose a novel framework named Style-Enhanced Vivid Portrait (SVP) which fully leverages style-related information in THG.
Our model generates diverse, vivid, and high-quality videos with flexible control over intrinsic styles, outperforming existing state-of-the-art methods.
arXiv Detail & Related papers (2024-09-05T06:27:32Z)
- D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods [2.468658581089448]
We propose a novel framework called D$2$Styler (Discrete Diffusion Styler).
Our method uses Adaptive Instance Normalization (AdaIN) features as a context guide for the reverse diffusion process; a minimal AdaIN sketch is given below.
Experimental results demonstrate that D$2$Styler produces high-quality style-transferred images.
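AdaIN itself has a standard closed form (Huang & Belongie, 2017): re-normalize each channel of the content features to the style features' mean and standard deviation. The sketch below shows only that operation; how D$2$Styler feeds it into the discrete reverse diffusion as a context guide is not reproduced here.

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Adaptive Instance Normalization over (B, C, H, W) feature maps:
    match each channel of the content features to the style features' mean/std."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

# Toy usage: stylize a (1, 64, 32, 32) content feature map with style statistics.
out = adain(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```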
arXiv Detail & Related papers (2024-08-07T05:47:06Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image/video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic clues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
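The abstract does not spell out the objective's form. One hypothetical way to encourage cross-frame temporal consistency is to penalize differences between consecutive generated frames, as in the minimal loss below; the actual HiCAST objective may differ substantially.

```python
import torch

def temporal_consistency_loss(frames: torch.Tensor) -> torch.Tensor:
    """Hypothetical cross-frame consistency term: penalize changes between
    consecutive generated frames. frames: (B, T, C, H, W)."""
    diff = frames[:, 1:] - frames[:, :-1]
    return diff.abs().mean()

# Toy usage: batch of 2 clips, 8 frames each.
loss = temporal_consistency_loss(torch.randn(2, 8, 3, 64, 64))
```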
arXiv Detail & Related papers (2024-01-11T12:26:23Z)
- ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature; a toy sketch of this idea is given below.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
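The sketch below shows a contrastive objective whose temperature is predicted from the anchor feature. The MLP parameterization, temperature bounds, and InfoNCE form are illustrative assumptions, not UCAST's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTempContrastive(nn.Module):
    """Toy InfoNCE-style loss whose temperature is predicted from the anchor
    feature (the "input-dependent temperature" idea mentioned above)."""
    def __init__(self, dim: int, t_min: float = 0.05, t_max: float = 0.5):
        super().__init__()
        self.temp_head = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1))
        self.t_min, self.t_max = t_min, t_max

    def forward(self, anchor, positive, negatives):
        # anchor/positive: (B, D); negatives: (B, K, D)
        anchor = F.normalize(anchor, dim=-1)
        positive = F.normalize(positive, dim=-1)
        negatives = F.normalize(negatives, dim=-1)
        # Per-sample temperature squashed into [t_min, t_max].
        tau = self.t_min + (self.t_max - self.t_min) * torch.sigmoid(self.temp_head(anchor))
        pos_logit = (anchor * positive).sum(-1, keepdim=True)       # (B, 1)
        neg_logits = torch.einsum("bd,bkd->bk", anchor, negatives)  # (B, K)
        logits = torch.cat([pos_logit, neg_logits], dim=1) / tau
        labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
        return F.cross_entropy(logits, labels)

# Toy usage: 128-d features, 4 anchors, 16 negatives each.
loss_fn = AdaptiveTempContrastive(dim=128)
loss = loss_fn(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 16, 128))
```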
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image; a brief sketch of this idea follows the entry.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
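The "content image-based learnable noise" idea can be pictured as starting the reverse process from a forward-diffused version of the content latent in which the injected noise is itself a learnable parameter. The parameterization below is an assumption for illustration, not DiffStyler's released code.

```python
import torch
import torch.nn as nn

class ContentBasedNoise(nn.Module):
    """Toy illustration: the starting point of the reverse diffusion is tied to the
    content image, with the injected noise kept as a learnable parameter."""
    def __init__(self, content_latent: torch.Tensor):
        super().__init__()
        # Learnable noise component, initialized from a standard Gaussian.
        self.eps = nn.Parameter(torch.randn_like(content_latent))
        self.register_buffer("x0", content_latent)

    def forward(self, alpha_bar_T: float) -> torch.Tensor:
        # Standard DDPM forward-process mixing at the final timestep T:
        # x_T = sqrt(alpha_bar_T) * x_0 + sqrt(1 - alpha_bar_T) * eps
        a = torch.tensor(alpha_bar_T)
        return a.sqrt() * self.x0 + (1.0 - a).sqrt() * self.eps

# Toy usage: a (1, 4, 32, 32) "content latent" and the final-step cumulative alpha.
start = ContentBasedNoise(torch.randn(1, 4, 32, 32))
x_T = start(alpha_bar_T=0.01)   # starting latent for the reverse denoising process
```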
This list is automatically generated from the titles and abstracts of the papers on this site.