DiffArtist: Towards Structure and Appearance Controllable Image Stylization
- URL: http://arxiv.org/abs/2407.15842v4
- Date: Wed, 27 Aug 2025 10:30:27 GMT
- Title: DiffArtist: Towards Structure and Appearance Controllable Image Stylization
- Authors: Ruixiang Jiang, Changwen Chen
- Abstract summary: DiffArtist is the first 2D stylization method to offer fine-grained, simultaneous control over both structure and appearance style strength. Our analysis shows that DiffArtist achieves superior style fidelity and dual-controllability compared to state-of-the-art methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artistic styles are defined by both their structural and appearance elements. Existing neural stylization techniques primarily focus on transferring appearance-level features such as color and texture, often neglecting the equally crucial aspect of structural stylization. To address this gap, we introduce DiffArtist, the first 2D stylization method to offer fine-grained, simultaneous control over both structure and appearance style strength. This dual controllability is achieved by representing structure and appearance generation as separate diffusion processes, requiring no further tuning or additional adapters. To properly evaluate this new capability of dual stylization, we further propose a multimodal LLM-based stylization evaluator that aligns significantly better with human preferences than existing metrics. Extensive analysis shows that DiffArtist achieves superior style fidelity and dual-controllability compared to state-of-the-art methods. Its text-driven, training-free design and unprecedented dual controllability make it a powerful and interactive tool for various creative applications. Project homepage: https://diffusionartist.github.io.
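To make the dual-process idea concrete, here is a rough sketch of how two denoising trajectories, one conditioned for structure and one for appearance, could be blended with independent strength weights. The denoiser stub, noise schedule, and weight values are illustrative placeholders, not the authors' implementation.

```python
# Hypothetical sketch of dual-process stylization: two denoising paths
# (structure, appearance) whose noise predictions are blended with
# independent, user-controlled strengths. Not the paper's actual code.
import torch

def denoise(x, t, prompt_emb):
    """Stand-in for a pretrained diffusion model's noise prediction."""
    return torch.randn_like(x)  # placeholder; a real U-Net goes here

def dual_stylize(x_T, struct_emb, app_emb, alphas_cumprod,
                 w_struct=0.5, w_app=0.8):
    x = x_T
    for t in reversed(range(1, len(alphas_cumprod))):
        eps_struct = denoise(x, t, struct_emb)   # structure-style path
        eps_app = denoise(x, t, app_emb)         # appearance-style path
        eps = w_struct * eps_struct + w_app * eps_app  # the dual control
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t - 1]
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # predicted x_0
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # DDIM step, eta=0
    return x

alphas = torch.linspace(0.9999, 0.98, 50).cumprod(dim=0)  # toy schedule
result = dual_stylize(torch.randn(1, 4, 64, 64), None, None, alphas)
```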
Related papers
- One-shot Embroidery Customization via Contrastive LoRA Modulation [20.463441212598273]
We propose a novel contrastive learning framework that disentangles fine-grained style and content features with a single reference image.
To evaluate our method on fine-grained style transfer, we build a benchmark for embroidery customization.
arXiv Detail & Related papers (2025-09-23T12:58:15Z)
- Neural Scene Designer: Self-Styled Semantic Image Manipulation [67.43125248646653]
We introduce the Neural Scene Designer (NSD), a novel framework that enables photo-realistic manipulation of user-specified scene regions.
NSD ensures both semantic alignment with user intent and stylistic consistency with the surrounding environment.
To capture fine-grained style representations, we propose the Progressive Self-style Representational Learning (PSRL) module.
arXiv Detail & Related papers (2025-09-01T11:59:03Z)
- Training Free Stylized Abstraction [27.307331773270676]
Stylized abstraction synthesizes visually exaggerated yet semantically faithful representations of subjects, balancing recognizability with perceptual distortion.
We propose a training-free framework that generates stylized abstractions from a single image using inference-time scaling in vision-language models (VLLMs).
Our method adapts structural restoration dynamically through style-aware temporal scheduling, enabling high-fidelity reconstructions that honor both subject and style.
arXiv Detail & Related papers (2025-05-28T17:59:57Z)
- Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art [61.28133495240179]
We propose a novel task of aesthetics alignment which seeks to align user-specified aesthetics with the T2I generation output.
Inspired by how artworks provide an invaluable perspective to approach aesthetics, we codify visual aesthetics using the compositional framework artists employ.
We demonstrate that T2I diffusion models can effectively offer 10 compositional controls through user-specified Principles of Art (PoA) conditions.
arXiv Detail & Related papers (2025-03-15T06:58:09Z)
- StyleBlend: Enhancing Style-Specific Content Creation in Text-to-Image Diffusion Models [10.685779311280266]
StyleBlend is a method designed to learn and apply style representations from a limited set of reference images.
Our approach decomposes style into two components, composition and texture, each learned through different strategies.
arXiv Detail & Related papers (2025-02-13T08:26:54Z)
- IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features [89.95303251220734]
We present a training-free framework to solve the style attribution problem, using the features produced by a diffusion model alone.
This is denoted as introspective style attribution (IntroStyle) and demonstrates superior performance to state-of-the-art models for style retrieval.
We also introduce a synthetic dataset of Style Hacks (SHacks) to isolate artistic style and evaluate fine-grained style attribution performance.
arXiv Detail & Related papers (2024-12-19T01:21:23Z)
- Learning Artistic Signatures: Symmetry Discovery and Style Transfer [8.288443063900825]
There is no undisputed definition of artistic style.
Style should be thought of as a set of global symmetries that dictate the arrangement of local textures.
We show that by considering both local and global features, using both Lie generators and traditional measures of texture, we can quantitatively capture the stylistic similarity between artists better than with either set of features alone.
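As a toy illustration of combining local and global features for style comparison, the sketch below concatenates a Gram-matrix texture descriptor with a placeholder global feature vector and scores similarity by cosine distance; the paper's actual Lie-generator machinery is not reproduced here.

```python
# Toy sketch: combine local texture statistics (Gram matrix) with a
# global descriptor and compare styles by cosine similarity. The
# global features are a placeholder for the paper's Lie generators.
import numpy as np

def gram_descriptor(feats):
    """feats: (C, H*W) feature map; returns the flattened Gram matrix."""
    g = feats @ feats.T / feats.shape[1]
    return g.ravel()

def style_similarity(feats_a, feats_b, global_a, global_b):
    da = np.concatenate([gram_descriptor(feats_a), global_a])
    db = np.concatenate([gram_descriptor(feats_b), global_b])
    return float(da @ db / (np.linalg.norm(da) * np.linalg.norm(db)))

rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(64, 256)), rng.normal(size=(64, 256))
ga, gb = rng.normal(size=16), rng.normal(size=16)
print(style_similarity(fa, fb, ga, gb))
```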
arXiv Detail & Related papers (2024-12-05T18:56:23Z)
- DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer [13.588643982359413]
Style transfer aims to fuse the artistic representation of a style image with the structural information of a content image.
Existing methods train specific networks or utilize pre-trained models to learn content and style features.
We propose a novel and training-free approach for style transfer, combining textual embedding with spatial features.
arXiv Detail & Related papers (2024-10-19T06:42:43Z)
- VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models [53.59400446543756]
Artistic typography is a technique for visualizing the meaning of an input character in an imaginative and readable manner.
We introduce a dual-branch, training-free method called VitaGlyph, enabling flexible artistic typography with controllable geometry changes.
arXiv Detail & Related papers (2024-10-02T16:48:47Z)
- StyleBrush: Style Extraction and Transfer from a Single Image [19.652575295703485]
Stylization for visual content aims to add specific style patterns at the pixel level while preserving the original structural features.
We propose StyleBrush, a method that accurately captures styles from a reference image and "brushes" the extracted style onto other input visual content.
arXiv Detail & Related papers (2024-08-18T14:27:20Z)
- InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation [4.1177497612346]
Style transfer is an inventive process designed to create an image that maintains the essence of the original while embracing the visual style of another.
We introduce InstantStyle-Plus, an approach that prioritizes the integrity of the original content while seamlessly integrating the target style.
arXiv Detail & Related papers (2024-06-30T18:05:33Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt [12.27693060663517]
Artistic style transfer aims to transfer the learned artistic style onto an arbitrary content image, generating artistic stylized images.
We propose a novel artistic style transfer method based on a pre-trained diffusion model, called LSAST.
Our proposed method can generate more highly realistic artistic stylized images than the state-of-the-art artistic style transfer methods.
arXiv Detail & Related papers (2024-04-17T15:28:53Z)
- StyleForge: Enhancing Text-to-Image Synthesis for Any Artistic Styles with Dual Binding [7.291687946822539]
We introduce Single-StyleForge, a novel approach for personalized text-to-image synthesis across diverse artistic styles.
We also present Multi-StyleForge, which enhances image quality and text alignment by binding multiple tokens to partial style attributes.
arXiv Detail & Related papers (2024-04-08T07:43:23Z)
- Implicit Style-Content Separation using B-LoRA [61.664293840163865]
We introduce B-LoRA, a method that implicitly separates the style and content components of a single image.
By analyzing the architecture of SDXL combined with LoRA, we find that jointly learning the LoRA weights of two specific blocks achieves style-content separation.
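A minimal sketch of the underlying idea: restrict low-rank (LoRA) updates to two designated blocks so that one adapter can absorb content and the other style. The block names, rank, and tiny model below are hypothetical stand-ins, not the specific SDXL blocks the paper identifies.

```python
# Minimal sketch: attach LoRA adapters only to Linear layers inside two
# chosen blocks. Block names and model are hypothetical placeholders.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A linear layer plus a low-rank residual update (LoRA)."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

def attach_lora(model: nn.Module, target_blocks):
    """Wrap only the Linear layers whose qualified name mentions a target block."""
    for parent_name, parent in list(model.named_modules()):
        for child_name, child in list(parent.named_children()):
            full = f"{parent_name}.{child_name}" if parent_name else child_name
            if isinstance(child, nn.Linear) and any(b in full for b in target_blocks):
                setattr(parent, child_name, LoRALinear(child))

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.content_block = nn.Linear(8, 8)
        self.style_block = nn.Linear(8, 8)
        self.other = nn.Linear(8, 8)
    def forward(self, x):
        return self.other(self.style_block(self.content_block(x)))

net = TinyNet()
attach_lora(net, ["content_block", "style_block"])  # hypothetical block names
print(net(torch.randn(2, 8)).shape)  # torch.Size([2, 8])
```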
arXiv Detail & Related papers (2024-03-21T17:20:21Z)
- Deformable One-shot Face Stylization via DINO Semantic Guidance [12.771707124161665]
This paper addresses the issue of one-shot face stylization, focusing on the simultaneous consideration of appearance and structure.
We explore deformation-aware face stylization that diverges from traditional single-image style reference, opting for a real-style image pair instead.
arXiv Detail & Related papers (2024-03-01T11:30:55Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal "attention sharing" during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
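The attention-sharing idea can be pictured roughly as follows: each generated image attends to its own tokens plus the keys and values of a shared reference, aligning attention statistics (and thus style) across the batch. This single-head, projection-free version is a simplification, not the paper's exact mechanism.

```python
# Rough sketch of shared attention for style alignment: every image in
# the batch also attends to the keys/values of a reference image.
import torch
import torch.nn.functional as F

def shared_attention(q, k, v, k_ref, v_ref):
    """q, k, v: (B, N, D) per-image tensors; k_ref, v_ref: (N, D)
    from a shared reference image."""
    B = q.shape[0]
    k_all = torch.cat([k, k_ref.unsqueeze(0).expand(B, -1, -1)], dim=1)
    v_all = torch.cat([v, v_ref.unsqueeze(0).expand(B, -1, -1)], dim=1)
    attn = F.softmax(q @ k_all.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v_all

B, N, D = 4, 16, 32
out = shared_attention(torch.randn(B, N, D), torch.randn(B, N, D),
                       torch.randn(B, N, D), torch.randn(N, D), torch.randn(N, D))
print(out.shape)  # torch.Size([4, 16, 32])
```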
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter [78.75422651890776]
StyleCrafter is a generic method that enhances pre-trained T2V models with a style control adapter.
To promote content-style disentanglement, we remove style descriptions from the text prompt and extract style information solely from the reference image.
StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images.
arXiv Detail & Related papers (2023-12-01T03:53:21Z)
- $Z^*$: Zero-shot Style Transfer via Attention Rearrangement [27.185432348397693]
This study shows that vanilla diffusion models can directly extract style information and seamlessly integrate the generative prior into the content image without retraining.
We adopt dual denoising paths to represent content/style references in latent space and then guide the content image denoising process with style latent codes.
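Guiding the content denoising process with style latent codes can be illustrated as cross-image attention in which content queries retrieve style values; the sketch below shows that generic operation, not the paper's specific attention rearrangement.

```python
# Illustrative cross-image attention: content queries attend to style
# keys/values, injecting style statistics into the content path.
import torch
import torch.nn.functional as F

def style_injection_attention(q_content, k_style, v_style):
    """q_content: (N, D) queries from the content denoising path;
    k_style, v_style: (M, D) keys/values from the style path."""
    scores = q_content @ k_style.T / q_content.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v_style

N, M, D = 16, 24, 32
out = style_injection_attention(torch.randn(N, D), torch.randn(M, D),
                                torch.randn(M, D))
print(out.shape)  # torch.Size([16, 32])
```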
arXiv Detail & Related papers (2023-11-25T11:03:43Z)
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task of "stylizing" text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z)
- ArtFusion: Controllable Arbitrary Style Transfer using Dual Conditional Latent Diffusion Models [0.0]
Arbitrary Style Transfer (AST) aims to transform images by adopting the style from any selected artwork.
We propose a new approach, ArtFusion, which provides a flexible balance between content and style.
arXiv Detail & Related papers (2023-06-15T17:58:36Z)
- ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
- Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis [78.28620571530706]
Large-scale diffusion models have achieved state-of-the-art results on text-to-image synthesis (T2I) tasks.
We improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.
arXiv Detail & Related papers (2022-12-09T18:30:24Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
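One way to picture a content image-based initialization is to start the reverse process from a forward-noised content latent rather than pure Gaussian noise, as sketched below; the learnable component of the paper's noise is omitted, and the schedule is a placeholder.

```python
# Sketch: initialize reverse diffusion from a noised content latent so
# the content's structure is easier to preserve. Learnable part omitted.
import torch

def noised_content_init(content_latent, alphas_cumprod, t_start):
    """Standard forward-noising q(x_t | x_0) applied to the content latent."""
    a = alphas_cumprod[t_start]
    noise = torch.randn_like(content_latent)
    return a.sqrt() * content_latent + (1 - a).sqrt() * noise

alphas = torch.linspace(0.9999, 0.98, 50).cumprod(dim=0)  # toy schedule
x_t = noised_content_init(torch.randn(1, 4, 64, 64), alphas, t_start=35)
```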
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
- Arbitrary Style Transfer with Structure Enhancement by Combining the Global and Local Loss [51.309905690367835]
We introduce a novel arbitrary style transfer method with structure enhancement by combining the global and local loss.
Experimental results demonstrate that our method can generate higher-quality images with impressive visual effects.
arXiv Detail & Related papers (2022-07-23T07:02:57Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
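A minimal sketch of contrastive style representation learning: style codes from two views of the same artwork are treated as positives and pushed apart from other artworks with an InfoNCE-style loss. The linear projector here is a stand-in for the paper's multi-layer style projector.

```python
# Minimal InfoNCE-style contrastive loss over style codes: two views of
# the same artwork are positives; all other artworks are negatives.
import torch
import torch.nn.functional as F

def style_contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (B, D) style codes of two views of the same B artworks."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature          # (B, B) similarity matrix
    targets = torch.arange(z1.shape[0])       # positives on the diagonal
    return F.cross_entropy(logits, targets)

projector = torch.nn.Linear(512, 128)         # stand-in style projector
f1, f2 = torch.randn(8, 512), torch.randn(8, 512)
print(float(style_contrastive_loss(projector(f1), projector(f2))))
```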
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
- Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer [103.54337984566877]
Recent studies on StyleGAN show high performance on artistic portrait generation by transfer learning with limited data.
We introduce a novel DualStyleGAN with flexible control of dual styles of the original face domain and the extended artistic portrait domain.
Experiments demonstrate the superiority of DualStyleGAN over state-of-the-art methods in high-quality portrait style transfer and flexible style control.
arXiv Detail & Related papers (2022-03-24T17:57:11Z)
- Anisotropic Stroke Control for Multiple Artists Style Transfer [36.92721585146738]
A Stroke Control Multi-Artist Style Transfer framework is developed.
The Anisotropic Stroke Module (ASM) endows the network with adaptive semantic consistency across various styles.
In contrast to a single-scale conditional discriminator, our discriminator is able to capture multi-scale texture clues.
arXiv Detail & Related papers (2020-10-16T05:32:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.