Style Composition within Distinct LoRA modules for Traditional Art
- URL: http://arxiv.org/abs/2507.11986v1
- Date: Wed, 16 Jul 2025 07:36:07 GMT
- Title: Style Composition within Distinct LoRA modules for Traditional Art
- Authors: Jaehyun Lee, Wonhark Park, Wonsik Shin, Hyunho Lee, Hyoung Min Na, Nojun Kwak
- Abstract summary: We propose a zero-shot diffusion pipeline that naturally blends multiple styles. We leverage the fact that lower-noise latents carry stronger stylistic information. We incorporate depth-map conditioning via ControlNet into the diffusion framework.
- Score: 21.954368353156546
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based text-to-image models have achieved remarkable results in synthesizing diverse images from text prompts and can capture specific artistic styles via style personalization. However, their entangled latent space and lack of smooth interpolation make it difficult to apply distinct painting techniques in a controlled, regional manner, often causing one style to dominate. To overcome this, we propose a zero-shot diffusion pipeline that naturally blends multiple styles by performing style composition on the denoised latents predicted during the flow-matching denoising process of separately trained, style-specialized models. We leverage the fact that lower-noise latents carry stronger stylistic information and fuse them across heterogeneous diffusion pipelines using spatial masks, enabling precise, region-specific style control. This mechanism preserves the fidelity of each individual style while allowing user-guided mixing. Furthermore, to ensure structural coherence across different models, we incorporate depth-map conditioning via ControlNet into the diffusion framework. Qualitative and quantitative experiments demonstrate that our method successfully achieves region-specific style mixing according to the given masks.
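The mechanism described in the abstract, predicting denoised latents with several separately trained, style-specialized models and fusing them under spatial masks at each flow-matching denoising step, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical Python/PyTorch illustration only: the denoiser call signature, the scheduler interface (`timesteps`, `step`, `prev_sample`), and the way conditioning (text embeddings, depth map for ControlNet) is passed are assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of region-masked style composition on denoised latents.
# Interfaces and names are assumptions for illustration, not the paper's code.
import torch

@torch.no_grad()
def composed_denoise(latents, masks, style_models, scheduler, cond):
    """Fuse the denoised latents of several style-specialized models
    with spatial masks at every denoising step.

    latents:      (B, C, H, W) shared noisy starting latents
    masks:        list of (1, 1, H, W) region masks, one per style, summing to 1
    style_models: list of style-specialized denoisers (e.g. LoRA-adapted)
    scheduler:    flow-matching-style scheduler exposing `timesteps` and `step`
    cond:         dict of shared conditioning (text embeddings, depth map, ...)
    """
    for t in scheduler.timesteps:
        # 1) Each style-specialized model predicts its own denoised latent.
        preds = [model(latents, t, **cond) for model in style_models]

        # 2) Fuse the predictions region-by-region; fusing the (lower-noise)
        #    denoised predictions keeps each region's style intact.
        fused = sum(mask * pred for mask, pred in zip(masks, preds))

        # 3) Advance the single shared latent trajectory with the fused prediction.
        latents = scheduler.step(fused, t, latents).prev_sample
    return latents
```

In practice the masks would be resized to the latent resolution, and the same depth map would be supplied (via ControlNet) to every style-specialized model through `cond`, which is how the structural coherence across heterogeneous pipelines described in the abstract would be maintained in this sketch.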
Related papers
- HarmonPaint: Harmonized Training-Free Diffusion Inpainting [58.870763247178495]
HarmonPaint is a training-free inpainting framework that seamlessly integrates with the attention mechanisms of diffusion models. By leveraging masking strategies within self-attention, HarmonPaint ensures structural fidelity without model retraining or fine-tuning.
arXiv Detail & Related papers (2025-07-22T16:14:35Z) - Be Decisive: Noise-Induced Layouts for Multi-Subject Generation [56.80513553424086]
Complex prompts lead to subject leakage, causing inaccuracies in quantities, attributes, and visual features. We introduce a new approach that predicts a spatial layout aligned with the prompt, derived from the initial noise, and refines it throughout the denoising process. Our method employs a small neural network to predict and refine the evolving noise-induced layout at each denoising step.
arXiv Detail & Related papers (2025-05-27T17:54:24Z) - Unsupervised Region-Based Image Editing of Denoising Diffusion Models [50.005612464340246]
We propose a method to identify semantic attributes in the latent space of pre-trained diffusion models without any further training. Our approach facilitates precise semantic discovery and control over local masked areas, eliminating the need for annotations.
arXiv Detail & Related papers (2024-12-17T13:46:12Z) - Z-STAR+: A Zero-shot Style Transfer Method via Adjusting Style Distribution [24.88532732093652]
Style transfer presents a significant challenge, primarily centered on identifying an appropriate style representation. In contrast to existing approaches, we have discovered that latent features in vanilla diffusion models inherently contain natural style and content distributions. Our method adopts dual denoising paths to represent content and style references in latent space, subsequently guiding the content image denoising process with style latent codes.
arXiv Detail & Related papers (2024-11-28T15:56:17Z) - DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer [13.588643982359413]
Style transfer aims to fuse the artistic representation of a style image with the structural information of a content image.
Existing methods train specific networks or utilize pre-trained models to learn content and style features.
We propose a novel and training-free approach for style transfer, combining textual embedding with spatial features.
arXiv Detail & Related papers (2024-10-19T06:42:43Z) - ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z) - DiffStyler: Diffusion-based Localized Image Style Transfer [0.0]
Image style transfer aims to imbue digital imagery with the distinctive attributes of style targets, such as colors, brushstrokes, and shapes.
Despite the advancements in arbitrary style transfer methods, a prevalent challenge remains the delicate equilibrium between content semantics and style attributes.
This paper introduces DiffStyler, a novel approach that facilitates efficient and precise arbitrary image style transfer.
arXiv Detail & Related papers (2024-03-27T11:19:34Z) - One-Shot Structure-Aware Stylized Image Synthesis [7.418475280387784]
OSASIS is a novel one-shot stylization method that is robust in structure preservation.
We show that OSASIS is able to effectively disentangle the semantics from the structure of an image, allowing it to control the level of content and style applied to a given input.
Results show that OSASIS outperforms other stylization methods, especially for input images that were rarely encountered during training.
arXiv Detail & Related papers (2024-02-27T07:42:55Z) - HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image/video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic clues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
arXiv Detail & Related papers (2024-01-11T12:26:23Z) - MODIFY: Model-driven Face Stylization without Style Images [77.24793103549158]
Existing face stylization methods always require the presence of the target (style) domain during the translation process.
We propose a new method called MODel-drIven Face stYlization (MODIFY), which relies on the generative model to bypass the dependence on target images.
Experimental results on several different datasets validate the effectiveness of MODIFY for unsupervised face stylization.
arXiv Detail & Related papers (2023-03-17T08:35:17Z) - Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods [27.014858633903867]
We present a training framework for feature disentanglement of Diffusion Models (FDiff). We propose two sampling methods that can boost the realism of our Diffusion Models and also enhance the controllability.
arXiv Detail & Related papers (2023-02-28T07:43:00Z) - DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z) - Anisotropic Stroke Control for Multiple Artists Style Transfer [36.92721585146738]
A Stroke Control Multi-Artist Style Transfer framework is developed.
The Anisotropic Stroke Module (ASM) endows the network with adaptive semantic consistency across various styles.
In contrast to the single-scale conditional discriminator, our discriminator is able to capture multi-scale texture clues.
arXiv Detail & Related papers (2020-10-16T05:32:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.