Related papers: RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

URL: http://arxiv.org/abs/2405.17401v1
Date: Mon, 27 May 2024 17:51:08 GMT
Title: RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control
Authors: Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu,
Abstract summary: We propose a new plug-and-play solution for training-free personalization of diffusion models. RB-Modulation is built on a novel optimal controller where a style descriptor encodes the desired attributes. Cross-attention-based feature aggregation scheme allows RB-Modulation to decouple content and style from the reference image.
Score: 43.96257216397601
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our framework demonstrates precise extraction and control of content and style in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from the dependency on external adapters or ControlNets.

Related papers

Only-Style: Stylistic Consistency in Image Generation without Content Leakage [21.68241134664501]
Only-Style is a method designed to mitigate content leakage in a semantically coherent manner while preserving stylistic consistency.<n>Only-Style works by localizing content leakage during inference, allowing the adaptive tuning of a parameter that controls the style alignment process.<n>Our approach demonstrates a significant improvement over state-of-the-art methods through extensive evaluation across diverse instances.
arXiv Detail & Related papers (2025-06-11T16:33:09Z)
AttenST: A Training-Free Attention-Driven Style Transfer Framework with Pre-Trained Diffusion Models [4.364797586362505]
AttenST is a training-free attention-driven style transfer framework. We propose a style-guided self-attention mechanism that conditions self-attention on the reference style. We also introduce a dual-feature cross-attention mechanism to fuse content and style features.
arXiv Detail & Related papers (2025-03-10T13:28:36Z)
Z-STAR+: A Zero-shot Style Transfer Method via Adjusting Style Distribution [24.88532732093652]
Style transfer presents a significant challenge, primarily centered on identifying an appropriate style representation. In contrast to existing approaches, we have discovered that latent features in vanilla diffusion models inherently contain natural style and content distributions. Our method adopts dual denoising paths to represent content and style references in latent space, subsequently guiding the content image denoising process with style latent codes.
arXiv Detail & Related papers (2024-11-28T15:56:17Z)
ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps. We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
Artist: Aesthetically Controllable Text-Driven Stylization without Training [19.5597806965592]
We introduce textbfArtist, a training-free approach that aesthetically controls the content and style generation of a pretrained diffusion model for text-driven stylization. Our key insight is to disentangle the denoising of content and style into separate diffusion processes while sharing information between them. Our method excels at achieving aesthetic-level stylization requirements, preserving intricate details in the content image and aligning well with the style prompt.
arXiv Detail & Related papers (2024-07-22T17:58:05Z)
ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images. We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is injecting the artistic features of a style reference into a given image/video. We propose HiCAST, which is capable of explicitly customizing the stylization results according to various source of semantic clues. A novel learning objective is leveraged for video diffusion model training, which significantly improve cross-frame temporal consistency.
arXiv Detail & Related papers (2024-01-11T12:26:23Z)
PARASOL: Parametric Style Control for Diffusion Image Synthesis [18.852986904591358]
PARASOL is a multi-modal synthesis model that enables disentangled, parametric control of the visual style of the image. We leverage auxiliary semantic and style-based search to create training triplets for supervision of the latent diffusion model.
arXiv Detail & Related papers (2023-03-11T17:30:36Z)
Towards Controllable and Photorealistic Region-wise Image Manipulation [11.601157452472714]
We present a generative model with auto-encoder architecture for per-region style manipulation. We apply a code consistency loss to enforce an explicit disentanglement between content and style latent representations. The model is constrained by a content alignment loss to ensure the foreground editing will not interfere background contents.
arXiv Detail & Related papers (2021-08-19T13:29:45Z)
StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval [119.03470556503942]
Crossmodal matching problem is typically solved by learning a joint embedding space where semantic content shared between photo and sketch modalities are preserved. An effective model needs to explicitly account for this style diversity, crucially, to unseen user styles. Our model can not only disentangle the cross-modal shared semantic content, but can adapt the disentanglement to any unseen user style as well, making the model truly agnostic.
arXiv Detail & Related papers (2021-03-29T15:44:19Z)
Arbitrary Style Transfer via Multi-Adaptation Network [109.6765099732799]
A desired style transfer, given a content image and referenced style painting, would render the content image with the color tone and vivid stroke patterns of the style painting. A new disentanglement loss function enables our network to extract main style patterns and exact content structures to adapt to various input images.
arXiv Detail & Related papers (2020-05-27T08:00:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.