PARASOL: Parametric Style Control for Diffusion Image Synthesis
- URL: http://arxiv.org/abs/2303.06464v3
- Date: Thu, 2 May 2024 02:21:18 GMT
- Title: PARASOL: Parametric Style Control for Diffusion Image Synthesis
- Authors: Gemma Canet Tarrés, Dan Ruta, Tu Bui, John Collomosse
- Abstract summary: PARASOL is a multi-modal synthesis model that enables disentangled, parametric control of the visual style of the image.
We leverage auxiliary semantic and style-based search to create training triplets for supervision of the latent diffusion model.
- Score: 18.852986904591358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose PARASOL, a multi-modal synthesis model that enables disentangled, parametric control of the visual style of the image by jointly conditioning synthesis on both content and a fine-grained visual style embedding. We train a latent diffusion model (LDM) using specific losses for each modality and adapt the classifier-free guidance for encouraging disentangled control over independent content and style modalities at inference time. We leverage auxiliary semantic and style-based search to create training triplets for supervision of the LDM, ensuring complementarity of content and style cues. PARASOL shows promise for enabling nuanced control over visual style in diffusion models for image creation and stylization, as well as generative search where text-based search results may be adapted to more closely match user intent by interpolating both content and style descriptors.
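The adapted classifier-free guidance can be viewed as two independent guidance scales, one for the content condition and one for the style embedding. Below is a minimal sketch of such two-modality guidance in PyTorch-style Python; `unet`, `c_content`, `c_style`, `s_content`, and `s_style` are hypothetical names, and the paper's exact weighting scheme may differ.

```python
import torch

def dual_guidance_eps(unet, z_t, t, c_content, c_style, s_content=3.0, s_style=5.0):
    """Two-modality classifier-free guidance sketch (hypothetical names and scales)."""
    # Placeholder "null" conditions; in practice these would be learned null embeddings.
    null_content = torch.zeros_like(c_content)
    null_style = torch.zeros_like(c_style)

    # Three denoiser passes: unconditional, content-only, and content + style.
    eps_uncond = unet(z_t, t, content=null_content, style=null_style)
    eps_content = unet(z_t, t, content=c_content, style=null_style)
    eps_full = unet(z_t, t, content=c_content, style=c_style)

    # Each scale strengthens its own modality, so content fidelity and style
    # strength can be traded off independently at inference time.
    return (eps_uncond
            + s_content * (eps_content - eps_uncond)
            + s_style * (eps_full - eps_content))
```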
Related papers
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control [43.96257216397601]
We propose a new plug-and-play solution for training-free personalization of diffusion models.
RB-Modulation is built on a novel optimal controller where a style descriptor encodes the desired attributes.
A cross-attention-based feature aggregation scheme allows RB-Modulation to decouple content and style from the reference image.
arXiv Detail & Related papers (2024-05-27T17:51:08Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image/video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic clues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
arXiv Detail & Related papers (2024-01-11T12:26:23Z)
- Learning Graph Neural Networks for Image Style Transfer [131.73237185888215]
State-of-the-art parametric and non-parametric style transfer approaches are prone to either distorted local style patterns due to global statistics alignment, or unpleasing artifacts resulting from patch mismatching.
In this paper, we study a novel semi-parametric neural style transfer framework that alleviates the deficiency of both parametric and non-parametric stylization.
arXiv Detail & Related papers (2022-07-24T07:41:31Z)
- More Control for Free! Image Synthesis with Semantic Diffusion Guidance [79.88929906247695]
Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from an example image.
We introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.
We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis.
arXiv Detail & Related papers (2021-12-10T18:55:50Z)
- Diagonal Attention and Style-based GAN for Content-Style Disentanglement in Image Generation and Translation [34.24876359759408]
We present novel hierarchical adaptive Diagonal spatial ATtention layers that manipulate spatial content separately from styles in a hierarchical manner.
Our method enables coarse-to-fine level disentanglement of spatial contents and styles.
Our generator can be easily integrated into the GAN inversion framework so that the content and style of translated images can be flexibly controlled.
arXiv Detail & Related papers (2021-03-30T08:00:13Z)
- StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval [119.03470556503942]
The cross-modal matching problem is typically solved by learning a joint embedding space in which the semantic content shared between the photo and sketch modalities is preserved.
An effective model needs to explicitly account for this style diversity and, crucially, generalize to unseen user styles.
Our model can not only disentangle the cross-modal shared semantic content, but can adapt the disentanglement to any unseen user style as well, making the model truly agnostic.
arXiv Detail & Related papers (2021-03-29T15:44:19Z)