HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced
Diffusion Models
- URL: http://arxiv.org/abs/2401.05870v1
- Date: Thu, 11 Jan 2024 12:26:23 GMT
- Title: HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced
Diffusion Models
- Authors: Hanzhang Wang, Haoran Wang, Jinze Yang, Zhongrui Yu, Zeke Xie, Lei
Tian, Xinyan Xiao, Junjun Jiang, Xianming Liu, Mingming Sun
- Abstract summary: The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image/video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic clues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
- Score: 84.12784265734238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of Arbitrary Style Transfer (AST) is to inject the artistic features
of a style reference into a given image/video. Existing methods usually focus
on pursuing the balance between style and content, while ignoring the
significant demand for flexible and customized stylization results, thereby
limiting their practical application. To address this critical issue, we propose
a novel AST approach, namely HiCAST, which is capable of explicitly
customizing the stylization results according to various sources of semantic
clues. Specifically, our model is built upon the Latent Diffusion
Model (LDM) and elaborately designed to absorb content and style instances as
conditions of the LDM. It is characterized by the introduction of a \textit{Style
Adapter}, which allows users to flexibly manipulate the output by
aligning multi-level style information with intrinsic knowledge in the LDM. Lastly,
we further extend our model to perform video AST. A novel learning objective is
leveraged for video diffusion model training, which significantly improves
cross-frame temporal consistency while maintaining stylization
strength. Qualitative and quantitative comparisons as well as comprehensive
user studies demonstrate that our HiCAST outperforms existing SoTA methods
in generating visually plausible stylization results.
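The abstract describes the architecture only at a high level, but the adapter-plus-objective design suggests a rough shape. Below is a minimal, hypothetical PyTorch sketch of (a) a multi-level style adapter that adds projected style features into UNet feature maps with user-controlled per-level scales, and (b) a simple cross-frame consistency penalty of the kind the video extension alludes to. All names (StyleAdapter, level_dims, temporal_consistency_loss) and the specific design are assumptions for illustration, not HiCAST's actual implementation.

```python
# Hypothetical sketch of adapter-based style conditioning for an LDM UNet.
# Module names, dimensions, and the loss below are illustrative assumptions,
# not the official HiCAST implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StyleAdapter(nn.Module):
    """Maps a global style embedding to multi-level residual features.

    Each level's residual is added to the corresponding UNet feature map,
    and a per-level scale lets the user trade stylization strength against
    content preservation at inference time.
    """

    def __init__(self, style_dim=768, level_dims=(320, 640, 1280)):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(style_dim, d) for d in level_dims])

    def forward(self, style_emb, unet_feats, scales):
        # style_emb: (B, style_dim), e.g. an image-encoder embedding of the style reference
        # unet_feats: list of (B, C_l, H_l, W_l) feature maps from the UNet encoder
        # scales: list of floats, one user-chosen strength per level
        out = []
        for proj, feat, s in zip(self.proj, unet_feats, scales):
            res = proj(style_emb)[:, :, None, None]  # (B, C_l, 1, 1), broadcast over space
            out.append(feat + s * res)
        return out


def temporal_consistency_loss(pred_frames):
    # pred_frames: (B, T, C, H, W) denoised latent predictions for T frames.
    # Penalizes differences between adjacent frames; a generic stand-in for
    # the cross-frame objective the abstract mentions for video AST.
    return F.mse_loss(pred_frames[:, 1:], pred_frames[:, :-1])
```

The per-level scales are what would give users the "highly customized" control the abstract emphasizes: shallower levels tend to affect texture and deeper levels composition, so exposing one strength knob per level allows the stylization to be tuned at inference without retraining.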
Related papers
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models [42.45078883553856]
Stylized Text-to-Image Generation (STIG) aims to generate images based on text prompts and style reference images.
In this paper, we propose a novel framework dubbed StyleMaster for this task by leveraging pretrained Stable Diffusion.
Two objective functions are introduced to optimize the model together with the denoising loss, which can further enhance semantic and style consistency.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer [19.355744690301403]
We introduce a novel artistic style transfer method based on a pre-trained large-scale diffusion model without any optimization.
Our experimental results demonstrate that the proposed method surpasses both conventional and diffusion-based state-of-the-art style transfer baselines.
arXiv Detail & Related papers (2023-12-11T09:53:12Z)
- Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption [73.98706049140098]
We propose a novel phasic content fusing few-shot diffusion model with a directional distribution consistency loss.
Specifically, we design a phasic training strategy with phasic content fusion to help our model learn content and style information when the diffusion timestep t is large.
Finally, we propose a cross-domain structure guidance strategy that enhances structure consistency during domain adaptation.
arXiv Detail & Related papers (2023-09-07T14:14:11Z)
- WSAM: Visual Explanations from Style Augmentation as Adversarial Attacker and Their Influence in Image Classification [2.282270386262498]
This paper outlines a style augmentation algorithm that combines noise-based sampling with improved randomization of a general linear transformation for style transfer.
All models not only show strong robustness to image stylization but also outperform previous methods, surpassing state-of-the-art performance on the STL-10 dataset.
arXiv Detail & Related papers (2023-08-29T02:50:36Z)
- ArtFusion: Controllable Arbitrary Style Transfer using Dual Conditional Latent Diffusion Models [0.0]
Arbitrary Style Transfer (AST) aims to transform images by adopting the style from any selected artwork.
We propose a new approach, ArtFusion, which provides a flexible balance between content and style.
arXiv Detail & Related papers (2023-06-15T17:58:36Z)
- MODIFY: Model-driven Face Stylization without Style Images [77.24793103549158]
Existing face stylization methods always require the presence of the target (style) domain during the translation process.
We propose a new method called MODel-drIven Face stYlization (MODIFY), which relies on the generative model to bypass the dependence on target images.
Experimental results on several different datasets validate the effectiveness of MODIFY for unsupervised face stylization.
arXiv Detail & Related papers (2023-03-17T08:35:17Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- Adversarial Style Augmentation for Domain Generalized Urban-Scene Segmentation [120.96012935286913]
We propose a novel adversarial style augmentation approach, which can generate hard stylized images during training.
Experiments on two synthetic-to-real semantic segmentation benchmarks demonstrate that AdvStyle can significantly improve the model performance on unseen real domains.
arXiv Detail & Related papers (2022-07-11T14:01:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.