HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced
Diffusion Models
- URL: http://arxiv.org/abs/2401.05870v1
- Date: Thu, 11 Jan 2024 12:26:23 GMT
- Title: HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced
Diffusion Models
- Authors: Hanzhang Wang, Haoran Wang, Jinze Yang, Zhongrui Yu, Zeke Xie, Lei
Tian, Xinyan Xiao, Junjun Jiang, Xianming Liu, Mingming Sun
- Abstract summary: The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image/video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic cues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
- Score: 84.12784265734238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of Arbitrary Style Transfer (AST) is to inject the artistic
features of a style reference into a given image/video. Existing methods usually
focus on pursuing the balance between style and content, while ignoring the
significant demand for flexible and customized stylization results, thereby
limiting their practical application. To address this critical issue, a novel
AST approach named HiCAST is proposed, which is capable of explicitly
customizing the stylization results according to various sources of semantic
cues. Specifically, our model is built on the Latent Diffusion Model (LDM) and
elaborately designed to absorb content and style instances as conditions of the
LDM. It is characterized by the introduction of the Style Adapter, which allows
users to flexibly manipulate the output by aligning multi-level style
information with the intrinsic knowledge in the LDM. Lastly, we further extend
our model to perform video AST. A novel learning objective is leveraged for
video diffusion model training, which significantly improves cross-frame
temporal consistency while maintaining stylization strength. Qualitative and
quantitative comparisons as well as comprehensive user studies demonstrate that
HiCAST outperforms existing SoTA methods in generating visually plausible
stylization results.
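The abstract only outlines the architecture at a high level. As a rough illustration, the sketch below shows, in PyTorch, one plausible way an adapter could inject multi-level style features into an LDM UNet, together with a simple cross-frame consistency term for the video setting. The module names, feature dimensions, residual-injection scheme, and the L1 form of the temporal loss are assumptions made for illustration, not details taken from the paper.

```python
# A minimal, illustrative sketch (not the authors' implementation) of the two
# ideas described in the abstract: (1) an adapter that injects multi-level
# style features into a latent diffusion UNet, and (2) a cross-frame temporal
# consistency loss for video stylization. All names, shapes, and the per-level
# scale parameters are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StyleAdapter(nn.Module):
    """Maps a style embedding into per-level residual features for the UNet."""

    def __init__(self, style_dim=768, level_dims=(320, 640, 1280)):
        super().__init__()
        # One lightweight projection per UNet resolution level (assumed).
        self.projs = nn.ModuleList(nn.Linear(style_dim, d) for d in level_dims)
        # Learnable per-level scales let the user dial stylization strength.
        self.scales = nn.Parameter(torch.ones(len(level_dims)))

    def forward(self, style_embedding):
        # style_embedding: (B, style_dim), e.g. from a frozen image encoder.
        return [s * p(style_embedding) for s, p in zip(self.scales, self.projs)]


def inject(unet_features, adapter_outputs):
    """Add adapter residuals to the matching UNet feature maps."""
    fused = []
    for feat, res in zip(unet_features, adapter_outputs):
        # Broadcast the (B, C) residual over the spatial dimensions.
        fused.append(feat + res[:, :, None, None])
    return fused


def temporal_consistency_loss(stylized_latents):
    """Penalize differences between stylized latents of adjacent frames.

    A simple stand-in for the (unspecified) video training objective: adjacent
    frames are encouraged to stay close, while stylization strength is
    supervised by the usual diffusion denoising loss elsewhere.
    """
    # stylized_latents: (B, T, C, H, W) latents of consecutive frames.
    return F.l1_loss(stylized_latents[:, 1:], stylized_latents[:, :-1])
```

In this sketch, exposing the per-level scales is one natural way to realize the flexible manipulation the abstract describes, since each scale controls how strongly the style features of a given resolution level are mixed into the diffusion backbone.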
Related papers
- StyleRWKV: High-Quality and High-Efficiency Style Transfer with RWKV-like Architecture [29.178246094092202]
Style transfer aims to generate a new image preserving the content but with the artistic representation of the style source.
Most existing methods are based on Transformers or diffusion models; however, they suffer from quadratic computational complexity and high inference time.
We present a novel framework, StyleRWKV, that achieves high-quality style transfer with limited memory usage and linear time complexity.
arXiv Detail & Related papers (2024-12-27T09:01:15Z) - UniVST: A Unified Framework for Training-free Localized Video Style Transfer [102.52552893495475]
This paper presents UniVST, a unified framework for localized video style transfer based on diffusion model.
It operates without the need for training, offering a distinct advantage over existing diffusion methods that transfer style across entire videos.
arXiv Detail & Related papers (2024-10-26T05:28:02Z) - ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z) - DiffArtist: Towards Aesthetic-Aligned Diffusion Model Control for Training-free Text-Driven Stylization [19.5597806965592]
Diffusion models entangle content and style generation during the denoising process.
DiffArtist is the first approach that enables aesthetic-aligned control of content and style during the entire diffusion process.
arXiv Detail & Related papers (2024-07-22T17:58:05Z) - ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z) - WSAM: Visual Explanations from Style Augmentation as Adversarial
Attacker and Their Influence in Image Classification [2.282270386262498]
This paper outlines a style augmentation algorithm that uses additive noise-based sampling to improve the randomization of a general linear transformation for style transfer.
The models not only show strong robustness to image stylization but also outperform all previous methods, surpassing state-of-the-art performance on the STL-10 dataset.
arXiv Detail & Related papers (2023-08-29T02:50:36Z) - ArtFusion: Controllable Arbitrary Style Transfer using Dual Conditional
Latent Diffusion Models [0.0]
Arbitrary Style Transfer (AST) aims to transform images by adopting the style from any selected artwork.
We propose a new approach, ArtFusion, which provides a flexible balance between content and style.
arXiv Detail & Related papers (2023-06-15T17:58:36Z) - MODIFY: Model-driven Face Stylization without Style Images [77.24793103549158]
Existing face stylization methods always require the presence of the target (style) domain during the translation process.
We propose a new method called MODel-drIven Face stYlization (MODIFY), which relies on a generative model to bypass the dependence on target images.
Experimental results on several different datasets validate the effectiveness of MODIFY for unsupervised face stylization.
arXiv Detail & Related papers (2023-03-17T08:35:17Z) - A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive
Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)