HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced
Diffusion Models
- URL: http://arxiv.org/abs/2401.05870v1
- Date: Thu, 11 Jan 2024 12:26:23 GMT
- Title: HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced
Diffusion Models
- Authors: Hanzhang Wang, Haoran Wang, Jinze Yang, Zhongrui Yu, Zeke Xie, Lei
Tian, Xinyan Xiao, Junjun Jiang, Xianming Liu, Mingming Sun
- Abstract summary: The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image/video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic cues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
- Score: 84.12784265734238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of Arbitrary Style Transfer (AST) is to inject the artistic
features of a style reference into a given image/video. Existing methods usually
focus on pursuing the balance between style and content, while ignoring the
significant demand for flexible and customized stylization results, thereby
limiting their practical application. To address this critical issue, a novel
AST approach named HiCAST is proposed, which is capable of explicitly
customizing the stylization results according to various sources of semantic
cues. Specifically, our model is built on the Latent Diffusion Model (LDM) and
elaborately designed to absorb content and style instances as conditions of the
LDM. It is characterized by the introduction of the Style Adapter, which allows
users to flexibly manipulate the output by aligning multi-level style
information with the intrinsic knowledge in the LDM. Lastly, we further extend
our model to perform video AST. A novel learning objective is leveraged for
video diffusion model training, which significantly improves cross-frame
temporal consistency while maintaining stylization strength. Qualitative and
quantitative comparisons as well as comprehensive user studies demonstrate that
HiCAST outperforms existing SoTA methods in generating visually plausible
stylization results.
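The abstract only outlines the architecture at a high level. As a rough illustration, the sketch below shows, in PyTorch, one plausible way an adapter could inject multi-level style features into an LDM UNet, together with a simple cross-frame consistency term for the video setting. The module names, feature dimensions, residual-injection scheme, and the L1 form of the temporal loss are assumptions made for illustration, not details taken from the paper.

```python
# A minimal, illustrative sketch (not the authors' implementation) of the two
# ideas described in the abstract: (1) an adapter that injects multi-level
# style features into a latent diffusion UNet, and (2) a cross-frame temporal
# consistency loss for video stylization. All names, shapes, and the per-level
# scale parameters are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StyleAdapter(nn.Module):
    """Maps a style embedding into per-level residual features for the UNet."""

    def __init__(self, style_dim=768, level_dims=(320, 640, 1280)):
        super().__init__()
        # One lightweight projection per UNet resolution level (assumed).
        self.projs = nn.ModuleList(nn.Linear(style_dim, d) for d in level_dims)
        # Learnable per-level scales let the user dial stylization strength.
        self.scales = nn.Parameter(torch.ones(len(level_dims)))

    def forward(self, style_embedding):
        # style_embedding: (B, style_dim), e.g. from a frozen image encoder.
        return [s * p(style_embedding) for s, p in zip(self.scales, self.projs)]


def inject(unet_features, adapter_outputs):
    """Add adapter residuals to the matching UNet feature maps."""
    fused = []
    for feat, res in zip(unet_features, adapter_outputs):
        # Broadcast the (B, C) residual over the spatial dimensions.
        fused.append(feat + res[:, :, None, None])
    return fused


def temporal_consistency_loss(stylized_latents):
    """Penalize differences between stylized latents of adjacent frames.

    A simple stand-in for the (unspecified) video training objective: adjacent
    frames are encouraged to stay close, while stylization strength is
    supervised by the usual diffusion denoising loss elsewhere.
    """
    # stylized_latents: (B, T, C, H, W) latents of consecutive frames.
    return F.l1_loss(stylized_latents[:, 1:], stylized_latents[:, :-1])
```

In this sketch, exposing the per-level scales is one natural way to realize the flexible manipulation the abstract describes, since each scale controls how strongly the style features of a given resolution level are mixed into the diffusion backbone.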
Related papers
- StyleRWKV: High-Quality and High-Efficiency Style Transfer with RWKV-like Architecture [29.178246094092202]
Style transfer aims to generate a new image preserving the content but with the artistic representation of the style source.
Most existing methods are based on Transformers or diffusion models; however, they suffer from quadratic computational complexity and high inference time.
We present a novel framework, StyleRWKV, that achieves high-quality style transfer with limited memory usage and linear time complexity.
arXiv Detail & Related papers (2024-12-27T09:01:15Z) - UniVST: A Unified Framework for Training-free Localized Video Style Transfer [102.52552893495475]
This paper presents UniVST, a unified framework for localized video style transfer based on diffusion model.
It operates without the need for training, offering a distinct advantage over existing diffusion methods that transfer style across entire videos.
arXiv Detail & Related papers (2024-10-26T05:28:02Z) - ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z) - DiffArtist: Towards Aesthetic-Aligned Diffusion Model Control for Training-free Text-Driven Stylization [19.5597806965592]
Diffusion models entangle content and style generation during the denoising process.
DiffArtist is the first approach that enables aesthetic-aligned control of content and style during the entire diffusion process.
arXiv Detail & Related papers (2024-07-22T17:58:05Z) - ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z) - WSAM: Visual Explanations from Style Augmentation as Adversarial
Attacker and Their Influence in Image Classification [2.282270386262498]
This paper outlines a style augmentation algorithm that uses additive noise-based sampling to improve the randomization of a general linear transformation for style transfer.
The models not only show strong robustness to image stylization but also outperform all previous methods, surpassing state-of-the-art performance on the STL-10 dataset.
arXiv Detail & Related papers (2023-08-29T02:50:36Z) - ArtFusion: Controllable Arbitrary Style Transfer using Dual Conditional
Latent Diffusion Models [0.0]
Arbitrary Style Transfer (AST) aims to transform images by adopting the style from any selected artwork.
We propose a new approach, ArtFusion, which provides a flexible balance between content and style.
arXiv Detail & Related papers (2023-06-15T17:58:36Z) - MODIFY: Model-driven Face Stylization without Style Images [77.24793103549158]
Existing face stylization methods always require the presence of the target (style) domain during the translation process.
We propose a new method called MODel-drIven Face stYlization (MODIFY), which relies on a generative model to bypass the dependence on target images.
Experimental results on several different datasets validate the effectiveness of MODIFY for unsupervised face stylization.
arXiv Detail & Related papers (2023-03-17T08:35:17Z) - A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive
Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)