Related papers: StyleMaster: Stylize Your Video with Artistic Generation and Translation

URL: http://arxiv.org/abs/2412.07744v1
Date: Tue, 10 Dec 2024 18:44:08 GMT
Title: StyleMaster: Stylize Your Video with Artistic Generation and Translation
Authors: Zixuan Ye, Huijuan Huang, Xintao Wang, Pengfei Wan, Di Zhang, Wenhan Luo
Abstract summary: Style control has been popular in video generation models. Current methods often generate videos far from the given style, cause content leakage, and struggle to transfer one video to the desired style. Our approach, StyleMaster, achieves significant improvement in both style resemblance and temporal coherence.
Score: 43.808656030545556
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Style control has been popular in video generation models. Existing methods often generate videos far from the given style, cause content leakage, and struggle to transfer one video to the desired style. Our first observation is that the style extraction stage matters, whereas existing methods emphasize global style but ignore local textures. In order to bring texture features while preventing content leakage, we filter content-related patches while retaining style ones based on prompt-patch similarity; for global style extraction, we generate a paired style dataset through model illusion to facilitate contrastive learning, which greatly enhances the absolute style consistency. Moreover, to fill in the image-to-video gap, we train a lightweight motion adapter on still videos, which implicitly enhances stylization extent, and enables our image-trained model to be seamlessly applied to videos. Benefited from these efforts, our approach, StyleMaster, not only achieves significant improvement in both style resemblance and temporal coherence, but also can easily generalize to video style transfer with a gray tile ControlNet. Extensive experiments and visualizations demonstrate that StyleMaster significantly outperforms competitors, effectively generating high-quality stylized videos that align with textual content and closely resemble the style of reference images. Our project page is at https://zixuan-ye.github.io/stylemaster
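The patch-filtering idea in the abstract (drop content-related patches, keep style ones, based on prompt-patch similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine similarity measure, the `drop_ratio` parameter, the function names, and the toy 2-D embeddings are all assumptions made for illustration. A real system would presumably use embeddings from a vision-language encoder.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def select_style_patches(
    patches: Sequence[Sequence[float]],
    prompt: Sequence[float],
    drop_ratio: float = 0.5,  # hypothetical knob, not taken from the paper
) -> List[int]:
    """Return indices of patches kept as style patches.

    Patches most similar to the prompt embedding are assumed to carry
    content and are dropped; the remaining patches are kept, in their
    original order.
    """
    sims = [cosine(p, prompt) for p in patches]
    n_drop = int(len(patches) * drop_ratio)
    # Indices sorted by similarity, highest first; the first n_drop are dropped.
    order = sorted(range(len(patches)), key=lambda i: sims[i], reverse=True)
    dropped = set(order[:n_drop])
    return [i for i in range(len(patches)) if i not in dropped]


# Toy data: patches 0 and 2 point roughly along the prompt direction.
prompt = [1.0, 0.0]
patches = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [-0.1, 1.0]]
kept = select_style_patches(patches, prompt, drop_ratio=0.5)
print(kept)
```

With the toy data above, the two patches aligned with the prompt are dropped and the other two are kept. Ranking by similarity and dropping a fixed fraction is only one possible selection rule; a similarity threshold would be an equally plausible choice.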

Related papers

TeleStyle: Content-Preserving Style Transfer in Images and Videos [52.76027947278353]
We present TeleStyle, a lightweight model for both image and video stylization. We curated a high-quality dataset of distinct specific styles and synthesized triplets using thousands of diverse, in-the-wild style categories. TeleStyle achieves state-of-the-art performance across three core evaluation metrics: style similarity, content consistency, and aesthetic quality.
arXiv Detail & Related papers (2026-01-28T02:16:03Z)
DreamStyle: A Unified Framework for Video Stylization [18.820518165759403]
We introduce DreamStyle, a unified framework for video stylization. It supports (1) text-guided, (2) style-image-guided, and (3) first-frame-guided video stylization. Both qualitative and quantitative evaluations demonstrate that DreamStyle is competent in all three video stylization tasks.
arXiv Detail & Related papers (2026-01-06T07:42:12Z)
PickStyle: Video-to-Video Style Transfer with Context-Style Adapters [1.9039773121452204]
PickStyle is a video-to-video style transfer framework that augments pretrained video diffusion backbones with style adapters. To bridge the gap between static image supervision and dynamic video, we construct synthetic training clips from paired images. CS-CFG ensures that context is preserved in generated video while the style is effectively transferred.
arXiv Detail & Related papers (2025-10-08T21:02:55Z)
FreeViS: Training-free Video Stylization with Inconsistent References [57.411689597435334]
FreeViS is a training-free video stylization framework that generates stylized videos with rich style details and strong temporal coherence. Our method integrates multiple stylized references to a pretrained image-to-video (I2V) model, effectively mitigating the propagation errors observed in prior works.
arXiv Detail & Related papers (2025-10-02T05:27:06Z)
Break Stylistic Sophon: Are We Really Meant to Confine the Imagination in Style Transfer? [12.2238770989173]
StyleWallfacer is a groundbreaking unified training and inference framework. It addresses various issues encountered in the style transfer process of traditional methods. It delivers artist-level style transfer and text-driven stylization.
arXiv Detail & Related papers (2025-06-18T00:24:29Z)
SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models [54.641809532055916]
We introduce SOYO, a novel diffusion-based framework for video style morphing. Our method employs a pre-trained text-to-image diffusion model without fine-tuning, combining attention injection and AdaIN to preserve structural consistency. To harmonize across video frames, we propose a novel adaptive sampling scheduler between two style images.
arXiv Detail & Related papers (2025-03-10T07:27:01Z)
Artist: Aesthetically Controllable Text-Driven Stylization without Training [19.5597806965592]
We introduce Artist, a training-free approach that aesthetically controls the content and style generation of a pretrained diffusion model for text-driven stylization. Our key insight is to disentangle the denoising of content and style into separate diffusion processes while sharing information between them. Our method excels at achieving aesthetic-level stylization requirements, preserving intricate details in the content image and aligning well with the style prompt.
arXiv Detail & Related papers (2024-07-22T17:58:05Z)
StyleShot: A Snapshot on Any Style [20.41380860802149]
We show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this through constructing a style-aware encoder and a well-organized style dataset called StyleGallery. We highlight that our approach, named StyleShot, is simple yet effective in mimicking various desired styles, without test-time tuning.
arXiv Detail & Related papers (2024-07-01T16:05:18Z)
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation [4.1177497612346]
Style transfer is an inventive process designed to create an image that maintains the essence of the original while embracing the visual style of another. We introduce InstantStyle-Plus, an approach that prioritizes the integrity of the original content while seamlessly integrating the target style.
arXiv Detail & Related papers (2024-06-30T18:05:33Z)
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter [78.75422651890776]
StyleCrafter is a generic method that enhances pre-trained T2V models with a style control adapter. To promote content-style disentanglement, we remove style descriptions from the text prompt and extract style information solely from the reference image. StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images.
arXiv Detail & Related papers (2023-12-01T03:53:21Z)
StyleAdapter: A Unified Stylized Image Generation Model [97.24936247688824]
StyleAdapter is a unified stylized image generation model capable of producing a variety of stylized images. It can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)
Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference. We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z)
Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer [13.098901971644656]
This paper proposes a zero-shot video stylization method named Style-A-Video. It uses a generative pre-trained transformer with an image latent diffusion model to achieve concise text-controlled video stylization. Tests show that we can attain superior content preservation and stylistic performance while incurring less consumption than previous solutions.
arXiv Detail & Related papers (2023-05-09T14:03:27Z)
Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning. Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.