FreeViS: Training-free Video Stylization with Inconsistent References
- URL: http://arxiv.org/abs/2510.01686v1
- Date: Thu, 02 Oct 2025 05:27:06 GMT
- Title: FreeViS: Training-free Video Stylization with Inconsistent References
- Authors: Jiacong Xu, Yiqun Mei, Ke Zhang, Vishal M. Patel
- Abstract summary: FreeViS is a training-free video stylization framework that generates stylized videos with rich style details and strong temporal coherence. Our method integrates multiple stylized references into a pretrained image-to-video (I2V) model, effectively mitigating the propagation errors observed in prior works.
- Score: 57.411689597435334
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Video stylization plays a key role in content creation, but it remains a challenging problem. Naïvely applying image stylization frame-by-frame hurts temporal consistency and reduces style richness. Alternatively, training a dedicated video stylization model typically requires paired video data and is computationally expensive. In this paper, we propose FreeViS, a training-free video stylization framework that generates stylized videos with rich style details and strong temporal coherence. Our method integrates multiple stylized references into a pretrained image-to-video (I2V) model, effectively mitigating the propagation errors observed in prior works, without introducing flickers and stutters. In addition, it leverages high-frequency compensation to constrain the content layout and motion, together with flow-based motion cues to preserve style textures in low-saliency regions. Through extensive evaluations, FreeViS delivers higher stylization fidelity and superior temporal consistency, outperforming recent baselines and achieving strong human preference. Our training-free pipeline offers a practical and economical solution for high-quality, temporally coherent video stylization. The code and videos can be accessed via https://xujiacong.github.io/FreeViS/
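The abstract names high-frequency compensation as one of the training-free ingredients. As a rough, hedged illustration of the general idea, the sketch below extracts a high-frequency residual from the content frames with a Gaussian low-pass filter and adds a fraction of it back to the stylized frames; the function names, kernel size, and blending weight `alpha` are assumptions for illustration, not FreeViS's actual implementation.

```python
import torch
import torch.nn.functional as F

def gaussian_blur(x: torch.Tensor, ksize: int = 9, sigma: float = 2.0) -> torch.Tensor:
    """Depthwise Gaussian low-pass filter for a batch of frames (B, C, H, W)."""
    half = ksize // 2
    coords = torch.arange(ksize, dtype=x.dtype, device=x.device) - half
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = torch.outer(g, g)[None, None].repeat(x.shape[1], 1, 1, 1)
    x = F.pad(x, [half, half, half, half], mode="reflect")
    return F.conv2d(x, kernel, groups=x.shape[1])

def high_frequency_compensation(content: torch.Tensor,
                                stylized: torch.Tensor,
                                alpha: float = 0.5) -> torch.Tensor:
    """Add the content frames' high-frequency residual to the stylized frames.

    The high-frequency band (content minus its low-pass version) carries edges
    and fine layout cues, so re-injecting it constrains the layout of the
    stylized frames. `alpha` is a hypothetical blending weight.
    """
    high_freq = content - gaussian_blur(content)
    return stylized + alpha * high_freq

# Usage: frames are (B, 3, H, W) tensors in [0, 1].
content = torch.rand(4, 3, 64, 64)
stylized = torch.rand(4, 3, 64, 64)
compensated = high_frequency_compensation(content, stylized).clamp(0, 1)
```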
Related papers
- DreamStyle: A Unified Framework for Video Stylization [18.820518165759403]
We introduce DreamStyle, a unified framework for video stylization. It supports (1) text-guided, (2) style-image-guided, and (3) first-frame-guided video stylization. Both qualitative and quantitative evaluations demonstrate that DreamStyle is competent in all three video stylization tasks.
arXiv Detail & Related papers (2026-01-06T07:42:12Z)
- PickStyle: Video-to-Video Style Transfer with Context-Style Adapters [1.9039773121452204]
PickStyle is a video-to-video style transfer framework that augments pretrained video diffusion backbones with style adapters. To bridge the gap between static image supervision and dynamic video, we construct synthetic training clips from paired images. CS-CFG ensures that context is preserved in the generated video while the style is effectively transferred (see the guidance sketch after this entry).
arXiv Detail & Related papers (2025-10-08T21:02:55Z)
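The PickStyle snippet mentions CS-CFG without expanding it; it plausibly denotes a context-style variant of classifier-free guidance. The sketch below shows one possible two-axis factorization in which context and style each get their own guidance weight; the weights `w_context`/`w_style` and this exact decomposition are assumptions, not necessarily the paper's published formulation.

```python
import torch

def cs_cfg(eps_uncond: torch.Tensor,
           eps_context: torch.Tensor,
           eps_full: torch.Tensor,
           w_context: float = 7.5,
           w_style: float = 3.0) -> torch.Tensor:
    """Two-axis classifier-free guidance over context and style (assumed form).

    eps_uncond  : noise prediction with both conditions dropped
    eps_context : prediction with the context (content) condition only
    eps_full    : prediction with both context and style conditions
    """
    return (eps_uncond
            + w_context * (eps_context - eps_uncond)
            + w_style * (eps_full - eps_context))
```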
- SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models [54.641809532055916]
We introduce SOYO, a novel diffusion-based framework for video style morphing. Our method employs a pre-trained text-to-image diffusion model without fine-tuning, combining attention injection and AdaIN to preserve structural consistency (a minimal AdaIN sketch follows this entry). To harmonize across video frames, we propose a novel adaptive sampling scheduler between two style images.
arXiv Detail & Related papers (2025-03-10T07:27:01Z)
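AdaIN, used by SOYO above, is a standard, well-documented operation (Huang & Belongie, 2017): it re-normalizes the per-channel statistics of content features to match those of style features. A minimal PyTorch sketch:

```python
import torch

def adain(content_feat: torch.Tensor,
          style_feat: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Adaptive Instance Normalization over (B, C, H, W) feature maps.

    Shifts and scales the content features so their per-channel mean and
    standard deviation match those of the style features.
    """
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True)
    return s_std * (content_feat - c_mean) / c_std + s_mean
```

In SOYO this would presumably be applied to intermediate diffusion features; the snippet does not say at which layers.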
- StyleMaster: Stylize Your Video with Artistic Generation and Translation [43.808656030545556]
Style control has been popular in video generation models. Current methods often generate videos far from the given style, cause content leakage, and struggle to transfer one video to the desired style. Our approach, StyleMaster, achieves significant improvement in both style resemblance and temporal coherence.
arXiv Detail & Related papers (2024-12-10T18:44:08Z)
- UniVST: A Unified Framework for Training-free Localized Video Style Transfer [102.52552893495475]
This paper presents UniVST, a unified framework for localized video style transfer based on diffusion models. It operates without the need for training, offering a distinct advantage over existing diffusion methods that transfer style across entire videos.
arXiv Detail & Related papers (2024-10-26T05:28:02Z)
- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
- WAIT: Feature Warping for Animation to Illustration video Translation using GANs [11.968412857420192]
We introduce a new problem for video stylizing where an unordered set of images is used. Most of the video-to-video translation methods are built on an image-to-image translation model. We propose a new generator network with feature warping layers which overcomes the limitations of the previous methods (a generic warping-layer sketch follows this entry).
arXiv Detail & Related papers (2023-10-07T19:45:24Z)
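WAIT's feature warping layers are not specified in the snippet, but flow-based backward warping of feature maps is the common building block for such layers. The sketch below is a generic `grid_sample`-based warping layer under that assumption; the actual layer in WAIT may differ.

```python
import torch
import torch.nn.functional as F

def warp_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp a feature map (B, C, H, W) by an optical flow (B, 2, H, W).

    flow[:, 0] is the horizontal displacement in pixels, flow[:, 1] vertical.
    """
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij")
    x_new = xs[None] + flow[:, 0]
    y_new = ys[None] + flow[:, 1]
    # Normalize sampling coordinates to [-1, 1], as grid_sample expects.
    grid = torch.stack([2 * x_new / (w - 1) - 1,
                        2 * y_new / (h - 1) - 1], dim=-1)
    return F.grid_sample(feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```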
- ControlVideo: Training-free Controllable Text-to-Video Generation [117.06302461557044]
ControlVideo is a framework to enable natural and efficient text-to-video generation.
It generates both short and long videos within several minutes using one NVIDIA 2080Ti.
arXiv Detail & Related papers (2023-05-22T14:48:53Z)
- Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer [13.098901971644656]
This paper proposes a zero-shot video stylization method named Style-A-Video.
It uses a generative pre-trained transformer with an image latent diffusion model to achieve concise, text-controlled video stylization.
Tests show that we can attain superior content preservation and stylistic performance at a lower computational cost than previous solutions.
arXiv Detail & Related papers (2023-05-09T14:03:27Z)
- StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN [70.31913835035206]
We present a novel approach to the video synthesis problem that helps to greatly improve visual quality.
We make use of a pre-trained StyleGAN network, the latent space of which allows control over the appearance of the objects it was trained for.
Our temporal architecture is then trained not on sequences of RGB frames, but on sequences of StyleGAN latent codes (a toy latent-sequence sketch follows this entry).
arXiv Detail & Related papers (2021-07-15T09:58:15Z)
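The StyleVideoGAN snippet describes the key idea precisely enough to sketch: a temporal model is trained on sequences of StyleGAN latent codes rather than RGB frames, and a pretrained StyleGAN generator decodes the predicted latents back to frames. The toy model below illustrates that setup; the LSTM architecture and layer sizes are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class LatentSequenceModel(nn.Module):
    """Toy temporal model over StyleGAN latent codes: instead of predicting
    RGB frames, it predicts the next w-latent from the preceding ones."""

    def __init__(self, latent_dim: int = 512, hidden_dim: int = 1024):
        super().__init__()
        self.rnn = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, w_seq: torch.Tensor) -> torch.Tensor:
        # w_seq: (B, T, latent_dim) sequence of latents; returns one-step-ahead
        # latent predictions of the same shape.
        h, _ = self.rnn(w_seq)
        return self.head(h)

# A pretrained, frozen StyleGAN generator (not shown) would then decode each
# predicted latent back to an RGB frame.
model = LatentSequenceModel()
w_pred = model(torch.randn(2, 16, 512))  # (2, 16, 512)
```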