DreamStyle: A Unified Framework for Video Stylization
- URL: http://arxiv.org/abs/2601.02785v1
- Date: Tue, 06 Jan 2026 07:42:12 GMT
- Title: DreamStyle: A Unified Framework for Video Stylization
- Authors: Mengtian Li, Jinshu Chen, Songtao Zhao, Wanquan Feng, Pengqi Tu, Qian He
- Abstract summary: We introduce DreamStyle, a unified framework for video stylization. It supports (1) text-guided, (2) style-image-guided, and (3) first-frame-guided video stylization. Both qualitative and quantitative evaluations demonstrate that DreamStyle is competent in all three video stylization tasks.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Video stylization, an important downstream task of video generation models, has not yet been thoroughly explored. Its input style conditions typically include text, a style image, and a stylized first frame. Each condition has a characteristic advantage: text is the most flexible, a style image provides a more accurate visual anchor, and a stylized first frame makes long-video stylization feasible. However, existing methods are largely confined to a single type of style condition, which limits their scope of application. Additionally, the lack of high-quality datasets leads to style inconsistency and temporal flicker. To address these limitations, we introduce DreamStyle, a unified framework for video stylization that supports (1) text-guided, (2) style-image-guided, and (3) first-frame-guided video stylization, accompanied by a well-designed data curation pipeline for acquiring high-quality paired video data. DreamStyle is built on a vanilla Image-to-Video (I2V) model and trained with a Low-Rank Adaptation (LoRA) whose token-specific up matrices reduce confusion among the different condition tokens. Both qualitative and quantitative evaluations demonstrate that DreamStyle is competent in all three video stylization tasks and outperforms competing methods in style consistency and video quality.
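The abstract's central training detail, a LoRA whose up-projection matrices are specific to each condition type, can be pictured with a short sketch. Below is a minimal PyTorch illustration, not the authors' code: the shared down matrix, the three condition-type IDs, and the per-token routing are assumptions inferred from the abstract alone.

```python
# Minimal sketch of a LoRA with token-specific up matrices, as described in
# the DreamStyle abstract. This is an illustration, NOT the authors' code:
# the shared down matrix, the three condition-type IDs, and the per-token
# routing below are assumptions inferred from the abstract alone.
import torch
import torch.nn as nn

class TokenSpecificLoRA(nn.Module):
    """LoRA adapter with one shared down matrix and a separate up matrix per
    condition type (here 0 = text, 1 = style image, 2 = first frame), so each
    kind of condition token is updated through its own subspace."""

    def __init__(self, dim: int, rank: int = 16, num_types: int = 3, alpha: float = 16.0):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)   # shared down projection (A)
        self.ups = nn.ModuleList(                      # one up projection (B) per type
            nn.Linear(rank, dim, bias=False) for _ in range(num_types)
        )
        for up in self.ups:                            # standard LoRA init: B = 0,
            nn.init.zeros_(up.weight)                  # so the adapter starts as identity
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor, token_types: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); token_types: (batch, seq) integer IDs in [0, num_types)
        h = self.down(x)
        delta = torch.zeros_like(x)
        for t, up in enumerate(self.ups):
            mask = (token_types == t).unsqueeze(-1)    # select this type's tokens
            delta = delta + mask * up(h)               # route them through their own B
        return x + self.scale * delta                  # residual LoRA update
```

Keeping a single down matrix while separating the up matrices keeps the parameter count low and gives each condition type a disjoint update path, which is one plausible reading of how confusion among condition tokens is reduced; the paper may partition the adapter differently.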
Related papers
- TeleStyle: Content-Preserving Style Transfer in Images and Videos
We present TeleStyle, a lightweight model for both image and video stylization.
We curated a high-quality dataset of distinct, specific styles and synthesized triplets using thousands of diverse, in-the-wild style categories.
TeleStyle achieves state-of-the-art performance across three core evaluation metrics: style similarity, content consistency, and aesthetic quality.
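Of the metrics this summary names, style similarity is typically scored as the cosine similarity between image embeddings from a vision model such as CLIP. The summary does not say how TeleStyle computes it, so the snippet below is a generic stand-in using the Hugging Face transformers CLIP API, not TeleStyle's evaluation code.

```python
# Generic CLIP-based style-similarity score of the kind such evaluations
# commonly use; this is an illustrative stand-in, not TeleStyle's metric code.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def style_similarity(stylized_frame, style_image) -> float:
    """Cosine similarity between CLIP image embeddings of a stylized frame
    and the reference style image (both PIL images)."""
    inputs = processor(images=[stylized_frame, style_image], return_tensors="pt")
    feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)   # L2-normalize embeddings
    return float(feats[0] @ feats[1])                  # cosine similarity
```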
arXiv Detail & Related papers (2026-01-28T02:16:03Z)
- FreeViS: Training-free Video Stylization with Inconsistent References
FreeViS is a training-free video stylization framework that generates stylized videos with rich style details and strong temporal coherence.
Our method integrates multiple stylized references into a pretrained image-to-video (I2V) model, effectively mitigating the propagation errors observed in prior works.
arXiv Detail & Related papers (2025-10-02T05:27:06Z)
- SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models
We introduce SOYO, a novel diffusion-based framework for video style morphing.
Our method employs a pre-trained text-to-image diffusion model without fine-tuning, combining attention injection and AdaIN to preserve structural consistency.
To harmonize across video frames, we propose a novel adaptive sampling scheduler between two style images.
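AdaIN, which this summary relies on, replaces the channel-wise mean and standard deviation of content features with those of style features (Huang & Belongie, 2017). The snippet below is the standard operation, not SOYO's implementation.

```python
# Standard AdaIN (Huang & Belongie, 2017), referenced in the SOYO summary.
# This is the generic operation, not SOYO-specific code.
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Align the per-channel mean/std of `content` features to `style`.
    Both tensors are (batch, channels, height, width) feature maps."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    # Normalize away the content statistics, then re-scale with style statistics.
    return s_std * (content - c_mean) / c_std + s_mean
```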
arXiv Detail & Related papers (2025-03-10T07:27:01Z)
- StyleMaster: Stylize Your Video with Artistic Generation and Translation
Style control has been popular in video generation models.
Current methods often generate videos far from the given style, cause content leakage, and struggle to transfer one video to the desired style.
Our approach, StyleMaster, achieves significant improvement in both style resemblance and temporal coherence.
arXiv Detail & Related papers (2024-12-10T18:44:08Z)
- FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models
We introduce FreeStyle, an innovative style transfer method built upon a pre-trained large diffusion model.
Our method enables style transfer only through a text description of the desired style, eliminating the necessity of style images.
Our experimental results demonstrate high-quality synthesis and fidelity of our method across various content images and style text prompts.
arXiv Detail & Related papers (2024-01-28T12:00:31Z)
- Style Aligned Image Generation via Shared Attention
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal attention sharing during the diffusion process, our method maintains style consistency across images within T2I models.
Evaluation of our method across diverse styles and text prompts demonstrates high quality and fidelity.
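The attention sharing mentioned above can be sketched as follows: during self-attention, each generated image also attends to the keys and values of a shared reference, which pulls the whole set toward one style. The snippet below is a simplified illustration of the idea, not StyleAligned's exact mechanism (the paper additionally applies AdaIN to queries and keys).

```python
# Simplified sketch of the attention-sharing idea behind StyleAligned: every
# image in the batch also attends to the keys/values of a shared reference
# image, keeping style consistent across the set. Illustrative only.
import torch
import torch.nn.functional as F

def shared_attention(q, k, v, ref_k, ref_v):
    """q, k, v: (batch, heads, seq, dim) for each generated image.
    ref_k, ref_v: (1, heads, seq, dim) from the shared reference image."""
    b = q.shape[0]
    # Append the reference keys/values to every image's own keys/values.
    k_all = torch.cat([k, ref_k.expand(b, -1, -1, -1)], dim=2)
    v_all = torch.cat([v, ref_v.expand(b, -1, -1, -1)], dim=2)
    return F.scaled_dot_product_attention(q, k_all, v_all)
```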
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
StyleCrafter is a generic method that enhances pre-trained T2V models with a style control adapter.
To promote content-style disentanglement, we remove style descriptions from the text prompt and extract style information solely from the reference image.
StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images.
arXiv Detail & Related papers (2023-12-01T03:53:21Z)
- StyleAdapter: A Unified Stylized Image Generation Model
StyleAdapter is a unified stylized image generation model capable of producing a variety of stylized images.
It can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)