VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization
- URL: http://arxiv.org/abs/2501.09499v1
- Date: Thu, 16 Jan 2025 12:20:40 GMT
- Title: VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization
- Authors: Zixun Fang, Zhiheng Liu, Kai Zhu, Yu Liu, Ka Leong Cheng, Wei Zhai, Yang Cao, Zheng-Jun Zha
- Abstract summary: Video colorization aims to transform grayscale videos into vivid color representations while maintaining temporal consistency and structural integrity.
Existing video colorization methods often suffer from color bleeding and lack comprehensive control.
We introduce VanGogh, a unified multimodal diffusion-based framework for video colorization.
- Score: 53.35016574938809
- License:
- Abstract: Video colorization aims to transform grayscale videos into vivid color representations while maintaining temporal consistency and structural integrity. Existing video colorization methods often suffer from color bleeding and lack comprehensive control, particularly under complex motion or diverse semantic cues. To this end, we introduce VanGogh, a unified multimodal diffusion-based framework for video colorization. VanGogh tackles these challenges using a Dual Qformer to align and fuse features from multiple modalities, complemented by a depth-guided generation process and an optical flow loss, which help reduce color overflow. Additionally, a color injection strategy and luma channel replacement are implemented to improve generalization and mitigate flickering artifacts. Thanks to this design, users can exercise both global and local control over the generation process, resulting in higher-quality colorized videos. Extensive qualitative and quantitative evaluations, and user studies, demonstrate that VanGogh achieves superior temporal consistency and color fidelity. Project page: https://becauseimbatman0.github.io/VanGogh.
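The abstract mentions luma channel replacement as a way to mitigate flickering. As a rough illustration only (not the paper's actual implementation), the hypothetical sketch below keeps the chroma predicted by the colorization model while pinning the luminance of each output frame to the original grayscale input, assuming the replacement happens in YCrCb space with OpenCV conventions:

```python
# Hypothetical sketch of luma channel replacement (assumes YCrCb space and
# OpenCV conventions; the paper's exact formulation may differ).
import cv2
import numpy as np

def replace_luma(colorized_rgb: np.ndarray, gray: np.ndarray) -> np.ndarray:
    """Keep the predicted chroma but pin luminance to the grayscale input.

    colorized_rgb: HxWx3 uint8 frame produced by the colorization model.
    gray:          HxW   uint8 original grayscale frame.
    """
    ycrcb = cv2.cvtColor(colorized_rgb, cv2.COLOR_RGB2YCrCb)
    ycrcb[..., 0] = gray  # overwrite the Y (luma) channel only
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2RGB)

# Usage sketch:
# stabilized = [replace_luma(f, g) for f, g in zip(colorized_frames, gray_frames)]
```

Because the luminance of every frame comes directly from the source video, frame-to-frame brightness flicker introduced by the generator cannot appear in the output; only the chroma channels are free to vary.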
Related papers
- DreamColour: Controllable Video Colour Editing without Training [80.90808879991182]
We present a training-free framework that makes precise video colour editing accessible through an intuitive interface.
By decoupling spatial and temporal aspects of colour editing, we can better align with users' natural workflow.
Our approach matches or exceeds state-of-the-art methods while eliminating the need for training or specialized hardware.
arXiv Detail & Related papers (2024-12-06T16:57:54Z)
- L-C4: Language-Based Video Colorization for Creative and Consistent Color [59.069498113050436]
We present Language-based video colorization for Creative and Consistent Colors (L-C4).
Our model is built upon a pre-trained cross-modality generative model.
We propose temporally deformable attention to prevent flickering or color shifts, and cross-clip fusion to maintain long-term color consistency.
arXiv Detail & Related papers (2024-10-07T12:16:21Z)
- LVCD: Reference-based Lineart Video Colorization with Diffusion Models [18.0983825973013]
We propose the first video diffusion framework for reference-based lineart video colorization.
We leverage a large-scale pretrained video diffusion model to generate colorized animation videos.
Our method is capable of generating high-quality, long temporal-consistent animation videos.
arXiv Detail & Related papers (2024-09-19T17:59:48Z)
- LatentColorization: Latent Diffusion-Based Speaker Video Colorization [1.2641141743223379]
We introduce a novel solution for achieving temporal consistency in video colorization.
We demonstrate strong improvements on established image quality metrics compared to other existing methods.
Our dataset encompasses a combination of conventional datasets and videos from television/movies.
arXiv Detail & Related papers (2024-05-09T12:06:06Z)
- Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model.
We present an effective way to encode user strokes to enable precise local color manipulation.
We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
arXiv Detail & Related papers (2024-02-16T17:51:13Z)
- BiSTNet: Semantic Image Prior Guided Bidirectional Temporal Feature Fusion for Deep Exemplar-based Video Colorization [70.14893481468525]
We present an effective BiSTNet to explore colors of reference exemplars and utilize them to help video colorization.
We first establish the semantic correspondence between each frame and the reference exemplars in deep feature space to explore color information from reference exemplars.
We develop a mixed expert block to extract semantic information for modeling the object boundaries of frames so that the semantic image prior can better guide the colorization process.
arXiv Detail & Related papers (2022-12-05T13:47:15Z)
- Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning [90.38674162878496]
We propose a novel temporally consistent video colorization framework (TCVC).
TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization.
Experiments demonstrate that our method can not only obtain visually pleasing colorized video, but also achieve clearly better temporal consistency than state-of-the-art methods.
arXiv Detail & Related papers (2021-10-09T13:00:14Z)
- VCGAN: Video Colorization with Hybrid Generative Adversarial Network [22.45196398040388]
Video Colorization with Hybrid Generative Adversarial Network (VCGAN) is an improved approach to video colorization using end-to-end learning.
Experimental results demonstrate that VCGAN produces higher-quality and temporally more consistent colorful videos than existing approaches.
arXiv Detail & Related papers (2021-04-26T05:50:53Z)
- DeepRemaster: Temporal Source-Reference Attention Networks for Comprehensive Video Enhancement [32.679447725129165]
We propose a framework to tackle the entire remastering task semi-interactively.
Our work is based on temporal convolutional neural networks with attention mechanisms trained on videos with data-driven deterioration simulation.
arXiv Detail & Related papers (2020-09-18T08:55:11Z)
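The VanGogh abstract above mentions an optical flow loss, and several of the related papers (TCVC, L-C4) likewise enforce temporal consistency across frames. A common generic building block for this is a flow-warping consistency term. The following is a minimal PyTorch sketch under assumed conventions (backward flow in pixel units); it is not the exact loss used by any paper listed here:

```python
# Generic flow-warping consistency loss (an assumption-level sketch, not the
# exact loss from VanGogh, TCVC, or the other papers listed above).
import torch
import torch.nn.functional as F

def warp_with_flow(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` (B,C,H,W) with `flow` (B,2,H,W), flow in pixels.

    `flow` maps each pixel of the *target* frame to its source location in
    `frame`, i.e. the usual backward-warping convention.
    """
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).unsqueeze(0)  # (1,2,H,W), x first
    coords = base + flow                              # sampling coordinates
    # normalize to [-1, 1] as required by grid_sample
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)              # (B,H,W,2)
    return F.grid_sample(frame, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

def temporal_consistency_loss(curr: torch.Tensor,
                              prev: torch.Tensor,
                              flow_curr_to_prev: torch.Tensor) -> torch.Tensor:
    """L1 distance between the current colorized frame and the previous
    colorized frame warped into the current frame's coordinates."""
    return F.l1_loss(curr, warp_with_flow(prev, flow_curr_to_prev))
```

In practice such a term is usually masked by an occlusion estimate and added to the main training objective with a small weight, so that colors are encouraged to follow the motion field without over-constraining newly revealed regions.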
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.