Related papers: LatentColorization: Latent Diffusion-Based Speaker Video Colorization

URL: http://arxiv.org/abs/2405.05707v1
Date: Thu, 9 May 2024 12:06:06 GMT
Title: LatentColorization: Latent Diffusion-Based Speaker Video Colorization
Authors: Rory Ward, Dan Bigioi, Shubhajit Basak, John G. Breslin, Peter Corcoran
Abstract summary: We introduce a novel solution for achieving temporal consistency in video colorization. We demonstrate strong improvements on established image quality metrics compared to other existing methods. Our dataset encompasses a combination of conventional datasets and videos from television/movies.
Score: 1.2641141743223379
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While current research predominantly focuses on image-based colorization, the domain of video-based colorization remains relatively unexplored. Most existing video colorization techniques operate on a frame-by-frame basis, often overlooking the critical aspect of temporal coherence between successive frames. This approach can result in inconsistencies across frames, leading to undesirable effects like flickering or abrupt color transitions between frames. To address these challenges, we harness the generative capabilities of a fine-tuned latent diffusion model designed specifically for video colorization, introducing a novel solution for achieving temporal consistency in video colorization, as well as demonstrating strong improvements on established image quality metrics compared to other existing methods. Furthermore, we perform a subjective study, where users preferred our approach to the existing state of the art. Our dataset encompasses a combination of conventional datasets and videos from television/movies. In short, by leveraging the power of a fine-tuned latent diffusion-based colorization system with a temporal consistency mechanism, we can improve the performance of automatic video colorization by addressing the challenges of temporal inconsistency. A short demonstration of our results can be seen in some example videos available at https://youtu.be/vDbzsZdFuxM.

Related papers

VidSplice: Towards Coherent Video Inpainting via Explicit Spaced Frame Guidance [57.57195766748601]
VidSplice is a novel framework that guides the inpainting process with temporal cues. We show that VidSplice achieves competitive performance across diverse video inpainting scenarios.
arXiv Detail & Related papers (2025-10-24T13:44:09Z)
VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization [53.35016574938809]
Video colorization aims to transform grayscale videos into vivid color representations while maintaining temporal consistency and structural integrity. Existing video colorization methods often suffer from color bleeding and lack comprehensive control. We introduce VanGogh, a unified multimodal diffusion-based framework for video colorization.
arXiv Detail & Related papers (2025-01-16T12:20:40Z)
DreamColour: Controllable Video Colour Editing without Training [80.90808879991182]
We present a training-free framework that makes precise video colour editing accessible through an intuitive interface. By decoupling spatial and temporal aspects of colour editing, we can better align with users' natural workflow. Our approach matches or exceeds state-of-the-art methods while eliminating the need for training or specialized hardware.
arXiv Detail & Related papers (2024-12-06T16:57:54Z)
L-C4: Language-Based Video Colorization for Creative and Consistent Color [59.069498113050436]
We present Language-based video colorization for Creative and Consistent Colors (L-C4). Our model is built upon a pre-trained cross-modality generative model. We propose temporally deformable attention to prevent flickering or color shifts, and cross-clip fusion to maintain long-term color consistency.
arXiv Detail & Related papers (2024-10-07T12:16:21Z)
LVCD: Reference-based Lineart Video Colorization with Diffusion Models [18.0983825973013]
We propose the first video diffusion framework for reference-based lineart video colorization. We leverage a large-scale pretrained video diffusion model to generate colorized animation videos. Our method is capable of generating high-quality, long, temporally consistent animation videos.
arXiv Detail & Related papers (2024-09-19T17:59:48Z)
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation [85.29772293776395]
We introduce FRESCO, which uses intra-frame correspondence alongside inter-frame correspondence to establish a more robust spatial-temporal constraint. This enhancement ensures a more consistent transformation of semantically similar content across frames. Our approach involves an explicit update of features to achieve high spatial-temporal consistency with the input video.
arXiv Detail & Related papers (2024-03-19T17:59:18Z)
Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model. We present an effective way to encode user strokes to enable precise local color manipulation. We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
arXiv Detail & Related papers (2024-02-16T17:51:13Z)
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos. Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z)
Video Colorization with Pre-trained Text-to-Image Diffusion Models [19.807766482434563]
We present ColorDiffuser, an adaptation of a pre-trained text-to-image latent diffusion model for video colorization. We propose two novel techniques to enhance the temporal coherence and maintain the vividness of colorization across frames.
arXiv Detail & Related papers (2023-06-02T17:58:00Z)
FlowChroma -- A Deep Recurrent Neural Network for Video Colorization [1.0499611180329804]
We develop an automated video colorization framework that minimizes the flickering of colors across frames. We show that recurrent neural networks can be successfully used to improve color consistency in video colorization.
arXiv Detail & Related papers (2023-05-23T05:41:53Z)
Temporal Consistent Automatic Video Colorization via Semantic Correspondence [12.107878178519128]
We propose a novel video colorization framework that incorporates semantic correspondence into automatic video colorization. In the NTIRE 2023 Video Colorization Challenge, our method ranks 3rd in the Color Distribution Consistency (CDC) Optimization track.
arXiv Detail & Related papers (2023-05-13T12:06:09Z)
BiSTNet: Semantic Image Prior Guided Bidirectional Temporal Feature Fusion for Deep Exemplar-based Video Colorization [70.14893481468525]
We present an effective BiSTNet to explore colors of reference exemplars and utilize them to help video colorization. We first establish the semantic correspondence between each frame and the reference exemplars in deep feature space to explore color information from reference exemplars. We develop a mixed expert block to extract semantic information for modeling the object boundaries of frames so that the semantic image prior can better guide the colorization process.
arXiv Detail & Related papers (2022-12-05T13:47:15Z)
Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning [90.38674162878496]
We propose a novel temporally consistent video colorization framework (TCVC). TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization. Experiments demonstrate that our method can not only obtain visually pleasing colorized video, but also achieve clearly better temporal consistency than state-of-the-art methods.
arXiv Detail & Related papers (2021-10-09T13:00:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.