VCGAN: Video Colorization with Hybrid Generative Adversarial Network
- URL: http://arxiv.org/abs/2104.12357v2
- Date: Sun, 7 May 2023 14:22:31 GMT
- Title: VCGAN: Video Colorization with Hybrid Generative Adversarial Network
- Authors: Yuzhi Zhao, Lai-Man Po, Wing-Yin Yu, Yasar Abbas Ur Rehman, Mengyang
Liu, Yujia Zhang, Weifeng Ou
- Abstract summary: Video Colorization with Hybrid Generative Adversarial Network (VCGAN) is an improved approach to video colorization using end-to-end learning.
Experimental results demonstrate that VCGAN produces higher-quality and temporally more consistent colorful videos than existing approaches.
- Score: 22.45196398040388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the Video Colorization with Hybrid Generative
Adversarial Network (VCGAN), a hybrid recurrent, end-to-end learning approach
to video colorization. VCGAN addresses two prevalent issues in the video
colorization domain: temporal consistency, and the unification of the
colorization and refinement networks into a single architecture. To enhance
colorization quality and spatiotemporal consistency, the main stream of the
generator in VCGAN is assisted by two additional networks: a global feature
extractor and a placeholder feature extractor. The global feature extractor
encodes the global semantics of the grayscale input to enhance colorization
quality, whereas the placeholder feature extractor acts as a feedback
connection that encodes the semantics of the previously colorized frame in
order to maintain spatiotemporal consistency. If the input to the placeholder
feature extractor is replaced with the grayscale input, the hybrid VCGAN can
also perform image colorization. To improve the consistency of far-apart
frames, we propose a dense long-term loss that smooths the temporal disparity
between every two remote frames. Trained jointly with colorization and
temporal losses, VCGAN strikes a good balance between color vividness and
video continuity. Experimental results demonstrate that VCGAN produces
higher-quality and temporally more consistent colorful videos than existing
approaches.
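The abstract does not pin down layer shapes or the exact form of the dense long-term loss, so the following PyTorch-style sketch only illustrates the recurrent feedback design: a main-stream generator that consumes the grayscale frame together with features from a global feature extractor and a placeholder feature extractor fed with the previously colorized frame, plus a pairwise long-term term over a clip. All module names, channel widths, and the simple pairwise L1 form of the loss are assumptions for exposition (the paper additionally relies on flow-based alignment); this is not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureExtractor(nn.Module):
    """Small CNN encoder standing in for the global / placeholder extractors
    (hypothetical: the real VCGAN extractors are much deeper)."""
    def __init__(self, in_ch, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class RecurrentColorizer(nn.Module):
    """Main-stream generator assisted by a global feature extractor (grayscale
    semantics) and a placeholder feature extractor (previous colorized frame)."""
    def __init__(self):
        super().__init__()
        self.global_extractor = FeatureExtractor(in_ch=1)       # grayscale frame
        self.placeholder_extractor = FeatureExtractor(in_ch=3)  # previous colorized frame
        self.fuse = nn.Conv2d(1 + 64 + 64, 64, 3, padding=1)
        self.decode = nn.Conv2d(64, 3, 3, padding=1)            # predict an RGB frame

    def forward(self, gray_t, prev_rgb):
        g = F.interpolate(self.global_extractor(gray_t), size=gray_t.shape[-2:])
        p = F.interpolate(self.placeholder_extractor(prev_rgb), size=gray_t.shape[-2:])
        h = F.relu(self.fuse(torch.cat([gray_t, g, p], dim=1)))
        return torch.sigmoid(self.decode(h))


def dense_long_term_loss(colorized):
    """One plausible reading of the dense long-term loss: penalize the disparity
    between every pair of frames in a clip, including remote ones. The paper
    combines this idea with flow-based warping, which is omitted here."""
    total, pairs = colorized[0].new_zeros(()), 0
    for i in range(len(colorized)):
        for j in range(i + 1, len(colorized)):
            total = total + F.l1_loss(colorized[i], colorized[j])
            pairs += 1
    return total / max(pairs, 1)


if __name__ == "__main__":
    model = RecurrentColorizer()
    frames = torch.rand(5, 1, 1, 64, 64)  # 5 grayscale frames, batch size 1
    prev = torch.zeros(1, 3, 64, 64)      # placeholder input for the first frame
    outputs = []
    for t in range(frames.shape[0]):
        prev = model(frames[t], prev)     # feedback: reuse the previous colorized frame
        outputs.append(prev)
    print(dense_long_term_loss(outputs).item())
```

Feeding the placeholder extractor the grayscale frame (repeated to three channels) instead of prev_rgb would turn the same generator into a single-image colorizer, mirroring the image-colorization mode described in the abstract.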
Related papers
- Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment [130.15775113897553]
Finsta is a fine-grained structural-temporal alignment learning method.
It consistently improves 13 existing strong-performing video-language models.
arXiv Detail & Related papers (2024-06-27T15:23:36Z)
- LatentColorization: Latent Diffusion-Based Speaker Video Colorization [1.2641141743223379]
We introduce a novel solution for achieving temporal consistency in video colorization.
We demonstrate strong improvements on established image quality metrics compared to other existing methods.
Our dataset encompasses a combination of conventional datasets and videos from television/movies.
arXiv Detail & Related papers (2024-05-09T12:06:06Z)
- Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution [151.1255837803585]
We propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo) for video super-resolution.
SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction.
Experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-03-25T17:59:26Z)
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z)
- Edit Temporal-Consistent Videos with Image Diffusion Model [49.88186997567138]
Large-scale text-to-image (T2I) diffusion models have been extended for text-guided video editing.
The proposed method achieves state-of-the-art performance in both video temporal consistency and video editing capability.
arXiv Detail & Related papers (2023-08-17T16:40:55Z)
- Histogram-guided Video Colorization Structure with Spatial-Temporal Connection [10.059070138875038]
We present a Histogram-guided Video Colorization structure with Spatial-Temporal connection (named ST-HVC).
To fully exploit the chroma and motion information, the joint flow and histogram module is tailored to integrate the histogram and flow features.
We show that the developed method achieves excellent performance both quantitatively and qualitatively on two video datasets.
arXiv Detail & Related papers (2023-08-09T11:59:18Z)
- Video Colorization with Pre-trained Text-to-Image Diffusion Models [19.807766482434563]
We present ColorDiffuser, an adaptation of a pre-trained text-to-image latent diffusion model for video colorization.
We propose two novel techniques to enhance the temporal coherence and maintain the vividness of colorization across frames.
arXiv Detail & Related papers (2023-06-02T17:58:00Z)
- FlowChroma -- A Deep Recurrent Neural Network for Video Colorization [1.0499611180329804]
We develop an automated video colorization framework that minimizes the flickering of colors across frames.
We show that recurrent neural networks can be successfully used to improve color consistency in video colorization.
arXiv Detail & Related papers (2023-05-23T05:41:53Z)
- Temporal Consistent Automatic Video Colorization via Semantic Correspondence [12.107878178519128]
We propose a novel video colorization framework, which combines semantic correspondence into automatic video colorization.
In the NTIRE 2023 Video Colorization Challenge, our method ranks 3rd in the Color Distribution Consistency (CDC) Optimization track.
arXiv Detail & Related papers (2023-05-13T12:06:09Z)
- BiSTNet: Semantic Image Prior Guided Bidirectional Temporal Feature Fusion for Deep Exemplar-based Video Colorization [70.14893481468525]
We present an effective BiSTNet that explores the colors of reference exemplars and utilizes them to aid video colorization.
We first establish the semantic correspondence between each frame and the reference exemplars in deep feature space to explore color information from reference exemplars.
We develop a mixed expert block to extract semantic information for modeling the object boundaries of frames so that the semantic image prior can better guide the colorization process.
arXiv Detail & Related papers (2022-12-05T13:47:15Z)
- Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning [90.38674162878496]
We propose a novel temporally consistent video colorization framework (TCVC).
TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization (a generic sketch of this idea follows the list).
Experiments demonstrate that our method can not only obtain visually pleasing colorized video, but also achieve clearly better temporal consistency than state-of-the-art methods.
arXiv Detail & Related papers (2021-10-09T13:00:14Z)
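The last entry above, TCVC, is summarized as propagating frame-level deep features bidirectionally. The sketch below illustrates that general idea only; the blending rule, the 1x1 fusion layer, and all names are hypothetical and are not taken from the TCVC paper.

```python
import torch
import torch.nn as nn


class BidirectionalPropagation(nn.Module):
    """Toy bidirectional propagation over per-frame deep features."""
    def __init__(self, channels: int, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha                          # weight given to the carried-over state
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (T, C, H, W) deep features, one tensor per frame
        T = feats.shape[0]
        fwd, bwd = [feats[0]], [feats[-1]]
        for t in range(1, T):                       # forward pass: carry context from earlier frames
            fwd.append(self.alpha * fwd[-1] + (1 - self.alpha) * feats[t])
        for t in range(T - 2, -1, -1):              # backward pass: carry context from later frames
            bwd.append(self.alpha * bwd[-1] + (1 - self.alpha) * feats[t])
        bwd = bwd[::-1]
        # fuse both directions so every frame sees past and future context
        fused = [self.fuse(torch.cat([f, b], dim=0).unsqueeze(0)).squeeze(0)
                 for f, b in zip(fwd, bwd)]
        return torch.stack(fused)


if __name__ == "__main__":
    feats = torch.rand(8, 32, 16, 16)               # 8 frames of 32-channel features
    out = BidirectionalPropagation(channels=32)(feats)
    print(out.shape)                                # torch.Size([8, 32, 16, 16])
```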