Multi-modality Deep Restoration of Extremely Compressed Face Videos
- URL: http://arxiv.org/abs/2107.05548v1
- Date: Mon, 5 Jul 2021 16:29:02 GMT
- Title: Multi-modality Deep Restoration of Extremely Compressed Face Videos
- Authors: Xi Zhang and Xiaolin Wu
- Abstract summary: We develop a multi-modality deep convolutional neural network method for restoring face videos that are aggressively compressed.
The main innovation is a new DCNN architecture that incorporates known priors of multiple modalities.
Ample empirical evidence is presented to validate the superior performance of the proposed DCNN method on face videos.
- Score: 36.83490465562509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Arguably the most common and salient object in daily video communications is
the talking head, as encountered in social media, virtual classrooms,
teleconferences, news broadcasting, talk shows, etc. When communication
bandwidth is limited by network congestion or cost considerations, compression
artifacts in talking head videos are inevitable. The resulting video quality
degradation is highly visible and objectionable due to the high acuity of the
human visual system to faces. To solve this problem, we develop a multi-modality deep
convolutional neural network method for restoring face videos that are
aggressively compressed. The main innovation is a new DCNN architecture that
incorporates known priors of multiple modalities: the video-synchronized speech
signal and semantic elements of the compression code stream, including motion
vectors, code partition map and quantization parameters. These priors strongly
correlate with the latent video and hence they are able to enhance the
capability of deep learning to remove compression artifacts. Ample empirical
evidence is presented to validate the superior performance of the proposed
DCNN method on face videos over existing state-of-the-art methods.
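As an illustration of how such multi-modal priors could be combined, here is a minimal PyTorch sketch in the spirit of the abstract. It is not the authors' implementation: the module layout, channel counts, the 4-channel prior stack (motion vectors, partition map, QP map), and the FiLM-style audio modulation are all assumptions made for exposition.

```python
# Minimal sketch (not the authors' released code) of fusing a decoded frame,
# codec-side maps (motion vectors, partition map, quantization parameters),
# and a speech embedding in a restoration CNN. Shapes and names are assumed.
import torch
import torch.nn as nn

class MultiModalRestorer(nn.Module):
    def __init__(self, feat_ch=64, audio_dim=128):
        super().__init__()
        # Encoder for the decoded (artifact-laden) frame.
        self.frame_enc = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        # Encoder for codec-side priors stacked as extra channels:
        # 2 motion-vector channels, 1 partition map, 1 QP map.
        self.prior_enc = nn.Sequential(
            nn.Conv2d(4, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        # A speech embedding (e.g. from a pretrained audio network) is mapped
        # to per-channel scale/shift that modulates the fused features.
        self.audio_mlp = nn.Linear(audio_dim, 2 * feat_ch)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.decoder = nn.Conv2d(feat_ch, 3, 3, padding=1)

    def forward(self, frame, priors, audio_emb):
        f = self.frame_enc(frame)                  # B x C x H x W
        p = self.prior_enc(priors)                 # B x C x H x W
        x = self.fuse(torch.cat([f, p], dim=1))    # fuse pixel and codec priors
        scale, shift = self.audio_mlp(audio_emb).chunk(2, dim=1)
        x = x * (1 + scale[..., None, None]) + shift[..., None, None]
        return frame + self.decoder(x)             # residual restoration

# Example forward pass with dummy tensors.
net = MultiModalRestorer()
frame = torch.rand(1, 3, 128, 128)
priors = torch.rand(1, 4, 128, 128)   # MV-x, MV-y, partition map, QP map
audio = torch.rand(1, 128)
restored = net(frame, priors, audio)
print(restored.shape)  # torch.Size([1, 3, 128, 128])
```

The residual output and the per-channel audio modulation are one plausible design choice; the actual network in the paper may fuse the modalities differently.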
Related papers
- Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency [36.939731355462264]
This study proposes a novel and efficient blind video face enhancement method.
It restores high-quality videos from their compressed low-quality versions with an effective de-flickering mechanism.
Experiments conducted on the VFHQ-Test dataset demonstrate that our method surpasses the current state-of-the-art blind face video restoration and de-flickering methods in both efficiency and effectiveness.
arXiv Detail & Related papers (2024-11-25T15:14:36Z)
- Perceptual Quality Improvement in Videoconferencing using Keyframes-based GAN [28.773037051085318]
We propose a novel GAN-based method for compression artifacts reduction in videoconferencing.
First, we extract multi-scale features from the compressed and reference frames.
Then, our architecture combines these features in a progressive manner according to facial landmarks.
arXiv Detail & Related papers (2023-11-07T16:38:23Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method [69.868145936998]
Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs.
The great diversity of distortion types in the spatial and temporal domains, ranging from traditional hybrid coding frameworks to generative models, presents grand challenges for compressed face video quality assessment (VQA).
We introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos.
arXiv Detail & Related papers (2023-04-14T11:26:09Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement [74.1052624663082]
We develop a deep learning architecture capable of restoring detail to compressed videos.
We show that this improves restoration accuracy compared to prior compression correction methods.
We condition our model on quantization data which is readily available in the bitstream.
arXiv Detail & Related papers (2022-01-31T18:56:04Z)
- Stitch it in Time: GAN-Based Facial Editing of Real Videos [38.81306268180105]
We propose a framework for semantic editing of faces in videos, demonstrating significant improvements over the current state-of-the-art.
Our method produces meaningful face manipulations, maintains a higher degree of temporal consistency, and can be applied to challenging, high-quality talking head videos.
arXiv Detail & Related papers (2022-01-20T18:48:20Z)
- Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos [23.83907055654182]
We propose a novel deep multi-modality neural network for restoring very low bit rate videos of talking heads.
The proposed CNN method exploits the correlations among three modalities (video, audio, and the speaker's emotional state) to remove video compression artifacts.
arXiv Detail & Related papers (2020-08-02T04:38:59Z)
- Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation [99.64565200170897]
We propose a novel human video synthesis method by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space.
We show several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-01-14T18:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.