Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos
- URL: http://arxiv.org/abs/2008.01652v1
- Date: Sun, 2 Aug 2020 04:38:59 GMT
- Title: Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos
- Authors: Yanhui Guo, Xi Zhang, Xiaolin Wu
- Abstract summary: We propose a novel deep multi-modality neural network for restoring very low bit rate videos of talking heads.
The proposed CNN method exploits the correlations among three modalities, video, audio and emotion state of the speaker, to remove the video compression artifacts.
- Score: 23.83907055654182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel deep multi-modality neural network for restoring very low
bit rate videos of talking heads. Such video contents are very common in social
media, teleconferencing, distance education, tele-medicine, etc., and often
need to be transmitted with limited bandwidth. The proposed CNN method exploits
the correlations among three modalities, video, audio and emotion state of the
speaker, to remove the video compression artifacts caused by spatial down
sampling and quantization. The deep learning approach turns out to be ideally
suited for the video restoration task, as the complex non-linear cross-modality
correlations are very difficult to model analytically and explicitly. The new
method is a video post processor that can significantly boost the perceptual
quality of aggressively compressed talking head videos, while being fully
compatible with all existing video compression standards.
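The paper does not include code, but the cross-modality idea can be sketched roughly: a post-processor fuses per-modality features (video, audio, emotion state) and predicts a pixel residual that is added back to the decoded frame. All names and the fusion scheme below are illustrative assumptions, not the authors' actual architecture; the stand-in linear "network" only makes the data flow visible.

```python
# Illustrative sketch of a late-fusion restoration post-processor, in the
# spirit of the paper. The real method is a deep CNN; predict_residual()
# here is a hypothetical stand-in so the multi-modality data flow is clear.

def fuse_modalities(video_feat, audio_feat, emotion_feat):
    """Concatenate per-modality feature vectors (simple late fusion)."""
    return video_feat + audio_feat + emotion_feat

def predict_residual(fused, weights):
    """Stand-in for the CNN: one linear unit per output pixel."""
    return [sum(w * f for w, f in zip(row, fused)) for row in weights]

def restore_frame(decoded_pixels, residual):
    """Post-processing: add the predicted residual, clip to the valid range."""
    return [min(255, max(0, p + r)) for p, r in zip(decoded_pixels, residual)]
```

Because the restoration operates purely on the decoder's output, a scheme of this shape stays compatible with any existing video compression standard, as the abstract notes.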
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields [42.926554334378984]
High fidelity and low bandwidth are two major objectives of video compression for video conferencing applications.
We propose a novel low bandwidth neural compression approach for high-fidelity portrait video conferencing.
arXiv Detail & Related papers (2024-02-26T14:29:13Z)
- Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities [67.89368528234394]
One of the main challenges of multimodal learning is the need to combine heterogeneous modalities.
Video and audio are obtained at much higher rates than text and are roughly aligned in time.
Our approach achieves the state-of-the-art on well established multimodal benchmarks, outperforming much larger models.
arXiv Detail & Related papers (2023-11-09T19:15:12Z)
- Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- Gemino: Practical and Robust Neural Compression for Video Conferencing [19.137804113000474]
Gemino is a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline.
We show that Gemino operates on videos in real-time on a Titan X GPU, and achieves a 2.2-5x lower bitrate than traditional video codecs for the same perceptual quality.
arXiv Detail & Related papers (2022-09-21T17:10:46Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement [74.1052624663082]
We develop a deep learning architecture capable of restoring detail to compressed videos.
We condition our model on quantization data which is readily available in the bitstream.
We show that this improves restoration accuracy compared to prior compression correction methods.
arXiv Detail & Related papers (2022-01-31T18:56:04Z)
- Multi-modality Deep Restoration of Extremely Compressed Face Videos [36.83490465562509]
We develop a multi-modality deep convolutional neural network method for restoring face videos that are aggressively compressed.
The main innovation is a new DCNN architecture that incorporates known priors of multiple modalities.
Ample empirical evidence is presented to validate the superior performance of the proposed DCNN method on face videos.
arXiv Detail & Related papers (2021-07-05T16:29:02Z)
- Ultra-low bitrate video conferencing using deep image animation [7.263312285502382]
We propose a novel deep learning approach for ultra-low video compression for video conferencing applications.
We employ deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side.
arXiv Detail & Related papers (2020-12-01T09:06:34Z)
- Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)
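As a minimal sketch of the multi-frame training idea in the last entry (the function and symbols below are illustrative assumptions, not the paper's actual loss), a joint objective sums rate-distortion terms over a window of consecutive frames, so that error propagating into later frames raises the loss instead of being invisible to a single-frame objective:

```python
def joint_rd_loss(distortions, rates, lam):
    """Joint rate-distortion objective over consecutive frames:
    sum of (distortion + lambda * rate) per frame. A single-frame
    loss would use only one term and ignore error propagation."""
    return sum(d + lam * r for d, r in zip(distortions, rates))

# If frame 2 inherits error from frame 1, its distortion term grows and
# the joint loss increases, so training can penalize the propagation.
```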
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.