Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos
- URL: http://arxiv.org/abs/2008.01652v1
- Date: Sun, 2 Aug 2020 04:38:59 GMT
- Title: Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos
- Authors: Yanhui Guo, Xi Zhang, Xiaolin Wu
- Abstract summary: We propose a novel deep multi-modality neural network for restoring very low bit rate videos of talking heads.
The proposed CNN method exploits the correlations among three modalities, video, audio and emotion state of the speaker, to remove the video compression artifacts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel deep multi-modality neural network for restoring very low
bit rate videos of talking heads. Such video contents are very common in social
media, teleconferencing, distance education, tele-medicine, etc., and often
need to be transmitted with limited bandwidth. The proposed CNN method exploits
the correlations among three modalities, video, audio and emotion state of the
speaker, to remove the video compression artifacts caused by spatial
down-sampling and quantization. The deep learning approach turns out to be ideally
suited for the video restoration task, as the complex non-linear cross-modality
correlations are very difficult to model analytically and explicitly. The new
method is a video post-processor that can significantly boost the perceptual
quality of aggressively compressed talking head videos, while being fully
compatible with all existing video compression standards.
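The degradation this post-processor targets can be sketched in a few lines. The snippet below is an illustrative stand-in, not the paper's codec or network: it mimics aggressive spatial down-sampling and coarse quantization of a frame, which is the kind of input a restoration CNN would receive after any standard decoder.

```python
import numpy as np

def degrade(frame, scale=4, qstep=32):
    """Illustrative stand-in for very low bit-rate coding: block-average
    down-sampling followed by coarse uniform quantization, then naive
    nearest-neighbour upsampling back to full resolution. A restoration
    network would be applied to the output of a real decoder instead."""
    h, w = frame.shape
    h, w = h - h % scale, w - w % scale
    # Down-sample by averaging scale x scale blocks.
    small = frame[:h, :w].reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    # Coarse uniform quantization of the down-sampled frame.
    quantized = np.round(small / qstep) * qstep
    # Naive nearest-neighbour upsampling back to full resolution.
    return np.repeat(np.repeat(quantized, scale, axis=0), scale, axis=1)

frame = np.linspace(0, 255, 64 * 64).reshape(64, 64)  # synthetic luma frame
decoded = degrade(frame)
print(decoded.shape)  # (64, 64)
```

Because the restoration runs purely on the decoder side, after the bitstream has been decoded, it stays compatible with any existing video compression standard.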
Related papers
- One-Click Upgrade from 2D to 3D: Sandwiched RGB-D Video Compression for Stereoscopic Teleconferencing
We propose a new approach to upgrade a 2D video to support stereo RGB-D video compression, by wrapping it with a neural pre- and post-processor pair.
We train the neural pre- and post-processors on a synthetic 4D people dataset, and evaluate it on both synthetic and real-captured stereo RGB-D videos.
Our approach saves about 30% bit-rate compared to a conventional video coding scheme and MV-HEVC at the same level of rendering quality from a novel view.
arXiv Detail & Related papers (2024-04-15T17:56:05Z) - Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields
High fidelity and low bandwidth are two major objectives of video compression for video conferencing applications.
We propose a novel low bandwidth neural compression approach for high-fidelity portrait video conferencing.
arXiv Detail & Related papers (2024-02-26T14:29:13Z) - Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
One of the main challenges of multimodal learning is the need to combine heterogeneous modalities.
Video and audio are obtained at much higher rates than text and are roughly aligned in time.
Our approach achieves the state-of-the-art on well established multimodal benchmarks, outperforming much larger models.
arXiv Detail & Related papers (2023-11-09T19:15:12Z) - Progressive Fourier Neural Representation for Sequential Video Compilation
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
arXiv Detail & Related papers (2023-06-20T06:02:19Z) - Towards Scalable Neural Representation for Diverse Videos
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - Gemino: Practical and Robust Neural Compression for Video Conferencing
Gemino is a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline.
We show that Gemino operates on videos in real-time on a Titan X GPU, and achieves 2.2-5x lower bit-rate than traditional video codecs for the same perceptual quality.
arXiv Detail & Related papers (2022-09-21T17:10:46Z) - Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval
We propose the first quantized representation learning method for cross-view video retrieval, namely Hybrid Contrastive Quantization (HCQ).
HCQ learns both coarse-grained and fine-grained quantizations with transformers, which provide complementary understandings for texts and videos.
Experiments on three Web video benchmark datasets demonstrate that HCQ achieves competitive performance with state-of-the-art non-compressed retrieval methods.
arXiv Detail & Related papers (2022-02-07T18:04:10Z) - Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement
We develop a deep learning architecture capable of restoring detail to compressed videos.
We condition our model on quantization data which is readily available in the bitstream.
We show that this improves restoration accuracy compared to prior compression correction methods.
arXiv Detail & Related papers (2022-01-31T18:56:04Z) - Multi-modality Deep Restoration of Extremely Compressed Face Videos
We develop a multi-modality deep convolutional neural network method for restoring face videos that are aggressively compressed.
The main innovation is a new DCNN architecture that incorporates known priors of multiple modalities.
Ample empirical evidence is presented to validate the superior performance of the proposed DCNN method on face videos.
arXiv Detail & Related papers (2021-07-05T16:29:02Z) - Ultra-low bitrate video conferencing using deep image animation
We propose a novel deep learning approach for ultra-low bitrate video compression for video conferencing applications.
We employ deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side.
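A back-of-the-envelope comparison shows why keypoint-displacement coding can reach ultra-low bit rates. All sizes below are illustrative assumptions, not figures from the paper: it compares the cost of sending a handful of 2-D keypoint displacements per frame against even a heavily compressed pixel payload.

```python
def pixel_payload_bits(width, height, bpp=12, compression=100):
    """Hypothetical per-frame cost of coding pixels directly:
    raw bits divided by an assumed 100x codec compression ratio."""
    return width * height * bpp // compression

def keypoint_payload_bits(num_keypoints, bits_per_coord=16):
    """Per-frame cost of sending only (dx, dy) displacements for each
    keypoint; the decoder animates a single reference frame from these."""
    return num_keypoints * 2 * bits_per_coord

pixels = pixel_payload_bits(640, 360)  # assumed 640x360 frame
keypoints = keypoint_payload_bits(10)  # assumed 10 facial keypoints
print(pixels, keypoints)  # keypoints cost roughly two orders of magnitude less
```

The exact numbers depend entirely on the assumed frame size, keypoint count, and codec efficiency; the point is only that a few dozen coordinates per frame are far cheaper than any pixel payload.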
arXiv Detail & Related papers (2020-12-01T09:06:34Z) - Content Adaptive and Error Propagation Aware Deep Video Compression
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.