Gemino: Practical and Robust Neural Compression for Video Conferencing
- URL: http://arxiv.org/abs/2209.10507v4
- Date: Thu, 19 Oct 2023 21:25:28 GMT
- Title: Gemino: Practical and Robust Neural Compression for Video Conferencing
- Authors: Vibhaalakshmi Sivaraman, Pantea Karimi, Vedantha Venkatapathy, Mehrdad
Khani, Sadjad Fouladi, Mohammad Alizadeh, Frédo Durand, Vivienne Sze
- Abstract summary: Gemino is a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline.
We show that Gemino operates on videos in real-time on a Titan X GPU, and achieves a 2.2-5x lower bitrate than traditional video codecs for the same perceptual quality.
- Score: 19.137804113000474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video conferencing systems suffer from poor user experience when network
conditions deteriorate because current video codecs simply cannot operate at
extremely low bitrates. Recently, several neural alternatives have been
proposed that reconstruct talking head videos at very low bitrates using sparse
representations of each frame such as facial landmark information. However,
these approaches produce poor reconstructions in scenarios with major movement
or occlusions over the course of a call, and do not scale to higher
resolutions. We design Gemino, a new neural compression system for video
conferencing based on a novel high-frequency-conditional super-resolution
pipeline. Gemino upsamples a very low-resolution version of each target frame
while enhancing high-frequency details (e.g., skin texture, hair, etc.) based
on information extracted from a single high-resolution reference image. We use
a multi-scale architecture that runs different components of the model at
different resolutions, allowing it to scale to resolutions comparable to 720p,
and we personalize the model to learn specific details of each person,
achieving much better fidelity at low bitrates. We implement Gemino atop
aiortc, an open-source Python implementation of WebRTC, and show that it
operates on 1024x1024 videos in real-time on a Titan X GPU, and achieves 2.2-5x
lower bitrate than traditional video codecs for the same perceptual quality.
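To make the pipeline concrete, below is a minimal PyTorch sketch of the general idea behind high-frequency-conditional super-resolution: the receiver cheaply upsamples the very low-resolution target frame it received and adds back high-frequency detail predicted from features of a single high-resolution reference image. All module names, layer sizes, and the 4x scale factor are illustrative assumptions; Gemino's actual model is a multi-scale, personalized architecture that this toy example does not reproduce.

```python
# Toy sketch of high-frequency-conditional super-resolution (not Gemino's
# actual architecture). A reference encoder extracts detail features from one
# high-resolution frame; an upsampler reconstructs each target frame from its
# low-resolution transmitted version plus those features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReferenceEncoder(nn.Module):
    """Extracts high-frequency feature maps from the high-res reference frame."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )

    def forward(self, reference: torch.Tensor) -> torch.Tensor:
        return self.net(reference)


class ConditionalUpsampler(nn.Module):
    """Upsamples a low-res target frame, conditioned on reference features."""

    def __init__(self, channels: int = 32, scale: int = 4):
        super().__init__()
        self.scale = scale
        self.fuse = nn.Sequential(
            nn.Conv2d(3 + channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, low_res: torch.Tensor, ref_feats: torch.Tensor) -> torch.Tensor:
        # Cheap bicubic upsampling recovers the coarse structure...
        coarse = F.interpolate(low_res, scale_factor=self.scale,
                               mode="bicubic", align_corners=False)
        # ...and the network adds back high-frequency detail (skin texture,
        # hair, etc.) predicted from the reference features.
        return coarse + self.fuse(torch.cat([coarse, ref_feats], dim=1))


if __name__ == "__main__":
    ref_encoder = ReferenceEncoder()
    upsampler = ConditionalUpsampler(scale=4)
    reference = torch.rand(1, 3, 256, 256)     # one high-res reference frame
    low_res_target = torch.rand(1, 3, 64, 64)  # per-frame low-res transmission
    out = upsampler(low_res_target, ref_encoder(reference))
    print(out.shape)  # torch.Size([1, 3, 256, 256])
```

In a deployment like the one described (atop aiortc), only the low-resolution frames would need to be encoded and streamed every frame, while the high-resolution reference would presumably be sent once, or refreshed occasionally, which is where the large bitrate savings come from.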
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) delivers superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
arXiv Detail & Related papers (2024-10-03T15:40:58Z) - Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video
Conferencing via Implicit Radiance Fields [42.926554334378984]
High fidelity and low bandwidth are two major objectives of video compression for video conferencing applications.
We propose a novel low bandwidth neural compression approach for high-fidelity portrait video conferencing.
arXiv Detail & Related papers (2024-02-26T14:29:13Z) - Video Compression with Arbitrary Rescaling Network [8.489428003916622]
We propose a rate-guided arbitrary rescaling network (RARN) for video resizing before encoding.
The lightweight RARN structure can process FHD (1080p) content at real-time speed (91 FPS) and obtain a considerable rate reduction.
arXiv Detail & Related papers (2023-06-07T07:15:18Z) - High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z) - VideoINR: Learning Video Implicit Neural Representation for Continuous
Space-Time Super-Resolution [75.79379734567604]
We show that Video Implicit Neural Representation (VideoINR) can be decoded to videos of arbitrary spatial resolution and frame rate.
We show that VideoINR achieves competitive performance with state-of-the-art STVSR methods on common up-sampling scales.
arXiv Detail & Related papers (2022-06-09T17:45:49Z) - Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed
Video Quality Enhancement [74.1052624663082]
We develop a deep learning architecture capable of restoring detail to compressed videos.
We show that this improves restoration accuracy compared to prior compression correction methods.
We condition our model on quantization data, which is readily available in the bitstream.
arXiv Detail & Related papers (2022-01-31T18:56:04Z) - COMISR: Compression-Informed Video Super-Resolution [76.94152284740858]
Most videos on the web or mobile devices are compressed, and the compression can be severe when the bandwidth is limited.
We propose a new compression-informed video super-resolution model to restore high-resolution content without introducing artifacts caused by compression.
arXiv Detail & Related papers (2021-05-04T01:24:44Z) - Efficient Video Compression via Content-Adaptive Super-Resolution [11.6624528293976]
Video compression is a critical component of Internet video delivery.
Recent work has shown that deep learning techniques can rival or outperform human-designed algorithms.
This paper presents a new approach that augments a recent deep learning-based video compression scheme.
arXiv Detail & Related papers (2021-04-06T07:01:06Z) - Ultra-low bitrate video conferencing using deep image animation [7.263312285502382]
We propose a novel deep learning approach for ultra-low bitrate video compression for video conferencing applications.
We employ deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side; a toy sketch of this keypoint-displacement idea appears after this list.
arXiv Detail & Related papers (2020-12-01T09:06:34Z) - Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
arXiv Detail & Related papers (2020-08-20T20:01:59Z) - Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)
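For contrast with Gemino's low-resolution-plus-reference design, here is a toy sketch of the keypoint-displacement approach used by animation-based codecs such as the deep-image-animation entry above (and the sparse-landmark baselines the abstract criticizes): the sender transmits only sparse keypoints per frame, and the receiver warps a previously sent reference frame toward them. The Gaussian-weighted dense flow, the fixed keypoints, and all names here are simplifying assumptions; real systems learn both the keypoint detector and a generator network, and, as the abstract notes, such reconstructions degrade under large motion or occlusion.

```python
# Toy keypoint-displacement warping (a much-simplified stand-in for learned
# animation-based codecs): blend per-keypoint displacements into a dense flow
# and warp the reference frame with it. Illustrative only.
import torch
import torch.nn.functional as F


def dense_flow_from_keypoints(kp_ref: torch.Tensor, kp_drv: torch.Tensor,
                              height: int, width: int, sigma: float = 0.1) -> torch.Tensor:
    """Blend per-keypoint displacements into a dense flow field.

    kp_ref, kp_drv: (K, 2) keypoints in normalized [-1, 1] (x, y) coordinates.
    Returns an (H, W, 2) offset field in the same normalized coordinates.
    """
    ys = torch.linspace(-1, 1, height)
    xs = torch.linspace(-1, 1, width)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([grid_x, grid_y], dim=-1)               # (H, W, 2)

    disp = kp_ref - kp_drv                                     # (K, 2) pull-back motion
    # Weight each keypoint's displacement by proximity to the driving keypoint.
    diff = grid.unsqueeze(2) - kp_drv.view(1, 1, -1, 2)        # (H, W, K, 2)
    weights = torch.softmax(-(diff ** 2).sum(-1) / (2 * sigma ** 2), dim=-1)  # (H, W, K)
    return (weights.unsqueeze(-1) * disp.view(1, 1, -1, 2)).sum(dim=2)        # (H, W, 2)


def warp_reference(reference: torch.Tensor, kp_ref: torch.Tensor,
                   kp_drv: torch.Tensor) -> torch.Tensor:
    """Warp a (1, 3, H, W) reference frame toward the driving-frame keypoints."""
    _, _, h, w = reference.shape
    flow = dense_flow_from_keypoints(kp_ref, kp_drv, h, w)
    ys = torch.linspace(-1, 1, h)
    xs = torch.linspace(-1, 1, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    base = torch.stack([gx, gy], dim=-1)                        # (H, W, 2) sampling grid
    sample_grid = (base + flow).unsqueeze(0)                    # (1, H, W, 2)
    return F.grid_sample(reference, sample_grid, align_corners=False)


if __name__ == "__main__":
    reference = torch.rand(1, 3, 128, 128)
    kp_ref = torch.rand(10, 2) * 2 - 1            # 10 keypoints in the reference frame
    kp_drv = kp_ref + 0.05 * torch.randn(10, 2)   # slightly moved in the new frame
    print(warp_reference(reference, kp_ref, kp_drv).shape)  # torch.Size([1, 3, 128, 128])
```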
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.