Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video
Conferencing via Implicit Radiance Fields
- URL: http://arxiv.org/abs/2402.16599v1
- Date: Mon, 26 Feb 2024 14:29:13 GMT
- Title: Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video
Conferencing via Implicit Radiance Fields
- Authors: Yifei Li, Xiaohong Liu, Yicong Peng, Guangtao Zhai, and Jun Zhou
- Abstract summary: High fidelity and low bandwidth are two major objectives of video compression for video conferencing applications.
We propose a novel low bandwidth neural compression approach for high-fidelity portrait video conferencing.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video conferencing has attracted increasing attention recently. High
fidelity and low bandwidth are two major objectives of video compression for
video conferencing applications. Most pioneering methods rely on classic video
compression codecs without high-level feature embedding and thus cannot reach
extremely low bandwidth. Recent works instead employ model-based neural
compression to achieve ultra-low bitrates using sparse representations of each
frame, such as facial landmark information, but these approaches cannot
maintain high fidelity due to 2D image-based warping. In this paper, we propose
a novel low-bandwidth neural compression approach for high-fidelity portrait
video conferencing that uses implicit radiance fields to achieve both
objectives. We leverage dynamic neural radiance fields to reconstruct a
high-fidelity talking head from expression features, which are transmitted in
place of full frames. The overall system employs a deep model to encode
expression features at the sender and reconstructs the portrait at the receiver
using volume rendering as the decoder, yielding ultra-low bandwidth. Notably,
owing to the nature of neural radiance field based models, our compression
approach is resolution-agnostic: the bandwidth achieved by our approach is
independent of video resolution, while fidelity is maintained for
higher-resolution reconstruction. Experimental results demonstrate that our
framework can (1) achieve ultra-low-bandwidth video conferencing, (2) maintain
high-fidelity portraits, and (3) outperform previous works on high-resolution
video compression.
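The resolution-agnostic claim can be made concrete with a back-of-the-envelope sketch. All numbers below (feature dimensionality, value precision, bits-per-pixel budget) are illustrative assumptions, not figures from the paper: a fixed-size expression feature vector costs the same bits per frame at any output resolution, while a pixel-based codec's bitrate grows with frame area.

```python
# Back-of-the-envelope bitrate comparison: a fixed-size per-frame feature
# vector (resolution-independent) vs. a pixel-based codec whose rate grows
# with frame area. All numbers here are illustrative assumptions.

def feature_bitrate_kbps(feature_dim: int, bytes_per_value: int, fps: int) -> float:
    """Bitrate of sending one feature vector per frame; no resolution term."""
    return feature_dim * bytes_per_value * 8 * fps / 1000

def pixel_codec_bitrate_kbps(width: int, height: int, fps: int,
                             bits_per_pixel: float) -> float:
    """Rough bitrate of a conventional codec at a given bits-per-pixel budget."""
    return width * height * fps * bits_per_pixel / 1000

# A hypothetical 64-dim float16 expression feature at 25 fps costs the same
# whether the receiver renders 512p or 4K output:
neural = feature_bitrate_kbps(64, 2, 25)
sd = pixel_codec_bitrate_kbps(512, 512, 25, 0.05)
uhd = pixel_codec_bitrate_kbps(3840, 2160, 25, 0.05)
print(f"features: {neural:.1f} kbps, 512p codec: {sd:.0f} kbps, 4K codec: {uhd:.0f} kbps")
# → features: 25.6 kbps, 512p codec: 328 kbps, 4K codec: 10368 kbps
```

The feature-based rate stays flat as resolution grows, which is the sense in which the bandwidth is independent of video resolution.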
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) delivers superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
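The hierarchical predictive coding idea can be sketched on a 1-D signal: each finer scale is coded only as its residual from a prediction upsampled from the coarser scale. This is a minimal toy analogue with fixed average/nearest-neighbour filters, not DHVC's learned predictors.

```python
# Toy hierarchical predictive coding on a 1-D signal: each finer scale is
# coded as the residual from a prediction upsampled from the coarser scale.
# A minimal analogue of multiscale predictive coding, not DHVC itself.

def downsample(x):
    # average adjacent pairs
    return [sum(x[i:i + 2]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample(x, n):
    # nearest-neighbour upsampling to length n
    return [x[min(i // 2, len(x) - 1)] for i in range(n)]

def encode(signal, levels):
    scales = [signal]
    for _ in range(levels - 1):
        scales.append(downsample(scales[-1]))
    residuals, coarse = [], scales[-1]
    for fine in reversed(scales[:-1]):
        pred = upsample(coarse, len(fine))
        residuals.append([f - p for f, p in zip(fine, pred)])
        coarse = fine
    return scales[-1], residuals  # coarsest scale + per-level residuals

def decode(coarsest, residuals):
    rec = coarsest
    for res in residuals:
        pred = upsample(rec, len(res))
        rec = [p + r for p, r in zip(pred, res)]
    return rec

signal = [4.0, 5.0, 6.0, 8.0, 7.0, 3.0, 2.0, 2.0]
coarsest, residuals = encode(signal, levels=3)
reconstructed = decode(coarsest, residuals)  # round-trips back to signal
```

Because decoding proceeds coarse-to-fine, a prefix of the transmitted data already yields a usable low-resolution reconstruction, which is what makes such schemes friendly to progressive decoding under packet loss.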
arXiv Detail & Related papers (2024-10-03T15:40:58Z)
- NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce NeRV++, an enhanced implicit neural video representation.
NeRV++ is a more straightforward yet effective enhancement over the original NeRV decoder architecture.
We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs.
arXiv Detail & Related papers (2024-02-28T13:00:32Z)
- Perceptual Quality Improvement in Videoconferencing using Keyframes-based GAN [28.773037051085318]
We propose a novel GAN-based method for compression artifacts reduction in videoconferencing.
First, we extract multi-scale features from the compressed and reference frames.
Then, our architecture combines these features in a progressive manner according to facial landmarks.
arXiv Detail & Related papers (2023-11-07T16:38:23Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained end-to-end.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
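As a rough illustration of a quantized latent space, the sketch below snaps each latent value to a uniform scalar grid so that only small integer indices need to be transmitted. The actual codec uses learned residual vector quantization, so this is a simplified stand-in showing only the quantize/dequantize round trip.

```python
# Minimal scalar quantization of a latent vector: each value snaps to the
# nearest level of a uniform grid, so only small integer indices need to be
# sent. A simplified stand-in for the residual vector quantizer the codec
# actually uses.

def quantize(latent, step=0.25):
    return [round(v / step) for v in latent]   # integer indices to transmit

def dequantize(indices, step=0.25):
    return [i * step for i in indices]         # receiver-side reconstruction

latent = [0.13, -0.52, 0.98, -0.07]
idx = quantize(latent)       # [1, -2, 4, 0]
rec = dequantize(idx)        # [0.25, -0.5, 1.0, 0.0]; error bounded by step / 2
```

The reconstruction error per value is bounded by half the quantization step, which is the basic rate/distortion trade-off such codecs tune end-to-end.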
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
- Gemino: Practical and Robust Neural Compression for Video Conferencing [19.137804113000474]
Gemino is a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline.
We show that Gemino operates on videos in real time on a Titan X GPU, and achieves a 2.2-5x lower bitrate than traditional video codecs for the same perceptual quality.
arXiv Detail & Related papers (2022-09-21T17:10:46Z)
- Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement [74.1052624663082]
We develop a deep learning architecture capable of restoring detail to compressed videos.
We show that this improves restoration accuracy compared to prior compression correction methods.
We condition our model on quantization data which is readily available in the bitstream.
arXiv Detail & Related papers (2022-01-31T18:56:04Z)
- COMISR: Compression-Informed Video Super-Resolution [76.94152284740858]
Most videos on the web or mobile devices are compressed, and the compression can be severe when the bandwidth is limited.
We propose a new compression-informed video super-resolution model to restore high-resolution content without introducing artifacts caused by compression.
arXiv Detail & Related papers (2021-05-04T01:24:44Z)
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [55.24336227884039]
We present a novel framework to generate high-fidelity talking head video.
We use neural scene representation networks to bridge the gap between audio input and video output.
Our framework can (1) produce high-fidelity and natural results, and (2) support free adjustment of audio signals, viewing directions, and background images.
arXiv Detail & Related papers (2021-03-20T02:58:13Z)
- Ultra-low bitrate video conferencing using deep image animation [7.263312285502382]
We propose a novel deep learning approach to ultra-low bitrate video compression for video conferencing applications.
We employ deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side.
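The keypoint-displacement idea can be sketched as follows (the keypoint count and coordinates below are invented for illustration): the sender transmits only each keypoint's offset from a shared reference frame, and the receiver recovers absolute positions before driving the animation model.

```python
# Keypoint-displacement coding sketch: the sender transmits only each
# keypoint's offset from a shared reference frame; the receiver recovers
# absolute positions to drive the animation model. Keypoint coordinates
# below are invented for illustration.

REF_KEYPOINTS = [(120.0, 80.0), (200.0, 80.0), (160.0, 140.0)]

def encode_frame(keypoints):
    """Per-keypoint displacement from the reference frame (what is sent)."""
    return [(x - rx, y - ry) for (x, y), (rx, ry) in zip(keypoints, REF_KEYPOINTS)]

def decode_frame(displacements):
    """Receiver-side recovery of absolute keypoint positions."""
    return [(rx + dx, ry + dy) for (dx, dy), (rx, ry) in zip(displacements, REF_KEYPOINTS)]

frame = [(122.5, 81.0), (198.0, 79.5), (161.0, 143.0)]
disp = encode_frame(frame)        # small numbers, cheap to transmit
restored = decode_frame(disp)     # equals frame
```

A handful of displacement pairs per frame is orders of magnitude smaller than the pixel data, which is what enables the ultra-low bitrates these methods report.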
arXiv Detail & Related papers (2020-12-01T09:06:34Z)
- Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement [164.7489982837475]
We propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network.
In our HLVC approach, the hierarchical quality benefits the coding efficiency, since the high quality information facilitates the compression and enhancement of low quality frames at encoder and decoder sides.
arXiv Detail & Related papers (2020-03-04T09:31:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.