NIRVANA: Neural Implicit Representations of Videos with Adaptive
Networks and Autoregressive Patch-wise Modeling
- URL: http://arxiv.org/abs/2212.14593v1
- Date: Fri, 30 Dec 2022 08:17:02 GMT
- Title: NIRVANA: Neural Implicit Representations of Videos with Adaptive
Networks and Autoregressive Patch-wise Modeling
- Authors: Shishira R Maiya, Sharath Girish, Max Ehrlich, Hanyu Wang, Kwot Sin
Lee, Patrick Poirson, Pengxiang Wu, Chen Wang, Abhinav Shrivastava
- Abstract summary: Implicit Neural Representations (INR) have recently been shown to be a powerful tool for high-quality video compression.
These methods have fixed architectures which do not scale to longer videos or higher resolutions.
We propose NIRVANA, which treats videos as groups of frames and fits separate networks to each group performing patch-wise prediction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implicit Neural Representations (INR) have recently been shown to be
a powerful tool for high-quality video compression. However, existing works are limited as
they do not explicitly exploit the temporal redundancy in videos, leading to a
long encoding time. Additionally, these methods have fixed architectures which
do not scale to longer videos or higher resolutions. To address these issues,
we propose NIRVANA, which treats videos as groups of frames and fits separate
networks to each group performing patch-wise prediction. This design shares
computation within each group, in the spatial and temporal dimensions,
resulting in reduced encoding time of the video. The video representation is
modeled autoregressively, with the network fit on the current group initialized
using weights from the previous group's model. To further enhance efficiency,
we perform quantization of the network parameters during training, requiring no
post-hoc pruning or quantization. When compared with previous works on the
benchmark UVG dataset, NIRVANA improves encoding quality from 37.36 to 37.70
(in terms of PSNR) and the encoding speed by 12X, while maintaining the same
compression rate. In contrast to prior video INR works which struggle with
larger resolution and longer videos, we show that our algorithm is highly
flexible and scales naturally due to its patch-wise and autoregressive designs.
Moreover, our method achieves variable bitrate compression by adapting to
videos with varying inter-frame motion. NIRVANA achieves 6X faster decoding and
scales well with more GPUs, making it practical for various deployment
scenarios.
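The autoregressive group-wise fitting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the per-group patch-wise networks with a tiny linear model fit by gradient descent, and all function names (`fit_group`, `group_mse`) are hypothetical. The point it demonstrates is the initialization scheme: because consecutive frame groups are temporally redundant, warm-starting each group's fit from the previous group's weights reaches a low error in far fewer steps than a cold start.

```python
import numpy as np

rng = np.random.default_rng(0)

def design(frames):
    # Normalized (t, y, x) coordinates for every pixel, plus a bias column.
    T, H, W = frames.shape
    coords = np.stack(np.meshgrid(np.arange(T), np.arange(H), np.arange(W),
                                  indexing="ij"), axis=-1).reshape(-1, 3)
    coords = coords / np.array([T, H, W])
    return np.hstack([coords, np.ones((len(coords), 1))])

def fit_group(frames, w=None, steps=200, lr=0.1):
    # Fit a tiny linear "network" mapping coordinates to intensities for one
    # group of frames -- an illustrative stand-in for the per-group MLPs.
    X, y = design(frames), frames.reshape(-1)
    if w is None:
        w = np.zeros(X.shape[1])           # cold start for the first group
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)   # gradient step on MSE
    return w

def group_mse(frames, w):
    X, y = design(frames), frames.reshape(-1)
    return float(np.mean((X @ w - y) ** 2))

# Two consecutive groups with nearly identical content: the second fit is
# warm-started from the first group's weights (autoregressive modeling).
group1 = rng.random((4, 8, 8))
group2 = group1 + 0.01 * rng.standard_normal((4, 8, 8))
w1 = fit_group(group1)                               # group 1: cold start
w2_warm = fit_group(group2, w=w1.copy(), steps=10)   # few steps suffice
w2_cold = fit_group(group2, steps=10)                # cold start lags behind
```

Under this setup, `w2_warm` reaches a lower reconstruction error than `w2_cold` for the same small step budget, which is the mechanism by which the autoregressive design shares computation across groups and shortens encoding time.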
Related papers
- NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce NeRV++, an enhanced implicit neural video representation.
NeRV++ is a more straightforward yet effective enhancement over the original NeRV decoder architecture.
We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs.
arXiv Detail & Related papers (2024-02-28T13:00:32Z)
- HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation [14.088444622391501]
Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content.
Existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression.
We propose HiNeRV, an INR that combines lightweight layers with hierarchical positional encodings.
arXiv Detail & Related papers (2023-06-16T12:59:52Z)
- HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV).
With content-adaptive embeddings and a re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
arXiv Detail & Related papers (2023-04-05T17:55:04Z)
- Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos [5.958701846880935]
We propose FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across the frames in videos.
With model compression techniques, FFNeRV outperforms widely-used standard video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms.
arXiv Detail & Related papers (2022-12-23T12:51:42Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, from 34.07 to 34.57 (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Exploring Long- and Short-Range Temporal Information for Learned Video Compression [54.91301930491466]
We focus on exploiting the unique characteristics of video content and exploring temporal information to enhance compression performance.
For long-range temporal information exploitation, we propose a temporal prior that can be updated continuously within the group of pictures (GOP) during inference.
In that case, the temporal prior contains valuable temporal information from all decoded images within the current GOP.
In detail, we design a hierarchical structure to achieve multi-scale compensation.
arXiv Detail & Related papers (2022-08-07T15:57:18Z)
- Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)
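Several entries above report encoding quality in PSNR (e.g. 37.36 to 37.70 for NIRVANA on UVG, 34.07 to 34.57 for NVP). For reference, PSNR is the standard peak signal-to-noise ratio, computed from the mean squared error between original and reconstructed frames; the sketch below assumes pixel values normalized to [0, 1].

```python
import numpy as np

def psnr(original, reconstructed, peak=1.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).
    # Higher is better; identical images give infinite PSNR.
    mse = np.mean((np.asarray(original) - np.asarray(reconstructed)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

# A uniform error of 0.1 on [0, 1] pixels gives MSE = 0.01, i.e. 20 dB.
print(psnr(np.zeros(16), np.full(16, 0.1)))  # -> 20.0
```

A gain of a few tenths of a dB, as reported by the papers above, corresponds to a measurable reduction in reconstruction error at the same bitrate.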
This list is automatically generated from the titles and abstracts of the papers in this site.