HiNeRV: Video Compression with Hierarchical Encoding-based Neural
Representation
- URL: http://arxiv.org/abs/2306.09818v3
- Date: Fri, 26 Jan 2024 15:54:39 GMT
- Title: HiNeRV: Video Compression with Hierarchical Encoding-based Neural
Representation
- Authors: Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull
- Abstract summary: Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content.
Existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression.
We propose HiNeRV, an INR that combines lightweight layers with hierarchical positional encodings.
- Score: 14.088444622391501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning-based video compression is currently a popular research topic,
offering the potential to compete with conventional standard video codecs. In
this context, Implicit Neural Representations (INRs) have previously been used
to represent and compress image and video content, demonstrating relatively
high decoding speed compared to other methods. However, existing INR-based
methods have failed to deliver rate-quality performance comparable with the
state of the art in video compression. This is mainly due to the simplicity of
the employed network architectures, which limit their representation
capability. In this paper, we propose HiNeRV, an INR that combines lightweight
layers with novel hierarchical positional encodings. We employ depth-wise
convolutional, MLP and interpolation layers to build a deep and wide network
architecture with high capacity. HiNeRV is also a unified representation
encoding videos in both frames and patches at the same time, which offers
higher performance and flexibility than existing methods. We further build a
video codec based on HiNeRV and a refined pipeline for training, pruning and
quantization that can better preserve HiNeRV's performance during lossy model
compression. The proposed method has been evaluated on both UVG and MCL-JCV
datasets for video compression, demonstrating significant improvement over all
existing INR baselines and competitive performance when compared to
learning-based codecs (72.3% overall bit rate saving over HNeRV and 43.4% over
DCVC on the UVG dataset, measured in PSNR).
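
The bit rate savings above are reported with quality measured in PSNR. As a rough illustration of that metric only, and not the paper's evaluation code, the following minimal sketch computes PSNR between an original and a decoded signal. The function name `psnr`, the flat list-of-pixels input, the sample values and the 8-bit peak value of 255 are assumptions made for this example.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Return the peak signal-to-noise ratio in dB for two equal-length
    sequences of pixel values (hypothetical helper, 8-bit peak by default)."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of samples")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")
    # Mean squared error between the original and decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up sample values, for illustration only.
original = [52, 55, 61, 66]
decoded = [53, 55, 60, 68]
print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

A higher PSNR at the same bit rate, or the same PSNR at a lower bit rate, indicates better rate-quality performance; a bit rate saving compares the rates two codecs need to reach matching quality.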
Related papers
- NVRC: Neural Video Representation Compression [13.131842990481038]
We propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC).
NVRC, for the first time, is able to optimize an INR-based video codec in a fully end-to-end manner.
Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs.
arXiv Detail & Related papers (2024-09-11T16:57:12Z) - PNVC: Towards Practical INR-based Video Compression [14.088444622391501]
We propose a novel INR-based coding framework, PNVC, which innovatively combines autoencoder-based and overfitted solutions.
PNVC achieves nearly 35%+ BD-rate savings against HEVC HM 18.0 (LD) - almost 10% more compared to one of the state-of-the-art INR-based codecs.
arXiv Detail & Related papers (2024-09-02T05:31:11Z) - NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce NeRV++, an enhanced implicit neural video representation.
NeRV++ is a more straightforward yet effective enhancement over the original NeRV decoder architecture.
We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs.
arXiv Detail & Related papers (2024-02-28T13:00:32Z) - Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing.
This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z) - HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV).
With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
arXiv Detail & Related papers (2023-04-05T17:55:04Z) - Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - Modality-Agnostic Variational Compression of Implicit Neural
Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z) - NIRVANA: Neural Implicit Representations of Videos with Adaptive
Networks and Autoregressive Patch-wise Modeling [37.51397331485574]
Implicit Neural Representations (INRs) have recently been shown to be a powerful tool for high-quality video compression.
These methods have fixed architectures which do not scale to longer videos or higher resolutions.
We propose NIRVANA, which treats videos as groups of frames and fits separate networks to each group performing patch-wise prediction.
arXiv Detail & Related papers (2022-12-30T08:17:02Z) - Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also improves encoding quality from 34.07 to 34.57 (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z) - Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG
Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends.
Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z) - Learning for Video Compression with Hierarchical Quality and Recurrent
Enhancement [164.7489982837475]
We propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network.
In our HLVC approach, the hierarchical quality benefits the coding efficiency, since the high quality information facilitates the compression and enhancement of low quality frames at encoder and decoder sides.
arXiv Detail & Related papers (2020-03-04T09:31:37Z)