Immersive Video Compression using Implicit Neural Representations
- URL: http://arxiv.org/abs/2402.01596v2
- Date: Fri, 23 Feb 2024 12:26:24 GMT
- Title: Immersive Video Compression using Implicit Neural Representations
- Authors: Ho Man Kwan, Fan Zhang, Andrew Gower, David Bull
- Abstract summary: MV-HiNeRV is an enhanced version of a state-of-the-art INR-based video codec, HiNeRV.
We have modified the model to learn a different group of feature grids for each view, and share the learnt network parameters among all views.
The proposed codec was used to compress multi-view texture and depth video in the MPEG Immersive Video (MIV) Common Test Conditions.
The results demonstrate the superior performance of MV-HiNeRV, with significant coding gains (up to 72.33%) over TMIV.
- Score: 4.13899730757205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work on implicit neural representations (INRs) has evidenced their
potential for efficiently representing and encoding conventional video content.
In this paper we, for the first time, extend their application to immersive
(multi-view) videos, by proposing MV-HiNeRV, a new INR-based immersive video
codec. MV-HiNeRV is an enhanced version of a state-of-the-art INR-based video
codec, HiNeRV, which was developed for single-view video compression. We have
modified the model to learn a different group of feature grids for each view,
and share the learnt network parameters among all views. This enables the model
to effectively exploit the spatio-temporal and the inter-view redundancy that
exists within multi-view videos. The proposed codec was used to compress
multi-view texture and depth video sequences in the MPEG Immersive Video (MIV)
Common Test Conditions, and tested against the MIV Test model (TMIV) that uses
the VVenC video codec. The results demonstrate the superior performance of
MV-HiNeRV, with significant coding gains (up to 72.33%) over TMIV. The
implementation of MV-HiNeRV is published for further development and
evaluation.
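The central design choice in the abstract is a separate group of feature grids for each view, with the network parameters shared by all views. The toy, pure-Python sketch below illustrates only that idea. Everything in it is hypothetical: `SharedDecoder`, `make_grid`, `MultiViewINR` and the grid sizes are illustrative names; the shared "network" is a single linear layer rather than the HiNeRV architecture; and features are read directly from the grid instead of passing through HiNeRV's hierarchical encoding and upsampling. What it shows is why sharing helps: each additional view costs one grid's worth of parameters, not a whole extra network.

```python
import random


class SharedDecoder:
    """Hypothetical stand-in for the shared network: one linear layer.

    Maps a C-dimensional feature vector to a 3-vector (e.g. RGB).
    """

    def __init__(self, in_dim, out_dim=3, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, feat):
        return [sum(w * f for w, f in zip(row, feat)) + b
                for row, b in zip(self.weights, self.bias)]

    def num_params(self):
        return sum(len(row) for row in self.weights) + len(self.bias)


def make_grid(t, h, w, c, seed):
    """Per-view feature grid indexed as grid[t][y][x] -> list of c features."""
    rng = random.Random(seed)
    return [[[[rng.gauss(0.0, 1.0) for _ in range(c)] for _ in range(w)]
             for _ in range(h)] for _ in range(t)]


class MultiViewINR:
    """One feature grid per view; a single decoder shared by every view."""

    def __init__(self, num_views, t, h, w, c):
        self.shape = (t, h, w, c)
        self.grids = [make_grid(t, h, w, c, seed=v + 1) for v in range(num_views)]
        self.decoder = SharedDecoder(in_dim=c)  # the same object serves all views

    def decode(self, view, t, y, x):
        # Look up the view-specific feature, then run the shared network on it.
        return self.decoder(self.grids[view][t][y][x])

    def num_params(self):
        t, h, w, c = self.shape
        return self.decoder.num_params() + len(self.grids) * t * h * w * c


def independent_params(num_views, t, h, w, c):
    """Parameter count if every view had its own grid AND its own network."""
    return num_views * (SharedDecoder(in_dim=c).num_params() + t * h * w * c)


model = MultiViewINR(num_views=4, t=2, h=4, w=4, c=8)
print(len(model.decode(view=2, t=1, y=3, x=0)))                  # 3
print(model.num_params(), independent_params(4, 2, 4, 4, 8))     # 1051 1132
```

With this toy decoder (27 parameters) the saving is small. Because the shared network is counted once instead of once per view, the benefit grows with the size of the shared network relative to the per-view grids.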
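The abstract does not state how the "coding gains (up to 72.33%)" over TMIV are measured. Assuming they are Bjøntegaard delta-rate (BD-rate) savings, the usual metric in MPEG common test conditions, a gain of 72.33% would correspond to a BD-rate of -72.33%. The sketch below follows the classic polynomial formulation. For each codec, it fits log10(rate) as a cubic function of PSNR, integrates both fits over the overlapping PSNR range, and converts the average log-rate difference into a percentage. Some reference tools use piecewise cubic interpolation instead, so results can differ slightly. The function names and the rate/PSNR points in the example are made up for illustration.

```python
import math


def _fit_cubic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 + c3*x^3 via normal equations."""
    n = 4
    # Augmented matrix [A^T A | A^T y] for the cubic design matrix A.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    # Back-substitution.
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (a[i][n] - sum(a[i][k] * c[k] for k in range(i + 1, n))) / a[i][i]
    return c


def _integrate_cubic(c, lo, hi):
    """Exact integral of the fitted cubic over [lo, hi]."""
    def antiderivative(x):
        return sum(ci * x ** (i + 1) / (i + 1) for i, ci in enumerate(c))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """BD-rate (%) of `test` relative to `ref`; negative means bits are saved."""
    if min(len(psnr_ref), len(psnr_test)) < 4:
        raise ValueError("need at least 4 rate-distortion points per codec")
    # Overlapping quality interval over which the two curves are compared.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    if hi <= lo:
        raise ValueError("the two RD curves do not overlap in quality")
    # Shift PSNR by `lo` to keep the normal equations better conditioned.
    c_ref = _fit_cubic([p - lo for p in psnr_ref], [math.log10(r) for r in rate_ref])
    c_test = _fit_cubic([p - lo for p in psnr_test], [math.log10(r) for r in rate_test])
    span = hi - lo
    avg_log_diff = (_integrate_cubic(c_test, 0.0, span)
                    - _integrate_cubic(c_ref, 0.0, span)) / span
    return (10 ** avg_log_diff - 1.0) * 100.0


rates = [1000.0, 2000.0, 4000.0, 8000.0]   # made-up anchor bitrates (kbps)
psnr_db = [32.0, 35.0, 37.5, 39.5]         # made-up PSNR values (dB)
half = [r / 2 for r in rates]              # same quality at half the bitrate
print(round(bd_rate(rates, psnr_db, half, psnr_db), 2))  # -50.0
```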
Related papers
- MSNeRV: Neural Video Representation with Multi-Scale Feature Fusion [27.621656985302973]
Implicit neural representations (INRs) have emerged as a promising approach for video compression.
Existing INR-based methods struggle to effectively represent detail-intensive and fast-changing video content.
We propose a multi-scale feature fusion framework, MSNeRV, for neural video representation.
Date: 2025-06-18T08:57:12Z
- CANeRV: Content Adaptive Neural Representation for Video Compression [89.35616046528624]
We propose Content Adaptive Neural Representation for Video Compression (CANeRV).
CANeRV is an innovative INR-based video compression network that adaptively conducts structure optimisation based on the specific content of each video sequence.
We show that CANeRV can outperform both H.266/VVC and state-of-the-art INR-based video compression techniques across diverse video datasets.
Date: 2025-02-10T06:21:16Z
- NVRC: Neural Video Representation Compression [13.131842990481038]
We propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC).
NVRC, for the first time, is able to optimize an INR-based video codec in a fully end-to-end manner.
Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs.
Date: 2024-09-11T16:57:12Z
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that the TT2V (text-text-to-video) mode achieves effective semantic reconstruction, while the IT2V (image-text-to-video) mode exhibits competitive perceptual consistency.
Date: 2024-08-15T11:36:18Z
- MNeRV: A Multilayer Neural Representation for Videos [1.1079931610880582]
We propose a multilayer neural representation for videos (MNeRV) and design a new decoder M-Decoder and its matching encoder M-Encoder.
MNeRV has more encoding and decoding layers, which effectively alleviates the problem of redundant model parameters.
In video regression (reconstruction) tasks, we achieve better reconstruction quality (+4.06 dB PSNR) with fewer parameters.
Date: 2024-07-10T03:57:29Z
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
Date: 2023-06-19T03:04:57Z
- HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation [14.088444622391501]
Implicit neural representations (INRs) have previously been used to represent and compress image and video content.
Existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression.
We propose HiNeRV, an INR that combines lightweight layers with hierarchical positional encodings.
Date: 2023-06-16T12:59:52Z
- DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos [53.077189668346705]
We propose Difference Neural Representation for Videos (DNeRV).
We analyze the limitations of existing INRs from the perspective of function fitting and show the importance of frame difference.
DNeRV achieves competitive results against the state-of-the-art neural compression approaches.
Date: 2023-04-13T13:53:49Z
- HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV).
With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
Date: 2023-04-05T17:55:04Z
- Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
Date: 2023-03-24T16:32:19Z
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark: compared with prior methods, NVP not only trains 2 times faster (less than 5 minutes) but also improves encoding quality from 34.07 to 34.57 (measured with the PSNR metric).
Date: 2022-10-13T08:15:08Z
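Several summaries above report quality as PSNR, such as MNeRV's +4.06 gain and NVP's improvement from 34.07 to 34.57. The sketch below shows how PSNR is defined: 10·log10(MAX² / MSE), where MAX is the peak sample value (255 for 8-bit content). The function name `psnr` and the flat-list input format are illustrative assumptions. Papers differ in details such as averaging per frame or per sequence and working on 8-bit or [0, 1]-normalized samples, so reported numbers are not always directly comparable.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of sample values.

    max_value is the peak sample value (255 for 8-bit, 1023 for 10-bit video).
    """
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


frame = [16, 32, 64, 128]
noisy = [17, 31, 65, 127]            # every sample off by 1, so MSE = 1
print(round(psnr(frame, noisy), 2))  # 48.13
```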