NeRV360: Neural Representation for 360-Degree Videos with a Viewport Decoder
- URL: http://arxiv.org/abs/2512.20871v1
- Date: Wed, 24 Dec 2025 01:21:25 GMT
- Title: NeRV360: Neural Representation for 360-Degree Videos with a Viewport Decoder
- Authors: Daichi Arai, Kyohei Unno, Yasuko Sugito, Yuichi Kusakabe
- Abstract summary: Implicit neural representations for videos (NeRV) have shown strong potential for video compression. We propose NeRV360, an end-to-end framework that decodes only the user-selected viewport instead of reconstructing the entire panoramic frame. NeRV360 achieves a 7-fold reduction in memory consumption and a 2.5-fold increase in decoding speed compared to HNeRV.
- Score: 1.8149327897427234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implicit neural representations for videos (NeRV) have shown strong potential for video compression. However, applying NeRV to high-resolution 360-degree videos causes high memory usage and slow decoding, making real-time applications impractical. We propose NeRV360, an end-to-end framework that decodes only the user-selected viewport instead of reconstructing the entire panoramic frame. Unlike conventional pipelines, NeRV360 integrates viewport extraction into decoding and introduces a spatial-temporal affine transform module for conditional decoding based on viewpoint and time. Experiments on 6K-resolution videos show that NeRV360 achieves a 7-fold reduction in memory consumption and a 2.5-fold increase in decoding speed compared to HNeRV, a representative prior work, while delivering better image quality in terms of objective metrics.
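The abstract describes a spatial-temporal affine transform module that conditions decoding on viewpoint and time. A minimal sketch of that idea, in the spirit of FiLM-style conditioning (all names, shapes, and layer sizes here are illustrative assumptions, not the paper's actual architecture): a small conditioning network maps (viewpoint, time) to per-channel affine parameters that modulate the decoder's feature maps, so only the requested viewport is reconstructed.

```python
import numpy as np

rng = np.random.default_rng(0)

def positional_encoding(x, num_freqs=4):
    """Map each scalar in x to sin/cos features at increasing frequencies."""
    feats = []
    for k in range(num_freqs):
        feats.append(np.sin(2.0**k * np.pi * x))
        feats.append(np.cos(2.0**k * np.pi * x))
    return np.concatenate(feats, axis=-1)

def affine_params(viewpoint, t, channels, w1, w2):
    """Tiny MLP: (yaw, pitch, time) -> per-channel (gamma, beta)."""
    cond = positional_encoding(np.array([*viewpoint, t]))  # 3 scalars -> (24,)
    h = np.tanh(cond @ w1)
    out = h @ w2                       # (2 * channels,)
    gamma, beta = out[:channels], out[channels:]
    return 1.0 + gamma, beta           # centre gamma around 1

channels, hidden = 8, 16
w1 = rng.normal(0, 0.1, (24, hidden))
w2 = rng.normal(0, 0.1, (hidden, 2 * channels))

# Decoder feature maps, modulated per-channel by the conditioning network.
features = rng.normal(size=(channels, 4, 4))
gamma, beta = affine_params((0.25, -0.1), 0.5, channels, w1, w2)
modulated = gamma[:, None, None] * features + beta[:, None, None]
print(modulated.shape)  # (8, 4, 4)
```

In the paper's setting the modulated features would feed the remaining decoder layers, which upsample to the viewport resolution rather than the full 6K panorama.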
Related papers
- TeCoNeRV: Leveraging Temporal Coherence for Compressible Neural Representations for Videos [51.99176811574457]
Implicit Neural Representations (INRs) have recently demonstrated impressive performance for video compression. However, scaling to high-resolution videos while maintaining encoding efficiency remains a significant challenge. We address these fundamental limitations through three key contributions. We are the first hypernetwork approach to demonstrate results at 480p, 720p and 1080p on UVG, HEVC and MCL-JCV.
arXiv Detail & Related papers (2026-02-18T18:59:55Z) - Omnidirectional Video Super-Resolution using Deep Learning [3.281128493853064]
The limited spatial resolution in 360° videos does not allow each degree of view to be represented with adequate pixels. This paper proposes a novel deep learning model for 360° Video Super-Resolution (360° VSR) called Spherical Signal Super-resolution with a Proportioned Optimisation (S3PO). S3PO adopts recurrent modelling with an attention mechanism, unbound from conventional VSR techniques like alignment.
arXiv Detail & Related papers (2025-06-03T05:59:21Z) - Fast Encoding and Decoding for Implicit Video Representation [88.43612845776265]
We introduce NeRV-Enc, a transformer-based hyper-network for fast encoding; and NeRV-Dec, a parallel decoder for efficient video loading.
NeRV-Enc achieves an impressive speed-up of $\mathbf{104}\times$ by eliminating gradient-based optimization.
NeRV-Dec simplifies video decoding, outperforming conventional codecs with a loading speed $\mathbf{11}\times$ faster.
arXiv Detail & Related papers (2024-09-28T18:21:52Z) - HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV)
With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
arXiv Detail & Related papers (2023-04-05T17:55:04Z) - Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - CNeRV: Content-adaptive Neural Representation for Visual Data [54.99373641890767]
We propose Neural Visual Representation with Content-adaptive Embedding (CNeRV), which combines the generalizability of autoencoders with the simplicity and compactness of implicit representation.
We match the performance of NeRV, a state-of-the-art implicit neural representation, on the reconstruction task for frames seen during training, while far surpassing it on frames skipped during training (unseen images).
With the same latent code length and similar model size, CNeRV outperforms autoencoders on reconstruction of both seen and unseen images.
arXiv Detail & Related papers (2022-11-18T18:35:43Z) - Panoramic Vision Transformer for Saliency Detection in 360° Videos [48.54829780502176]
We present a new framework named Panoramic Vision Transformer (PAVER)
We design the encoder using Vision Transformer with deformable convolution, which enables us to plug pretrained models from normal videos into our architecture without additional modules or finetuning.
We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision.
arXiv Detail & Related papers (2022-09-19T12:23:34Z) - NeRV: Neural Representations for Videos [36.00198388959609]
We propose a novel neural representation for videos (NeRV) which encodes videos in neural networks.
NeRV simply fits a neural network to video frames, and the decoding process is a single feedforward operation.
With such a representation, we can treat videos as neural networks, simplifying several video-related tasks.
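The NeRV idea above can be sketched in a few lines (frame sizes, layer widths, and the encoding are illustrative assumptions, not the paper's actual configuration): the "video" is the weights of a network mapping a normalised frame index t to an RGB frame, so decoding any frame is one feedforward pass with no per-frame latents stored.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 8, 8                                   # toy frame size

def encode_t(t, num_freqs=6):
    """Sin/cos positional encoding of the frame index t in [0, 1]."""
    ks = 2.0 ** np.arange(num_freqs) * np.pi
    return np.concatenate([np.sin(ks * t), np.cos(ks * t)])  # (12,)

# Randomly initialised toy decoder; training would fit these weights
# to the actual video frames by minimising reconstruction loss.
w1 = rng.normal(0, 0.5, (12, 32))
w2 = rng.normal(0, 0.5, (32, H * W * 3))

def decode_frame(t):
    """Feedforward decode: frame index t -> (H, W, 3) frame in [0, 1]."""
    h = np.maximum(0.0, encode_t(t) @ w1)     # ReLU hidden layer
    out = 1.0 / (1.0 + np.exp(-(h @ w2)))     # sigmoid to pixel range
    return out.reshape(H, W, 3)

frame = decode_frame(0.5)
print(frame.shape)  # (8, 8, 3)
```

Because each frame is recomputed from its index, tasks like interpolation or denoising reduce to operations on the network itself, which is what lets the follow-up works above treat videos as neural networks.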
arXiv Detail & Related papers (2021-10-26T17:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.