DNeRV: Modeling Inherent Dynamics via Difference Neural Representation
for Videos
- URL: http://arxiv.org/abs/2304.06544v1
- Date: Thu, 13 Apr 2023 13:53:49 GMT
- Title: DNeRV: Modeling Inherent Dynamics via Difference Neural Representation
for Videos
- Authors: Qi Zhao, M. Salman Asif, Zhan Ma
- Abstract summary: Difference Neural Representation for Videos (DNeRV)
We analyze this limitation from the perspective of function fitting and reveal the importance of frame difference.
DNeRV achieves competitive results against the state-of-the-art neural compression approaches.
- Score: 53.077189668346705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing implicit neural representation (INR) methods do not fully exploit
spatiotemporal redundancies in videos. Index-based INRs ignore the
content-specific spatial features and hybrid INRs ignore the contextual
dependency on adjacent frames, leading to poor modeling capability for scenes
with large motion or dynamics. We analyze this limitation from the perspective
of function fitting and reveal the importance of frame difference. To use
explicit motion information, we propose Difference Neural Representation for
Videos (DNeRV), which consists of two streams for content and frame difference.
We also introduce a collaborative content unit for effective feature fusion. We
test DNeRV for video compression, inpainting, and interpolation. DNeRV achieves
competitive results against the state-of-the-art neural compression approaches
and outperforms existing implicit methods on downstream inpainting and
interpolation for $960 \times 1920$ videos.
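The abstract describes DNeRV's two input streams: frame content and explicit frame differences as a motion signal. A minimal sketch of building those two streams is given below; the array layout, the forward-difference definition, and the zero-padding of the last step are illustrative assumptions, not the paper's exact architecture (which also includes a collaborative content unit for fusion).

```python
import numpy as np

def build_dnerv_inputs(frames: np.ndarray):
    """Illustrative sketch of DNeRV's two input streams.

    frames: (T, H, W, C) video clip with values in [0, 1].
    Returns (content, diff), where diff carries explicit motion
    information as forward frame differences.
    """
    content = frames
    # Forward difference D_t = I_{t+1} - I_t; pad the last step with
    # zeros so both streams share the same temporal length (an
    # assumption, not necessarily the paper's padding choice).
    diff = np.zeros_like(frames)
    diff[:-1] = frames[1:] - frames[:-1]
    return content, diff

# Tiny usage example on a synthetic 4-frame clip whose brightness
# ramps linearly, so every forward difference is 1/3.
clip = np.linspace(0.0, 1.0, 4).reshape(4, 1, 1, 1) * np.ones((4, 2, 2, 3))
content, diff = build_dnerv_inputs(clip)
```

In the actual model, each stream would be encoded separately and fused before decoding; this snippet only shows why the difference stream is an explicit motion representation rather than something the network must infer from content alone.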
Related papers
- PNeRV: A Polynomial Neural Representation for Videos [28.302862266270093]
Extracting Implicit Neural Representations on video poses unique challenges due to the additional temporal dimension.
We introduce Polynomial Neural Representation for Videos (PNeRV).
PNeRV not only mitigates the challenges posed by video data in the realm of INRs but also opens new avenues for advanced video processing and analysis.
arXiv Detail & Related papers (2024-06-27T16:15:22Z)
- NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce NeRV++, an enhanced implicit neural video representation.
NeRV++ is a straightforward yet effective enhancement over the original NeRV decoder architecture.
We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs.
arXiv Detail & Related papers (2024-02-28T13:00:32Z) - HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV)
With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
arXiv Detail & Related papers (2023-04-05T17:55:04Z) - Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - Modality-Agnostic Variational Compression of Implicit Neural
Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z) - CNeRV: Content-adaptive Neural Representation for Visual Data [54.99373641890767]
We propose Neural Visual Representation with Content-adaptive Embedding (CNeRV), which combines the generalizability of autoencoders with the simplicity and compactness of implicit representation.
We match the performance of NeRV, a state-of-the-art implicit neural representation, on the reconstruction task for frames seen during training, while far surpassing it on frames skipped during training (unseen images).
With the same latent code length and similar model size, CNeRV outperforms autoencoders on reconstruction of both seen and unseen images.
arXiv Detail & Related papers (2022-11-18T18:35:43Z) - Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains $2\times$ faster (less than 5 minutes) but also exceeds their encoding quality, $34.07 \rightarrow 34.57$ (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context [14.549945320069892]
We propose E-NeRV, which dramatically expedites NeRV by decomposing the image-wise implicit neural representation into separate spatial and temporal context.
We experimentally find that our method can improve performance to a large extent with fewer parameters, resulting in more than $8\times$ faster convergence.
arXiv Detail & Related papers (2022-07-17T10:16:47Z)
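Several entries above report encoding quality in PSNR (e.g. NVP's $34.07 \rightarrow 34.57$ on UVG). For reference, a minimal PSNR computation for images in $[0, 1]$ is sketched below; the function name and the uniform-error usage example are illustrative, not taken from any of the papers.

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

# Example: a uniform error of 0.1 over the frame gives MSE = 0.01,
# i.e. 10 * log10(1 / 0.01) = 20 dB.
ref = np.ones((8, 8))
rec = ref - 0.1
```

Higher PSNR means lower mean squared error against the reference frame, so the NVP improvement above corresponds to a reduction in reconstruction error.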
This list is automatically generated from the titles and abstracts of the papers in this site.