PNeRV: A Polynomial Neural Representation for Videos
- URL: http://arxiv.org/abs/2406.19299v1
- Date: Thu, 27 Jun 2024 16:15:22 GMT
- Title: PNeRV: A Polynomial Neural Representation for Videos
- Authors: Sonam Gupta, Snehal Singh Tomar, Grigorios G Chrysos, Sukhendu Das, A. N. Rajagopalan
- Abstract summary: Extracting Implicit Neural Representations (INRs) on video data poses unique challenges due to the additional temporal dimension.
We introduce the Polynomial Neural Representation for Videos (PNeRV).
PNeRV not only mitigates the challenges posed by video data in the realm of INRs but also opens new avenues for advanced video processing and analysis.
- Score: 28.302862266270093
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extracting Implicit Neural Representations (INRs) on video data poses unique challenges due to the additional temporal dimension. In the context of videos, INRs have predominantly relied on a frame-only parameterization, which sacrifices the spatiotemporal continuity observed in pixel-level (spatial) representations. To mitigate this, we introduce Polynomial Neural Representation for Videos (PNeRV), a parameter-wise efficient, patch-wise INR for videos that preserves spatiotemporal continuity. PNeRV leverages the modeling capabilities of Polynomial Neural Networks to perform the modulation of a continuous spatial (patch) signal with a continuous time (frame) signal. We further propose a custom Hierarchical Patch-wise Spatial Sampling Scheme that ensures spatial continuity while retaining parameter efficiency. We also employ a carefully designed Positional Embedding methodology to further enhance PNeRV's performance. Our extensive experimentation demonstrates that PNeRV outperforms the baselines in conventional Implicit Neural Representation tasks like compression along with downstream applications that require spatiotemporal continuity in the underlying representation. PNeRV not only addresses the challenges posed by video data in the realm of INRs but also opens new avenues for advanced video processing and analysis.
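The abstract's central mechanism, modulating a continuous spatial (patch) signal with a continuous time (frame) signal through a polynomial network, can be made concrete with a short sketch. The following is a minimal PyTorch illustration under assumptions of our own: the `fourier_embed` helper, the layer sizes, and the polynomial degree are hypothetical choices, and the Hadamard-product coupling follows the general Polynomial Neural Network recipe rather than PNeRV's exact architecture.

```python
# Minimal sketch of the core PNeRV idea described in the abstract:
# a polynomial network that modulates a continuous spatial (patch)
# signal with a continuous time (frame) signal. All names, sizes,
# the embedding, and the polynomial degree are illustrative
# assumptions, not the authors' configuration.
import torch
import torch.nn as nn


def fourier_embed(x: torch.Tensor, num_freqs: int = 8) -> torch.Tensor:
    """Sinusoidal positional embedding (a standard, assumed choice)."""
    freqs = (2.0 ** torch.arange(num_freqs, device=x.device)) * torch.pi
    ang = x[..., None] * freqs                         # (..., D, F)
    return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(-2)


class PolynomialModulator(nn.Module):
    """Polynomial coupling of a spatial code with a temporal code, in the
    spirit of Polynomial Neural Networks (Hadamard-product terms)."""

    def __init__(self, spatial_dim, temporal_dim, hidden, out_dim, degree=3):
        super().__init__()
        self.spatial_in = nn.Linear(spatial_dim, hidden)
        # One temporal projection per extra polynomial degree.
        self.temporal_in = nn.ModuleList(
            nn.Linear(temporal_dim, hidden) for _ in range(degree)
        )
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, z_s, z_t):
        h = self.spatial_in(z_s)
        acc = h
        for proj in self.temporal_in:
            # The Hadamard product injects the time signal multiplicatively,
            # raising the degree of the joint (space, time) polynomial.
            h = h * proj(z_t)
            acc = acc + h
        return self.out(acc)


# Usage: predict RGB for a patch of pixel coordinates at frame time t.
model = PolynomialModulator(spatial_dim=32, temporal_dim=16, hidden=128, out_dim=3)
xy = torch.rand(64, 2)                  # normalized pixel coordinates in a patch
t = torch.full((64, 1), 0.25)           # normalized frame index
rgb = model(fourier_embed(xy), fourier_embed(t))       # -> (64, 3)
```

Each Hadamard product with a temporal projection raises the degree of the joint polynomial in space and time, which is one way such a coupling can keep the representation spatiotemporally continuous rather than frame-only.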
Related papers
- Invertible Neural Warp for NeRF [29.00183106905031]
This paper tackles the simultaneous optimization of pose and Neural Radiance Fields (NeRF).
We propose a novel over-parameterized representation that models camera poses as learnable rigid warp functions.
We present results on synthetic and real-world datasets, and demonstrate that our approach outperforms existing baselines in terms of pose estimation and high-fidelity reconstruction.
arXiv Detail & Related papers (2024-07-17T07:14:08Z)
- Towards a Sampling Theory for Implicit Neural Representations [0.3222802562733786]
Implicit neural representations (INRs) have emerged as a powerful tool for solving inverse problems in computer vision and computational imaging.
We show how to recover images from a hidden-layer INR using a generalized form of weight decay regularization.
We empirically assess the probability of achieving exact recovery for images realized by low-width single-layer INRs, and illustrate the performance of INRs on super-resolution recovery of more realistic continuous-domain phantom images.
arXiv Detail & Related papers (2024-05-28T17:53:47Z)
- NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce NeRV++, an enhanced implicit neural video representation.
NeRV++ is a more straightforward yet effective enhancement of the original NeRV decoder architecture.
We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs.
arXiv Detail & Related papers (2024-02-28T13:00:32Z)
- NeVRF: Neural Video-based Radiance Fields for Long-duration Sequences [53.8501224122952]
We propose a novel neural video-based radiance field (NeVRF) representation.
NeVRF marries neural radiance fields with image-based rendering to support photo-realistic novel view synthesis on long-duration dynamic inward-looking scenes.
Our experiments demonstrate the effectiveness of NeVRF in enabling long-duration sequence rendering, sequential data reconstruction, and compact data storage.
arXiv Detail & Related papers (2023-12-10T11:14:30Z)
- ResFields: Residual Neural Fields for Spatiotemporal Signals [61.44420761752655]
ResFields is a novel class of networks specifically designed to effectively represent complex temporal signals.
We conduct a comprehensive analysis of the properties of ResFields and propose a matrix factorization technique to reduce the number of trainable parameters.
We demonstrate the practical utility of ResFields by showcasing its effectiveness in capturing dynamic 3D scenes from sparse RGBD cameras.
arXiv Detail & Related papers (2023-09-06T16:59:36Z)
- DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos [53.077189668346705]
We propose the Difference Neural Representation for Videos (DNeRV).
We analyze this from the perspective of the limitations of function fitting and the importance of frame difference.
DNeRV achieves competitive results against the state-of-the-art neural compression approaches.
arXiv Detail & Related papers (2023-04-13T13:53:49Z)
- Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos [69.22032459870242]
We present a novel technique, Residual Radiance Field or ReRF, as a highly compact neural representation to achieve real-time free-view rendering on long-duration dynamic scenes.
We show such a strategy can handle large motions without sacrificing quality.
Based on ReRF, we design a special FVV codec that achieves a three-orders-of-magnitude compression rate and provides a companion ReRF player to support online streaming of long-duration FVVs of dynamic scenes.
arXiv Detail & Related papers (2023-04-10T08:36:00Z)
- Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains two times faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from 34.07 to 34.57.
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context [14.549945320069892]
We propose E-NeRV, which dramatically expedites NeRV by decomposing the image-wise implicit neural representation into separate spatial and temporal contexts (a minimal sketch of this decomposition follows below).
We experimentally find that our method improves performance to a large extent with fewer parameters, resulting in more than 8× faster convergence.
arXiv Detail & Related papers (2022-07-17T10:16:47Z)
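To make the disentangled spatial-temporal idea behind E-NeRV concrete, here is a minimal, hedged PyTorch sketch: a shared spatial feature grid is modulated by a per-frame temporal code before a small convolutional decoder. The affine (scale/shift) fusion rule and all sizes are assumptions for illustration, not E-NeRV's exact architecture.

```python
# Hedged sketch of a disentangled spatial-temporal video representation:
# one spatial feature grid shared by all frames, modulated per frame by
# a temporal code. Fusion rule and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class SpatialTemporalINR(nn.Module):
    def __init__(self, t_dim=64, ch=32, h=9, w=16):
        super().__init__()
        # Spatial context: one low-resolution feature grid shared by all frames.
        self.spatial = nn.Parameter(torch.randn(1, ch, h, w))
        # Temporal context: frame embedding -> per-channel scale and shift.
        self.temporal = nn.Sequential(
            nn.Linear(t_dim, 128), nn.GELU(), nn.Linear(128, 2 * ch)
        )
        # Upsampling decoder from fused features to RGB.
        self.decoder = nn.Sequential(
            nn.Conv2d(ch, 4 * ch, 3, padding=1), nn.PixelShuffle(2), nn.GELU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, t_embed):                        # t_embed: (B, t_dim)
        scale, shift = self.temporal(t_embed).chunk(2, dim=-1)
        feat = self.spatial * (1 + scale[..., None, None]) + shift[..., None, None]
        return self.decoder(feat)                      # -> (B, 3, 2*h, 2*w)


frames = SpatialTemporalINR()(torch.randn(2, 64))      # two frames: (2, 3, 18, 32)
```

Because the spatial grid is shared across frames, the per-frame cost reduces to a small temporal MLP, which loosely mirrors how such a decomposition can save parameters and speed up convergence.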