E-NeRV: Expedite Neural Video Representation with Disentangled
Spatial-Temporal Context
- URL: http://arxiv.org/abs/2207.08132v1
- Date: Sun, 17 Jul 2022 10:16:47 GMT
- Title: E-NeRV: Expedite Neural Video Representation with Disentangled
Spatial-Temporal Context
- Authors: Zizhang Li, Mengmeng Wang, Huaijin Pi, Kechun Xu, Jianbiao Mei, Yong
Liu
- Abstract summary: We propose E-NeRV, which dramatically expedites NeRV by decomposing the image-wise implicit neural representation into separate spatial and temporal context.
We experimentally find that our method can improve the performance to a large extent with fewer parameters, resulting in a more than $8times$ faster speed on convergence.
- Score: 14.549945320069892
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, the image-wise implicit neural representation of videos, NeRV, has
gained popularity for its promising results and swift speed compared to regular
pixel-wise implicit representations. However, the redundant parameters within
the network structure can cause a large model size when scaling up for
desirable performance. The key reason of this phenomenon is the coupled
formulation of NeRV, which outputs the spatial and temporal information of
video frames directly from the frame index input. In this paper, we propose
E-NeRV, which dramatically expedites NeRV by decomposing the image-wise
implicit neural representation into separate spatial and temporal context.
Under the guidance of this new formulation, our model greatly reduces the
redundant model parameters, while retaining the representation ability. We
experimentally find that our method can improve the performance to a large
extent with fewer parameters, resulting in a more than $8\times$ faster speed
on convergence. Code is available at https://github.com/kyleleey/E-NeRV.
Related papers
- PNeRV: A Polynomial Neural Representation for Videos [28.302862266270093]
Extracting Implicit Neural Representations on video poses unique challenges due to the additional temporal dimension.
We introduce Polynomial Neural Representation for Videos (PNeRV)
PNeRV mitigates challenges posed by video data in the realm of INRs but opens new avenues for advanced video processing and analysis.
arXiv Detail & Related papers (2024-06-27T16:15:22Z) - D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper contributes to the field by introducing a new synthesis method for dynamic novel view from monocular video, such as smartphone captures.
Our approach represents the as a $textitdynamic neural point cloud$, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
arXiv Detail & Related papers (2024-06-14T14:35:44Z) - VQ-NeRV: A Vector Quantized Neural Representation for Videos [3.6662666629446043]
Implicit neural representations (INR) excel in encoding videos within neural networks, showcasing promise in computer vision tasks like video compression and denoising.
We introduce an advanced U-shaped architecture, Vector Quantized-NeRV (VQ-NeRV), which integrates a novel component--the VQ-NeRV Block.
This block incorporates a codebook mechanism to discretize the network's shallow residual features and inter-frame residual information effectively.
arXiv Detail & Related papers (2024-03-19T03:19:07Z) - NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce neural representations for videos NeRV++, an enhanced implicit neural video representation.
NeRV++ is more straightforward yet effective enhancement over the original NeRV decoder architecture.
We evaluate our method on UVG, MCL JVC, and Bunny datasets, achieving competitive results for video compression with INRs.
arXiv Detail & Related papers (2024-02-28T13:00:32Z) - DNeRV: Modeling Inherent Dynamics via Difference Neural Representation
for Videos [53.077189668346705]
Difference Representation for Videos (eRV)
We analyze this from the perspective of limitation function fitting and the importance of frame difference.
DNeRV achieves competitive results against the state-of-the-art neural compression approaches.
arXiv Detail & Related papers (2023-04-13T13:53:49Z) - HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV)
With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
arXiv Detail & Related papers (2023-04-05T17:55:04Z) - Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - CNeRV: Content-adaptive Neural Representation for Visual Data [54.99373641890767]
We propose Neural Visual Representation with Content-adaptive Embedding (CNeRV), which combines the generalizability of autoencoders with the simplicity and compactness of implicit representation.
We match the performance of NeRV, a state-of-the-art implicit neural representation, on the reconstruction task for frames seen during training while far surpassing for frames that are skipped during training (unseen images)
With the same latent code length and similar model size, CNeRV outperforms autoencoders on reconstruction of both seen and unseen images.
arXiv Detail & Related papers (2022-11-18T18:35:43Z) - Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z) - Neural Residual Flow Fields for Efficient Video Representations [5.904082461511478]
Implicit neural representation (INR) has emerged as a powerful paradigm for representing signals, such as images, videos, 3D shapes, etc.
We propose a novel INR approach to representing and compressing videos by explicitly removing data redundancy.
We show that the proposed method outperforms the baseline methods by a significant margin.
arXiv Detail & Related papers (2022-01-12T06:22:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.