Towards Scalable Neural Representation for Diverse Videos
- URL: http://arxiv.org/abs/2303.14124v1
- Date: Fri, 24 Mar 2023 16:32:19 GMT
- Title: Towards Scalable Neural Representation for Diverse Videos
- Authors: Bo He, Xitong Yang, Hanyu Wang, Zuxuan Wu, Hao Chen, Shuaiyi Huang,
Yixuan Ren, Ser-Nam Lim, Abhinav Shrivastava
- Abstract summary: Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
- Score: 68.73612099741956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implicit neural representations (INR) have gained increasing attention in
representing 3D scenes and images, and have been recently applied to encode
videos (e.g., NeRV, E-NeRV). While achieving promising results, existing
INR-based methods are limited to encoding a handful of short videos (e.g.,
seven 5-second videos in the UVG dataset) with redundant visual content,
leading to a model design that fits individual video frames independently and
is not efficiently scalable to a large number of diverse videos. This paper
focuses on developing neural representations for a more practical setup --
encoding long and/or a large number of videos with diverse visual content. We
first show that instead of dividing videos into small subsets and encoding them
with separate models, encoding long and diverse videos jointly with a unified
model achieves better compression results. Based on this observation, we
propose D-NeRV, a novel neural representation framework designed to encode
diverse videos by (i) decoupling clip-specific visual content from motion
information, (ii) introducing temporal reasoning into the implicit neural
network, and (iii) employing the task-oriented flow as intermediate output to
reduce spatial redundancies. Our new model largely surpasses NeRV and
traditional video compression techniques on UCF101 and UVG datasets on the
video compression task. Moreover, when used as an efficient data-loader, D-NeRV
achieves 3%-10% higher accuracy than NeRV on action recognition tasks on the
UCF101 dataset under the same compression ratios.
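To make the three design points in the abstract more concrete, the PyTorch sketch below shows one way they could be wired together. It is only an illustrative reading of (i)-(iii): the module names, layer sizes, key-frame content encoder, GRU-based temporal reasoning, and flow-warping scheme are assumptions made for this sketch, not the authors' D-NeRV architecture.

```python
# Hedged sketch of the D-NeRV idea from the abstract. Everything below is an
# assumption chosen for illustration, not the paper's implementation.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def positional_encoding(t: torch.Tensor, num_freqs: int = 8) -> torch.Tensor:
    """Sinusoidal embedding of normalized frame indices t in [0, 1]."""
    freqs = (2.0 ** torch.arange(num_freqs, device=t.device)) * math.pi
    angles = t[:, None] * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)  # (N, 2*num_freqs)


class DNeRVSketch(nn.Module):
    """Hypothetical reading of points (i)-(iii): decouple clip-specific content
    (a key-frame encoding) from motion (frame-index conditioned features),
    predict a flow field, and warp the content toward each target frame."""

    def __init__(self, feat_dim: int = 64, pe_dim: int = 16):
        super().__init__()
        # (i) clip-specific visual content: encode one key frame per clip.
        self.content_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )
        # (ii) temporal reasoning: a GRU over per-frame time embeddings
        # (a stand-in; the paper's mechanism may differ).
        self.temporal = nn.GRU(pe_dim, feat_dim, batch_first=True)
        # (iii) flow as intermediate output: a 2-channel field used to warp
        # the content features instead of regenerating every pixel.
        self.flow_head = nn.Conv2d(feat_dim * 2, 2, 3, padding=1)
        self.render = nn.Conv2d(feat_dim, 3, 3, padding=1)

    def forward(self, key_frame: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # key_frame: (B, 3, H, W); t: (B, T) normalized frame indices.
        B, _, H, W = key_frame.shape
        T = t.shape[1]
        content = self.content_encoder(key_frame)                  # (B, C, H, W)
        pe = positional_encoding(t.reshape(-1)).reshape(B, T, -1)  # (B, T, 16)
        motion, _ = self.temporal(pe)                              # (B, T, C)

        frames = []
        for i in range(T):
            m = motion[:, i, :, None, None].expand(-1, -1, H, W)
            flow = self.flow_head(torch.cat([content, m], dim=1))  # (B, 2, H, W)
            # Build a sampling grid offset by the predicted flow and warp.
            ys, xs = torch.meshgrid(
                torch.linspace(-1, 1, H, device=flow.device),
                torch.linspace(-1, 1, W, device=flow.device),
                indexing="ij",
            )
            base = torch.stack([xs, ys], dim=-1)[None].expand(B, -1, -1, -1)
            warped = F.grid_sample(content, base + flow.permute(0, 2, 3, 1),
                                   align_corners=True)
            frames.append(torch.sigmoid(self.render(warped)))       # (B, 3, H, W)
        return torch.stack(frames, dim=1)                            # (B, T, 3, H, W)
```

In this reading, "encoding" a set of diverse clips means jointly overfitting one such model to all of them, which is the unified-model setting the abstract argues compresses better than fitting separate models to small subsets.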
Related papers
- NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce NeRV++, an enhanced implicit neural video representation.
NeRV++ is a more straightforward yet effective enhancement of the original NeRV decoder architecture.
We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs.
arXiv Detail & Related papers (2024-02-28T13:00:32Z)
- Progressive Fourier Neural Representation for Sequential Video Compilation [75.43041679717376]
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
arXiv Detail & Related papers (2023-06-20T06:02:19Z)
- DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos [53.077189668346705]
We propose the Difference Neural Representation for Videos (DNeRV).
We analyze this from the perspective of the limitations of function fitting and the importance of frame differences.
DNeRV achieves competitive results against the state-of-the-art neural compression approaches.
arXiv Detail & Related papers (2023-04-13T13:53:49Z)
- HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV).
With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
arXiv Detail & Related papers (2023-04-05T17:55:04Z)
- NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling [37.51397331485574]
Implicit Neural Representations (INR) have recently been shown to be a powerful tool for high-quality video compression.
These methods have fixed architectures which do not scale to longer videos or higher resolutions.
We propose NIRVANA, which treats videos as groups of frames and fits a separate network to each group, performing patch-wise prediction.
arXiv Detail & Related papers (2022-12-30T08:17:02Z)
- CNeRV: Content-adaptive Neural Representation for Visual Data [54.99373641890767]
We propose Neural Visual Representation with Content-adaptive Embedding (CNeRV), which combines the generalizability of autoencoders with the simplicity and compactness of implicit representation.
We match the performance of NeRV, a state-of-the-art implicit neural representation, on reconstruction of frames seen during training, while far surpassing it on frames skipped during training (unseen images).
With the same latent code length and similar model size, CNeRV outperforms autoencoders on reconstruction of both seen and unseen images.
arXiv Detail & Related papers (2022-11-18T18:35:43Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from 34.07 to 34.57.
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- NeRV: Neural Representations for Videos [36.00198388959609]
We propose a novel neural representation for videos (NeRV) which encodes videos in neural networks.
NeRV simply fits a neural network to video frames, and decoding is a simple feedforward operation.
With such a representation, we can treat videos as neural networks, simplifying several video-related tasks; a minimal sketch of this idea is given after this entry.
arXiv Detail & Related papers (2021-10-26T17:56:23Z)
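As a companion to the D-NeRV sketch above, here is a minimal, hypothetical NeRV-style decoder in PyTorch: a network that maps a frame-index embedding (for example, the `positional_encoding` helper from the earlier sketch) to an RGB frame, so decoding a frame is a single feedforward pass. The layer sizes and the PixelShuffle upsampling are assumptions chosen for brevity, not the exact NeRV architecture.

```python
# Hedged sketch of the NeRV idea: frame index in, RGB frame out.
import torch
import torch.nn as nn


class NeRVSketch(nn.Module):
    """Maps a frame-index embedding to an RGB frame; decoding a frame is one
    forward pass. Sizes and upsampling choices are illustrative only."""

    def __init__(self, pe_dim: int = 16, base: int = 64, out_hw: int = 32):
        super().__init__()
        self.base, self.start_hw = base, out_hw // 4
        # Project the time embedding to a small spatial feature map ...
        self.stem = nn.Linear(pe_dim, base * self.start_hw ** 2)
        # ... then upsample 4x with two PixelShuffle blocks to the frame size.
        self.up = nn.Sequential(
            nn.Conv2d(base, base * 4, 3, padding=1), nn.PixelShuffle(2), nn.GELU(),
            nn.Conv2d(base, base * 4, 3, padding=1), nn.PixelShuffle(2), nn.GELU(),
            nn.Conv2d(base, 3, 3, padding=1),
        )

    def forward(self, pe: torch.Tensor) -> torch.Tensor:
        # pe: (B, pe_dim) sinusoidal embedding of normalized frame indices.
        x = self.stem(pe).reshape(-1, self.base, self.start_hw, self.start_hw)
        return torch.sigmoid(self.up(x))  # (B, 3, out_hw, out_hw)


# "Encoding" a video amounts to overfitting this network to its frames;
# "decoding" frame t is one forward pass on t's embedding, and the video is
# stored as (compressed) network weights.
```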