SNeRV: Spectra-preserving Neural Representation for Video
- URL: http://arxiv.org/abs/2501.01681v1
- Date: Fri, 03 Jan 2025 07:57:38 GMT
- Title: SNeRV: Spectra-preserving Neural Representation for Video
- Authors: Jina Kim, Jihoo Lee, Je-Won Kang
- Abstract summary: We propose spectra-preserving NeRV (SNeRV) as a novel approach to enhance implicit video representations.
In this paper, we use the 2D discrete wavelet transform (DWT) to decompose video into low-frequency (LF) and high-frequency (HF) features.
We demonstrate that SNeRV outperforms existing NeRV models in capturing fine details and achieves enhanced reconstruction.
- Score: 8.978061470104532
- Abstract: Neural representation for video (NeRV), which employs a neural network to parameterize video signals, introduces a novel methodology in video representations. However, existing NeRV-based methods have difficulty capturing fine spatial details and motion patterns due to spectral bias, in which a neural network learns high-frequency (HF) components at a slower rate than low-frequency (LF) components. In this paper, we propose spectra-preserving NeRV (SNeRV) as a novel approach to enhance implicit video representations by efficiently handling various frequency components. SNeRV uses the 2D discrete wavelet transform (DWT) to decompose video into LF and HF features, preserving spatial structures and directly addressing the spectral bias issue. To balance compactness, we encode only the LF components, while the HF components that include fine textures are generated by a decoder. Specialized modules, including a multi-resolution fusion unit (MFU) and a high-frequency restorer (HFR), are integrated into a backbone to facilitate the representation. Furthermore, we extend SNeRV to effectively capture temporal correlations between adjacent video frames by casting the extension as an additional frequency decomposition along the temporal domain. This approach allows us to embed spatio-temporal LF features into the network using temporally extended up-sampling blocks (TUBs). Experimental results demonstrate that SNeRV outperforms existing NeRV models in capturing fine details and achieves enhanced reconstruction, making it a promising approach in the field of implicit video representations. The code is available at https://github.com/qwertja/SNeRV.
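To make the decomposition concrete, here is a minimal sketch of the wavelet split the abstract describes, written against PyWavelets (pywt). The toy video shape, variable names, and the choice of a Haar filter are illustrative assumptions; none of this reproduces SNeRV's actual modules (MFU, HFR, TUBs), for which the linked repository is the reference.

```python
# Minimal sketch of the frequency decomposition described above, using
# PyWavelets (pywt). Only the DWT step is shown; SNeRV's network modules
# (MFU, HFR, TUBs) are not reproduced. Shapes and the Haar filter choice
# are illustrative assumptions.
import numpy as np
import pywt

# Toy grayscale video: 8 frames of 64x64 as a stand-in for real frames.
video = np.random.rand(8, 64, 64).astype(np.float32)

# Spatial decomposition: single-level 2D DWT of one frame. pywt returns
# the approximation (low-frequency) subband cA and three detail
# (high-frequency) subbands: horizontal, vertical, diagonal.
frame = video[0]
cA, (cH, cV, cD) = pywt.dwt2(frame, "haar")
print(cA.shape)  # (32, 32): each subband is half-resolution

# In the scheme the abstract describes, only the LF part is embedded in
# the network and the HF detail is regenerated by the decoder. For
# reference, the inverse DWT reconstructs the frame (near-)exactly:
recon = pywt.idwt2((cA, (cH, cV, cD)), "haar")
assert np.allclose(recon, frame, atol=1e-5)

# Temporal extension: cast the same idea as a 1D DWT along the time
# axis, yielding temporally low-pass / high-pass halves of the clip,
# analogous to embedding spatio-temporal LF features.
t_lo, t_hi = pywt.dwt(video, "haar", axis=0)
print(t_lo.shape)  # (4, 64, 64): half the frames, temporally smoothed
```

Under this split only the LF subbands (cA, t_lo) would need to be stored explicitly, which loosely mirrors the compactness-versus-detail trade-off the abstract refers to.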
Related papers
- BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution [14.082598088990352]
We propose a C-STVSR framework with two key modules tailored to better represent spatial and temporal characteristics of video.
Our approach achieves state-of-the-art PSNR and SSIM performance, showing enhanced spatial details and natural temporal consistency.
arXiv Detail & Related papers (2025-01-19T13:29:41Z)
- VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression [59.14355576912495]
NeRF-based video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences.
The substantial data volumes pose significant challenges for storage and transmission.
We propose VRVVC, a novel end-to-end joint variable-rate framework for video compression.
arXiv Detail & Related papers (2024-12-16T01:28:04Z)
- Neural Video Representation for Redundancy Reduction and Consistency Preservation [0.0]
Implicit neural representations (INRs) embed various signals into neural networks.
We propose a video representation method that generates both the high-frequency and low-frequency components of the frame.
Experimental results demonstrate that our method outperforms the existing HNeRV method, achieving superior results in 96 percent of the videos.
arXiv Detail & Related papers (2024-09-27T07:30:12Z)
- VQ-NeRV: A Vector Quantized Neural Representation for Videos [3.6662666629446043]
Implicit neural representations (INRs) excel at encoding videos within neural networks, showing promise in computer vision tasks such as video compression and denoising.
We introduce an advanced U-shaped architecture, Vector Quantized-NeRV (VQ-NeRV), which integrates a novel component, the VQ-NeRV Block.
This block incorporates a codebook mechanism to discretize the network's shallow residual features and inter-frame residual information effectively.
arXiv Detail & Related papers (2024-03-19T03:19:07Z)
- Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing.
This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z)
- Progressive Fourier Neural Representation for Sequential Video Compilation [75.43041679717376]
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex videos over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
arXiv Detail & Related papers (2023-06-20T06:02:19Z)
- Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos [69.22032459870242]
We present a novel technique, Residual Radiance Field or ReRF, as a highly compact neural representation to achieve real-time free-view rendering on long-duration dynamic scenes.
We show such a strategy can handle large motions without sacrificing quality.
Based on ReRF, we design a special FVV codec that achieves a three-orders-of-magnitude compression rate and provides a companion ReRF player to support online streaming of long-duration FVVs of dynamic scenes.
arXiv Detail & Related papers (2023-04-10T08:36:00Z)
- Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on the frequency domain find that GAN-forged images have obvious grid-like visual artifacts in the frequency spectrum compared to real images.
This paper proposes a Discrete Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z)
- PINs: Progressive Implicit Networks for Multi-Scale Neural Representations [68.73195473089324]
We propose a progressive positional encoding, exposing a hierarchical structure to incremental sets of frequency encodings.
Our model accurately reconstructs scenes with wide frequency bands and learns a scene representation at progressive levels of detail.
Experiments on several 2D and 3D datasets show improvements in reconstruction accuracy, representational capacity and training speed compared to baselines.
arXiv Detail & Related papers (2022-02-09T20:33:37Z)
- Wavelet-Based Network For High Dynamic Range Imaging [64.66969585951207]
Existing methods, such as optical flow-based and end-to-end deep learning-based solutions, are error-prone in either detail restoration or ghosting-artifact removal.
In this work, we propose a novel frequency-guided end-to-end deep neural network (FNet) to conduct HDR fusion in the frequency domain, where the discrete wavelet transform (DWT) is used to decompose inputs into different frequency bands.
The low-frequency signals are used to avoid specific ghosting artifacts, while the high-frequency signals are used for preserving details.
arXiv Detail & Related papers (2021-08-03T12:26:33Z)