Neural Video Compression with Feature Modulation
- URL: http://arxiv.org/abs/2402.17414v2
- Date: Thu, 29 Feb 2024 05:49:21 GMT
- Title: Neural Video Compression with Feature Modulation
- Authors: Jiahao Li, Bin Li, Yan Lu
- Abstract summary: The conditional coding-based neural video codec (NVC) shows superiority over the commonly used residual coding-based codec.
In this paper, we propose a powerful conditional coding-based NVC that solves two critical problems via feature modulation.
- Score: 28.105412445443697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emerging conditional coding-based neural video codec (NVC) shows
superiority over commonly-used residual coding-based codec and the latest NVC
already claims to outperform the best traditional codec. However, there still
exist critical problems blocking the practicality of NVC. In this paper, we
propose a powerful conditional coding-based NVC that solves two critical
problems via feature modulation. The first is how to support a wide quality
range in a single model. Previous NVC with this capability only supports about
3.8 dB PSNR range on average. To tackle this limitation, we modulate the latent
feature of the current frame via the learnable quantization scaler. During the
training, we specially design the uniform quantization parameter sampling
mechanism to improve the harmonization of encoding and quantization. This
results in a better learning of the quantization scaler and helps our NVC
support about 11.4 dB PSNR range. The second is how to make NVC still work
under a long prediction chain. We expose that the previous SOTA NVC has an
obvious quality degradation problem when using a large intra-period setting. To
this end, we propose modulating the temporal feature with a periodically
refreshing mechanism to boost the quality. Besides solving the above two
problems, we also design a single model that can support both RGB and YUV
colorspaces. Notably, under single intra-frame setting, our codec can achieve
29.7% bitrate saving over previous SOTA NVC with 16% MACs reduction. Our
codec serves as a notable landmark in the journey of NVC evolution. The codes
are at https://github.com/microsoft/DCVC.
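The quantization-scaler idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the number of quality levels (`NUM_QP`), the fixed geometric table standing in for the learnable scalers, and the helper names `make_scaler_table`, `modulate_and_quantize`, and `sample_qp` are all assumptions made for illustration. It only shows how dividing a latent by a per-quality step size before rounding trades distortion for fewer symbols, and how a quality index might be drawn uniformly during training.

```python
import random

NUM_QP = 64  # hypothetical number of quality levels (QP indices)


def make_scaler_table(q_min=0.5, q_max=8.0, num_qp=NUM_QP):
    """Stand-in for the learnable scalers: one step size per QP index,
    geometrically spaced from q_min (fine) to q_max (coarse)."""
    ratio = (q_max / q_min) ** (1.0 / (num_qp - 1))
    return [q_min * ratio ** i for i in range(num_qp)]


def modulate_and_quantize(latent, q_step):
    """Scale the latent by 1/q_step, round to integers (the symbols that
    would be entropy-coded), then rescale to get the decoded latent."""
    symbols = [round(v / q_step) for v in latent]
    decoded = [s * q_step for s in symbols]
    return symbols, decoded


def sample_qp(rng, num_qp=NUM_QP):
    """Draw a QP index uniformly, so every quality level is visited
    equally often during training (an assumed reading of the abstract)."""
    return rng.randrange(num_qp)


rng = random.Random(0)
table = make_scaler_table()
# Toy latent feature; a real codec would produce this with an encoder network.
latent = [rng.gauss(0.0, 4.0) for _ in range(1000)]

fine_symbols, fine_decoded = modulate_and_quantize(latent, table[0])
coarse_symbols, coarse_decoded = modulate_and_quantize(latent, table[-1])
```

With the coarsest step the latent collapses onto far fewer distinct symbols than with the finest step, which is the mechanism that lets one model cover a range of rate and quality points; the reconstruction error of each value stays within half a step.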
Related papers
- Prediction and Reference Quality Adaptation for Learned Video Compression [54.58691829087094]
We propose a confidence-based prediction quality adaptation (PQA) module to provide explicit discrimination for the spatial and channel-wise prediction quality difference.
We also propose a reference quality adaptation (RQA) module and an associated repeat-long training strategy to provide dynamic spatially variant filters for diverse reference qualities.
arXiv Detail & Related papers (2024-06-20T09:03:26Z)
- NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce NeRV++, an enhanced implicit neural video representation.
NeRV++ is a more straightforward yet effective enhancement over the original NeRV decoder architecture.
We evaluate our method on UVG, MCL JVC, and Bunny datasets, achieving competitive results for video compression with INRs.
arXiv Detail & Related papers (2024-02-28T13:00:32Z)
- Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing.
This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z)
- TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale [59.01246141215051]
We analyze the factor that leads to degradation from the perspective of language supervision.
We propose a degradation-free pre-training strategy to retain the generalization ability of the text encoder.
We produce a series of models, dubbed TVTSv2, with up to one billion parameters.
arXiv Detail & Related papers (2023-05-23T15:44:56Z)
- HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV).
With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
arXiv Detail & Related papers (2023-04-05T17:55:04Z)
- Neural Video Compression with Diverse Contexts [25.96187914295921]
This paper proposes increasing the context diversity in both temporal and spatial dimensions.
Experiments show that our codec obtains 23.5% bitrate saving over previous SOTA NVC.
arXiv Detail & Related papers (2023-02-28T08:35:50Z)
- CNeRV: Content-adaptive Neural Representation for Visual Data [54.99373641890767]
We propose Neural Visual Representation with Content-adaptive Embedding (CNeRV), which combines the generalizability of autoencoders with the simplicity and compactness of implicit representation.
We match the performance of NeRV, a state-of-the-art implicit neural representation, on the reconstruction task for frames seen during training, while far surpassing it on frames skipped during training (unseen images).
With the same latent code length and similar model size, CNeRV outperforms autoencoders on reconstruction of both seen and unseen images.
arXiv Detail & Related papers (2022-11-18T18:35:43Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also improves encoding quality from 34.07 to 34.57 (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated for the low-delay causal settings and compared with H.265/HEVC, H.264/AVC and the other learnt video compression methods.
arXiv Detail & Related papers (2020-07-09T06:15:17Z)