Neural Video Compression with Feature Modulation
- URL: http://arxiv.org/abs/2402.17414v2
- Date: Thu, 29 Feb 2024 05:49:21 GMT
- Title: Neural Video Compression with Feature Modulation
- Authors: Jiahao Li, Bin Li, Yan Lu
- Abstract summary: The conditional coding-based neural video codec (NVC) shows superiority over the commonly-used residual coding-based codec.
In this paper, we propose a powerful conditional coding-based NVC that solves two critical problems via feature modulation.
- Score: 28.105412445443697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emerging conditional coding-based neural video codec (NVC) shows
superiority over commonly-used residual coding-based codec and the latest NVC
already claims to outperform the best traditional codec. However, there still
exist critical problems blocking the practicality of NVC. In this paper, we
propose a powerful conditional coding-based NVC that solves two critical
problems via feature modulation. The first is how to support a wide quality
range in a single model. Previous NVC with this capability only supports about
3.8 dB PSNR range on average. To tackle this limitation, we modulate the latent
feature of the current frame via the learnable quantization scaler. During the
training, we specially design the uniform quantization parameter sampling
mechanism to improve the harmonization of encoding and quantization. This
results in a better learning of the quantization scaler and helps our NVC
support about 11.4 dB PSNR range. The second is how to make NVC still work
under a long prediction chain. We expose that the previous SOTA NVC has an
obvious quality degradation problem when using a large intra-period setting. To
this end, we propose modulating the temporal feature with a periodically
refreshing mechanism to boost the quality. Besides solving the above two
problems, we also design a single model that can support both RGB and YUV
colorspaces. Notably, under the single intra-frame setting, our codec can achieve
29.7% bitrate saving over the previous SOTA NVC with a 16% MACs reduction. Our
codec serves as a notable landmark in the journey of NVC evolution. The codes
are at https://github.com/microsoft/DCVC.
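As a rough illustration of the two feature-modulation ideas described above, the sketch below shows (a) a learnable quantization scaler that modulates the current frame's latent feature and is trained with uniformly sampled quantization parameters so one model covers a wide quality range, and (b) a periodic refresh of the propagated temporal feature so quality does not collapse along a long prediction chain. This is a minimal PyTorch sketch, not the official DCVC implementation; the module names, tensor shapes, and the zeroing-based refresh are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LearnableQuantScaler(nn.Module):
    """Maps a quality index q in [0, 1] to a per-channel scaling vector
    that modulates the latent feature before quantization (sketch only)."""

    def __init__(self, channels: int, q_anchors: int = 64):
        super().__init__()
        # One learnable log-scale vector per discrete quality anchor
        # (assumption: nearest-anchor lookup; interpolation is also possible).
        self.log_scales = nn.Parameter(torch.zeros(q_anchors, channels))

    def forward(self, latent: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # latent: (B, C, H, W); q: (B,) with values in [0, 1]
        idx = (q * (self.log_scales.shape[0] - 1)).round().long()
        scale = torch.exp(self.log_scales[idx])      # (B, C)
        return latent * scale[:, :, None, None]      # broadcast over H, W


def sample_quantization_parameter(batch_size: int) -> torch.Tensor:
    # Uniform sampling of the quantization parameter during training,
    # so every quality point is visited equally often.
    return torch.rand(batch_size)


def maybe_refresh(temporal_feature: torch.Tensor, frame_index: int,
                  refresh_period: int = 32) -> torch.Tensor:
    # Periodically refresh the propagated temporal feature so errors do not
    # accumulate along a long prediction chain; zeroing here is a placeholder
    # for whatever refresh the actual codec applies.
    if frame_index % refresh_period == 0:
        return torch.zeros_like(temporal_feature)
    return temporal_feature
```

In a training loop one would draw q with sample_quantization_parameter, scale the encoder output with LearnableQuantScaler before quantization and entropy coding, and apply the inverse scale on the decoder side; at inference, q selects the operating point on the rate-distortion curve.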
Related papers
- Towards Practical Real-Time Neural Video Compression [60.390180067626396]
We introduce a practical real-time neural video codec (NVC) designed to deliver a high compression ratio, low latency, and broad versatility.
Experiments show our proposed DCVC-RT achieves an impressive average encoding/decoding speed of 125.2/112.8 frames per second for 1080p video, while saving an average of 21% in bitrate compared to H.266/VTM.
arXiv Detail & Related papers (2025-02-28T06:32:23Z) - Improving the Diffusability of Autoencoders [54.920783089085035]
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos.
We perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces.
We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders the generation quality.
arXiv Detail & Related papers (2025-02-20T18:45:44Z) - PNVC: Towards Practical INR-based Video Compression [14.088444622391501]
We propose a novel INR-based coding framework, PNVC, which innovatively combines autoencoder-based and overfitted solutions.
PNVC achieves nearly 35% BD-rate savings against HEVC HM 18.0 (LD), and almost 10% more than one of the state-of-the-art INR-based codecs.
arXiv Detail & Related papers (2024-09-02T05:31:11Z) - Prediction and Reference Quality Adaptation for Learned Video Compression [54.58691829087094]
We propose a confidence-based prediction quality adaptation (PQA) module to provide explicit discrimination for the spatial and channel-wise prediction quality difference.
We also propose a reference quality adaptation (RQA) module and an associated repeat-long training strategy to provide dynamic spatially variant filters for diverse reference qualities.
arXiv Detail & Related papers (2024-06-20T09:03:26Z) - NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce NeRV++, an enhanced implicit neural video representation.
NeRV++ is a more straightforward yet effective enhancement of the original NeRV decoder architecture.
We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs.
arXiv Detail & Related papers (2024-02-28T13:00:32Z) - Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing.
This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z) - TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at
Scale [59.01246141215051]
We analyze the factors that lead to the degradation from the perspective of language supervision.
We propose a tunable-free pre-training strategy to retain the generalization ability of the text encoder.
We produce a series of models, dubbed TVTSv2, with up to one billion parameters.
arXiv Detail & Related papers (2023-05-23T15:44:56Z) - HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV)
With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
arXiv Detail & Related papers (2023-04-05T17:55:04Z) - Neural Video Compression with Diverse Contexts [25.96187914295921]
This paper proposes increasing the context diversity in both temporal and spatial dimensions.
Experiments show that our codec obtains 23.5% bitrate saving over the previous SOTA NVC.
arXiv Detail & Related papers (2023-02-28T08:35:50Z) - CNeRV: Content-adaptive Neural Representation for Visual Data [54.99373641890767]
We propose Neural Visual Representation with Content-adaptive Embedding (CNeRV), which combines the generalizability of autoencoders with the simplicity and compactness of implicit representation.
We match the performance of NeRV, a state-of-the-art implicit neural representation, on the reconstruction task for frames seen during training, while far surpassing it on frames skipped during training (unseen images).
With the same latent code length and similar model size, CNeRV outperforms autoencoders on reconstruction of both seen and unseen images.
arXiv Detail & Related papers (2022-11-18T18:35:43Z) - Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds its encoding quality, 34.07 → 34.57 (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z) - Neural Video Coding using Multiscale Motion Compensation and
Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated for the low-delay causal settings and compared with H.265/HEVC, H.264/AVC and the other learnt video compression methods.
arXiv Detail & Related papers (2020-07-09T06:15:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.