Insights from Generative Modeling for Neural Video Compression
- URL: http://arxiv.org/abs/2107.13136v2
- Date: Sun, 9 Jul 2023 23:05:59 GMT
- Title: Insights from Generative Modeling for Neural Video Compression
- Authors: Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt
- Abstract summary: We view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling.
We propose several architectures that yield state-of-the-art video compression performance on high-resolution video.
We provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
- Score: 31.59496634465347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent machine learning research has revealed connections between deep
generative models such as VAEs and rate-distortion losses used in learned
compression, most of this work has focused on images. In a similar spirit, we
view recently proposed neural video coding algorithms through the lens of deep
autoregressive and latent variable modeling. We present these codecs as
instances of a generalized stochastic temporal autoregressive transform, and
propose new avenues for further improvements inspired by normalizing flows and
structured priors. We propose several architectures that yield state-of-the-art
video compression performance on high-resolution video and discuss their
tradeoffs and ablations. In particular, we propose (i) improved temporal
autoregressive transforms, (ii) improved entropy models with structured and
temporal dependencies, and (iii) variable bitrate versions of our algorithms.
Since our improvements are compatible with a large class of existing models, we
provide further evidence that the generative modeling viewpoint can advance the
neural video coding field.
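As a concrete (and heavily simplified) illustration of the shared structure, the sketch below implements a shift-and-scale temporal autoregressive transform: the decoder predicts a mean and scale from previously reconstructed frames and combines them with a decoded latent, roughly x_t = h_mu(x_<t) + h_sigma(x_<t) * z_t. All names in the sketch are hypothetical stand-ins; this is a toy model of the idea, not the paper's implementation.

```python
# Minimal numpy sketch of a (stochastic) temporal autoregressive transform in
# shift-and-scale form: each frame is coded as a normalized residual against a
# temporal prediction. shift_fn / scale_fn are hypothetical stand-ins for the
# learned networks h_mu / h_sigma; round() stands in for quantization plus
# entropy coding. Training a real codec would minimize distortion + beta * rate,
# the rate-distortion (beta-VAE style) objective mentioned in the abstract.
import numpy as np

def shift_fn(prev_recon):
    # Hypothetical mean predictor h_mu(x_{<t}); identity = plain residual coding.
    return prev_recon

def scale_fn(prev_recon):
    # Hypothetical scale predictor h_sigma(x_{<t}); constant here for simplicity.
    return np.ones_like(prev_recon)

def encode_frame(x_t, prev_recon):
    # Normalize the frame by the temporal prediction, then quantize the residual.
    residual = (x_t - shift_fn(prev_recon)) / scale_fn(prev_recon)
    return np.round(residual)  # z_t: what would be entropy-coded into the bitstream

def decode_frame(z_t, prev_recon):
    # Invert the transform: rescale the decoded residual and add the prediction.
    return shift_fn(prev_recon) + scale_fn(prev_recon) * z_t

rng = np.random.default_rng(0)
frames = [np.linspace(0, 10, 8) + 0.1 * rng.standard_normal(8) for _ in range(4)]

recon = np.zeros(8)  # assume an already-decoded I-frame
for x_t in frames:
    z_t = encode_frame(x_t, recon)
    recon = decode_frame(z_t, recon)
    print("max reconstruction error:", np.max(np.abs(recon - x_t)))
```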
Related papers
- Neural Residual Diffusion Models for Deep Scalable Vision Generation [17.931568104324985]
We propose a unified and massively scalable Neural Residual Diffusion Models framework (Neural-RDM).
The proposed neural residual models obtain state-of-the-art scores on image and video generative benchmarks.
arXiv Detail & Related papers (2024-06-19T04:57:18Z)
- Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations (a generic sketch of such a context model appears after this list).
We propose the Corner-to-Center transformer-based Context Model (C$^3$M), designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
- Conditional Generative Modeling for Images, 3D Animations, and Video [4.422441608136163]
This dissertation attempts to drive innovation in the field of generative modeling for computer vision.
Research focuses on architectures that offer transformations of noise and visual data, and the application of encoder-decoder architectures for generative tasks and 3D content manipulation.
arXiv Detail & Related papers (2023-10-19T21:10:39Z)
- Video Probabilistic Diffusion Models in Projected Latent Space [75.4253202574722]
We propose a novel generative model for videos, coined projected latent video diffusion models (PVDM).
PVDM learns a video distribution in a low-dimensional latent space and thus can be efficiently trained with high-resolution videos under limited resources.
arXiv Detail & Related papers (2023-02-15T14:22:34Z)
- Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
- Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression [5.152019611975467]
We propose a transformer-based context model, a.k.a. Contextformer.
We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak image dataset.
Our experimental results show that the proposed model provides up to 10% rate savings compared to the standard Versatile Video Coding (VVC) Test Model (VTM) 9.1.
arXiv Detail & Related papers (2022-03-04T17:29:32Z)
- Deep Variational Models for Collaborative Filtering-based Recommender Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to inject stochasticity in the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z)
- Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
- Hierarchical Autoregressive Modeling for Neural Video Compression [44.1797885347606]
We view recent neural video compression methods as instances of a generalized temporal autoregressive transform.
Comprehensive evaluations on large-scale video data show improved rate-distortion performance over both state-of-the-art neural and conventional video compression methods.
arXiv Detail & Related papers (2020-10-19T03:01:33Z)
- Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated in the low-delay causal setting and compared with H.265/HEVC, H.264/AVC, and other learned video compression methods.
arXiv Detail & Related papers (2020-07-09T06:15:17Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of a learned motion pattern.
By learning to extract sparse motion patterns via a predictive model, the network leverages the feature representation to generate the appearance of the frames to be coded.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
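Finally, as referenced from the Corner-to-Center entry above, here is a minimal sketch of the autoregressive context-modeling idea that several of these papers build on: each latent's entropy-model parameters are predicted from already-decoded (raster-scan) neighbors, enforced here by a PixelCNN-style causal mask. The mask, the single linear kernel, and all names are illustrative assumptions, not the architecture of C$^3$M, the Contextformer, or any other cited paper.

```python
# Minimal numpy sketch of an autoregressive context model over a latent grid:
# each latent's distribution parameters may depend only on already-decoded
# (raster-scan) neighbors, enforced by a causally masked kernel. This is a
# generic PixelCNN-style illustration with a hypothetical linear kernel.
import numpy as np

def causal_mask(k):
    # Zero out the current position and everything after it in raster order.
    mask = np.ones((k, k))
    mask[k // 2, k // 2:] = 0
    mask[k // 2 + 1:, :] = 0
    return mask

def context_predict(latents, kernel):
    # Predict a mean for each latent from decoded neighbors only.
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(latents, pad)
    h, w = latents.shape
    mu = np.empty_like(latents, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + k, j:j + k]
            mu[i, j] = np.sum(patch * kernel)
    return mu

latents = np.round(np.random.default_rng(1).normal(size=(6, 6)) * 3)
kernel = causal_mask(3) * 0.25  # hypothetical "learned" weights
mu = context_predict(latents, kernel)
# In a real codec, mu (and a scale) would parameterize the entropy model used
# to arithmetic-code each latent; a better context model means a lower bitrate.
print(mu)
```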