Hierarchical Autoregressive Modeling for Neural Video Compression
- URL: http://arxiv.org/abs/2010.10258v3
- Date: Tue, 19 Dec 2023 08:45:50 GMT
- Title: Hierarchical Autoregressive Modeling for Neural Video Compression
- Authors: Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt
- Abstract summary: We view recent neural video compression methods as instances of a generalized temporal autoregressive transform.
Comprehensive evaluations on large-scale video data show improved rate-distortion performance over both state-of-the-art neural and conventional video compression methods.
- Score: 44.1797885347606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work by Marino et al. (2020) showed improved performance in sequential
density estimation by combining masked autoregressive flows with hierarchical
latent variable models. We draw a connection between such autoregressive
generative models and the task of lossy video compression. Specifically, we
view recent neural video compression methods (Lu et al., 2019; Yang et al.,
2020b; Agustsson et al., 2020) as instances of a generalized stochastic temporal
autoregressive transform, and propose avenues for enhancement based on this
insight. Comprehensive evaluations on large-scale video data show improved
rate-distortion performance over both state-of-the-art neural and conventional
video compression methods.
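
As a rough illustration of the idea in the abstract, one common reading of a temporal autoregressive transform is an affine (shift-and-scale) map conditioned on the previous reconstructed frame. The sketch below assumes that reading and is not the authors' implementation. The names `predict_shift`, `predict_scale`, `encode_frame`, `decode_frame`, and `compress_sequence` are hypothetical, the predictors are hand-coded stand-ins for what would be learned networks, and the hierarchical latent variables and entropy coding are omitted.

```python
import numpy as np

def predict_shift(prev_recon):
    # Hypothetical stand-in for a learned predictor (for example a
    # motion-compensated warp): simply reuse the previous reconstruction.
    return prev_recon

def predict_scale(prev_recon):
    # Hypothetical stand-in for a learned, strictly positive scale.
    return np.full_like(prev_recon, 0.1)

def encode_frame(frame, prev_recon):
    # Inverse transform: normalize the frame by the predicted shift and
    # scale, then quantize the result by rounding.
    mu = predict_shift(prev_recon)
    sigma = predict_scale(prev_recon)
    return np.round((frame - mu) / sigma)

def decode_frame(residual, prev_recon):
    # Forward transform: rescale the quantized residual and add the shift.
    mu = predict_shift(prev_recon)
    sigma = predict_scale(prev_recon)
    return mu + sigma * residual

def compress_sequence(frames):
    # Encoder and decoder both condition on the previous *reconstruction*,
    # so quantization error does not accumulate over time.
    prev = np.zeros_like(frames[0])
    recons = []
    for frame in frames:
        residual = encode_frame(frame, prev)
        prev = decode_frame(residual, prev)
        recons.append(prev)
    return np.stack(recons)
```

With a constant scale of 0.1, rounding bounds the per-pixel reconstruction error by half the scale (0.05). A learned scale would instead let the model trade rate against distortion locally.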
Related papers
- Autoregressive Video Generation without Vector Quantization [90.87907377618747]
We reformulate the video generation problem as a non-quantized autoregressive modeling of temporal frame-by-frame prediction.
With the proposed approach, we train a novel video autoregressive model without vector quantization, termed NOVA.
Our results demonstrate that NOVA surpasses prior autoregressive video models in data efficiency, inference speed, visual fidelity, and video fluency, even with a much smaller model capacity.
arXiv Detail & Related papers (2024-12-18T18:59:53Z)
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z)
- Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method [69.868145936998]
Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs.
The great diversity of distortion types in spatial and temporal domains, ranging from the traditional hybrid coding frameworks to generative models, presents grand challenges in compressed face video quality assessment (VQA).
We introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos.
arXiv Detail & Related papers (2023-04-14T11:26:09Z)
- DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos [53.077189668346705]
We propose Difference Neural Representation for Videos (DNeRV).
We analyze this from the perspective of the limitations of function fitting and the importance of frame difference.
DNeRV achieves competitive results against the state-of-the-art neural compression approaches.
arXiv Detail & Related papers (2023-04-13T13:53:49Z)
- Scene Matters: Model-based Deep Video Compression [13.329074811293292]
We propose a model-based video compression (MVC) framework that regards scenes as the fundamental units for video sequences.
Our proposed MVC directly models novel intensity variation of the entire video sequence in one scene, seeking non-redundant representations instead of reducing redundancy.
Our method achieves up to a 20% bitrate reduction compared to the latest video standard H.266 and is more efficient in decoding than existing video coding strategies.
arXiv Detail & Related papers (2023-03-08T13:15:19Z)
- HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator [90.74663948713615]
We train an autoregressive latent video prediction model capable of predicting high-fidelity future frames.
We produce high-resolution (256x256) videos with minimal modification to existing models.
arXiv Detail & Related papers (2022-09-15T08:41:57Z)
- Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set [14.89208053104896]
We introduce a video compression algorithm based on instance-adaptive learning.
On each video sequence to be transmitted, we finetune a pretrained compression model.
We show that it enables a competitive performance even after reducing the network size by 70%.
arXiv Detail & Related papers (2021-11-19T16:25:34Z)
- Insights from Generative Modeling for Neural Video Compression [31.59496634465347]
We present newly proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling.
We propose several architectures that yield state-of-the-art video compression performance on high-resolution video.
We provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
arXiv Detail & Related papers (2021-07-28T02:19:39Z)
- Feedback Recurrent Autoencoder for Video Compression [14.072596106425072]
We propose a new network architecture for learned video compression operating in low latency mode.
Our method yields state-of-the-art MS-SSIM/rate performance on the high-resolution UVG dataset.
arXiv Detail & Related papers (2020-04-09T02:58:07Z)