Related papers: Hierarchical Autoregressive Modeling for Neural Video Compression

Hierarchical Autoregressive Modeling for Neural Video Compression

URL: http://arxiv.org/abs/2010.10258v3
Date: Tue, 19 Dec 2023 08:45:50 GMT
Title: Hierarchical Autoregressive Modeling for Neural Video Compression
Authors: Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt
Abstract summary: We view recent neural video compression methods as instances of a generalized temporal autoregressive transform. Comprehensive evaluations on large-scale video data show improved rate-distortion performance over both state-of-the-art neural and conventional video compression methods.
Score: 44.1797885347606
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent work by Marino et al. (2020) showed improved performance in sequential density estimation by combining masked autoregressive flows with hierarchical latent variable models. We draw a connection between such autoregressive generative models and the task of lossy video compression. Specifically, we view recent neural video compression methods (Lu et al., 2019; Yang et al., 2020b; Agustssonet al., 2020) as instances of a generalized stochastic temporal autoregressive transform, and propose avenues for enhancement based on this insight. Comprehensive evaluations on large-scale video data show improved rate-distortion performance over both state-of-the-art neural and conventional video compression methods.

Related papers

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [70.4360995984905]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models.<n>It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z)
Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on invertible transform to overcome limitations. Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv Detail & Related papers (2025-03-27T09:08:39Z)
Autoregressive Video Generation without Vector Quantization [90.87907377618747]
We reformulate the video generation problem as a non-quantized autoregressive modeling of temporal frame-by-frame prediction. With the proposed approach, we train a novel video autoregressive model without vector quantization, termed NOVA. Our results demonstrate that NOVA surpasses prior autoregressive video models in data efficiency, inference speed, visual fidelity, and video fluency, even with a much smaller model capacity.
arXiv Detail & Related papers (2024-12-18T18:59:53Z)
Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios [14.48369551534582]
A learning-based approach seeks to minimize the compromise between compression rate and reconstructed image quality. A successful technique consists in introducing a deep hyperprior that operates within a 2-level nested latent variable model. This paper extends this concept by designing a generalized L-level nested generative model with a Markov chain structure.
arXiv Detail & Related papers (2024-06-10T11:00:26Z)
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling. It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences. It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z)
Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. We propose the textbfCorner-to-Center transformer-based Context Model (C$3$M) designed to enhance context and latent predictions. In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method [69.868145936998]
Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs. The great diversity of distortion types in spatial and temporal domains, ranging from the traditional hybrid coding frameworks to generative models, present grand challenges in compressed face video quality assessment (VQA) We introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos.
arXiv Detail & Related papers (2023-04-14T11:26:09Z)
DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos [53.077189668346705]
Difference Representation for Videos (eRV) We analyze this from the perspective of limitation function fitting and the importance of frame difference. DNeRV achieves competitive results against the state-of-the-art neural compression approaches.
arXiv Detail & Related papers (2023-04-13T13:53:49Z)
Scene Matters: Model-based Deep Video Compression [13.329074811293292]
We propose a model-based video compression (MVC) framework that regards scenes as the fundamental units for video sequences. Our proposed MVC directly models novel intensity variation of the entire video sequence in one scene, seeking non-redundant representations instead of reducing redundancy. Our method achieves up to a 20% reduction compared to the latest video standard H.266 and is more efficient in decoding than existing video coding strategies.
arXiv Detail & Related papers (2023-03-08T13:15:19Z)
HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator [90.74663948713615]
We train an autoregressive latent video prediction model capable of predicting high-fidelity future frames. We produce high-resolution (256x256) videos with minimal modification to existing models.
arXiv Detail & Related papers (2022-09-15T08:41:57Z)
Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set [14.89208053104896]
We introduce a video compression algorithm based on instance-adaptive learning. On each video sequence to be transmitted, we finetune a pretrained compression model. We show that it enables a competitive performance even after reducing the network size by 70%.
arXiv Detail & Related papers (2021-11-19T16:25:34Z)
Insights from Generative Modeling for Neural Video Compression [31.59496634465347]
We present newly proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We propose several architectures that yield state-of-the-art video compression performance on high-resolution video. We provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
arXiv Detail & Related papers (2021-07-28T02:19:39Z)
Feedback Recurrent Autoencoder for Video Compression [14.072596106425072]
We propose a new network architecture for learned video compression operating in low latency mode. Our method yields state of the art MS-SSIM/rate performance on the high-resolution UVG dataset.
arXiv Detail & Related papers (2020-04-09T02:58:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.