MTC-VAE: Multi-Level Temporal Compression with Content Awareness
- URL: http://arxiv.org/abs/2602.01340v1
- Date: Sun, 01 Feb 2026 17:08:02 GMT
- Title: MTC-VAE: Multi-Level Temporal Compression with Content Awareness
- Authors: Yubo Dong, Linchao Zhu
- Abstract summary: Latent Video Diffusion Models (LVDMs) rely on Variational Autoencoders (VAEs) to compress videos into compact latent representations. We present a technique to convert fixed compression rate VAEs into models that support multi-level temporal compression.
- Score: 54.85288415164888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Latent Video Diffusion Models (LVDMs) rely on Variational Autoencoders (VAEs) to compress videos into compact latent representations. For continuous Variational Autoencoders (VAEs), achieving higher compression rates is desirable; yet, the efficiency notably declines when extra sampling layers are added without expanding the dimensions of hidden channels. In this paper, we present a technique to convert fixed compression rate VAEs into models that support multi-level temporal compression, providing a straightforward and minimal fine-tuning approach to counteract performance decline at elevated compression rates. Moreover, we examine how varying compression levels impact model performance over video segments with diverse characteristics, offering empirical evidence on the effectiveness of our proposed approach. We also investigate the integration of our multi-level temporal compression VAE with a diffusion-based generative model, DiT, highlighting successful concurrent training and compatibility within these frameworks. This investigation illustrates the potential uses of multi-level temporal compression.
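The core idea of selecting a temporal compression level can be illustrated with a framework-free toy sketch (pure Python; this is not the authors' implementation, and the function name and data layout are illustrative assumptions): a clip of T frames is reduced along the time axis by averaging non-overlapping windows of 2^k frames, where k is the chosen compression level, so each extra level halves the temporal length of the latent sequence.

```python
def temporal_compress(frames, level):
    """Toy multi-level temporal compression: average non-overlapping
    windows of 2**level frames along the time axis.

    `frames` is a list of per-frame feature vectors (lists of floats);
    a real VAE would instead apply learned temporal downsampling layers.
    """
    window = 2 ** level
    if window == 1:
        # Level 0: no temporal compression, return a copy.
        return [f[:] for f in frames]
    compressed = []
    for start in range(0, len(frames) - window + 1, window):
        group = frames[start:start + window]
        # Element-wise mean over the temporal window.
        compressed.append([sum(vals) / window for vals in zip(*group)])
    return compressed

# A "video" of 8 frames, each a 2-dimensional feature vector.
video = [[float(t), float(t) * 10.0] for t in range(8)]

for k in range(3):
    latent = temporal_compress(video, level=k)
    print(f"level {k}: {len(video)} frames -> {len(latent)} latents")
```

Running the loop shows the temporal length shrinking by a factor of 2 per level (8, 4, 2 latents), which is the trade-off the paper studies: higher levels give more compact latents at a potential cost in reconstruction quality on fast-changing segments.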
Related papers
- DiffVC-OSD: One-Step Diffusion-based Perceptual Neural Video Compression Framework [45.134271969594614]
We first propose DiffVC-OSD, a One-Step Diffusion-based Perceptual Neural Video Compression framework. We employ an End-to-End Finetuning strategy to improve overall compression performance.
arXiv Detail & Related papers (2025-08-11T06:59:23Z)
- Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model [55.2480439325792]
We propose a hybrid compression scheme optimized for perceptual quality, extending the approach of the CDC model with a decoder network. We achieve up to +2dB PSNR fidelity improvements while maintaining comparable LPIPS and FID perceptual scores when compared with CDC.
arXiv Detail & Related papers (2025-05-19T14:13:14Z)
- Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion [28.61304513668606]
ResULIC is a residual-guided ultra lowrate image compression system. It incorporates residual signals into both semantic retrieval and the diffusion-based generation process. It achieves superior objective and subjective performance compared to state-of-the-art diffusion-based methods.
arXiv Detail & Related papers (2025-05-13T06:51:23Z)
- Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on invertible transform to overcome these limitations. Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv Detail & Related papers (2025-03-27T09:08:39Z)
- Pathology Image Compression with Pre-trained Autoencoders [52.208181380986524]
Whole Slide Images in digital histopathology pose significant storage, transmission, and computational efficiency challenges. Standard compression methods, such as JPEG, reduce file sizes but fail to preserve fine-grained phenotypic details critical for downstream tasks. In this work, we repurpose autoencoders (AEs) designed for Latent Diffusion Models as an efficient learned compression framework for pathology images.
arXiv Detail & Related papers (2025-03-14T17:01:17Z)
- Spatial Degradation-Aware and Temporal Consistent Diffusion Model for Compressed Video Super-Resolution [25.615935776826596]
Due to storage and bandwidth limitations, videos transmitted over the Internet often exhibit low quality, characterized by low resolution and compression artifacts. Although video super-resolution (VSR) is an efficient video enhancement technique, existing VSR methods focus less on compressed videos. We propose a novel method that exploits the priors of pre-trained diffusion models for compressed VSR.
arXiv Detail & Related papers (2025-02-11T08:57:45Z)
- Diffusion-based Perceptual Neural Video Compression with Temporal Diffusion Information Reuse [45.134271969594614]
DiffVC is a diffusion-based perceptual neural video compression framework. It integrates a foundational diffusion model with the video conditional coding paradigm. We show that our proposed solution delivers excellent performance in both perceptual metrics and visual quality.
arXiv Detail & Related papers (2025-01-23T10:23:04Z)
- Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces [20.860632218272094]
Video tokenizers are essential for latent video diffusion models, converting raw video data into latent spaces for efficient training. We propose an alternative approach to enhance temporal compression. We develop a bootstrapped high-temporal-compression model that progressively trains high-compression blocks atop well-trained lower-compression models.
arXiv Detail & Related papers (2025-01-09T18:55:15Z)
- Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression.
Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively.
Our model outperforms all current variable-rate image compression methods in terms of rate-distortion performance and approaches the state-of-the-art fixed-rate image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z)
- Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via a heterogeneous deformable compensation strategy (HDCVC) to tackle the problem of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance compared to recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.