Slimmable Video Codec
- URL: http://arxiv.org/abs/2205.06754v1
- Date: Fri, 13 May 2022 16:37:27 GMT
- Title: Slimmable Video Codec
- Authors: Zhaocheng Liu, Luis Herranz, Fei Yang, Saiping Zhang, Shuai Wan, Marta Mrak and Marc Górriz Blanch
- Abstract summary: We propose a slimmable video codec (SlimVC) by integrating a slimmable temporal entropy model in a slimmable autoencoder.
Despite a significantly more complex architecture, we show that slimming remains a powerful mechanism to control rate, memory footprint, computational cost and latency.
- Score: 24.460763016660685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural video compression has emerged as a novel paradigm combining trainable
multilayer neural networks and machine learning, achieving competitive
rate-distortion (RD) performance, but it remains impractical due to heavy
neural architectures with large memory and computational demands. In addition,
models are usually optimized for a single RD tradeoff. Recent slimmable image
codecs can dynamically adjust their model capacity to gracefully reduce the
memory and computation requirements, without harming RD performance. In this
paper we propose a slimmable video codec (SlimVC), by integrating a slimmable
temporal entropy model in a slimmable autoencoder. Despite a significantly more
complex architecture, we show that slimming remains a powerful mechanism to
control rate, memory footprint, computational cost and latency, all being
important requirements for practical video compression.
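The slimming mechanism the abstract refers to can be illustrated with a minimal sketch: one weight tensor is shared across widths, and slicing it at run time trades quality against memory, FLOPs and latency. The layer below is a hypothetical PyTorch illustration, not the authors' implementation; all names and sizes are assumptions.

```python
# Hypothetical sketch of a slimmable layer (not the authors' code): one
# shared weight tensor, sliced at run time to the requested width.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableConv2d(nn.Conv2d):
    """A convolution that can run at any width up to its maximum width."""

    def __init__(self, max_in: int, max_out: int, kernel_size: int, **kwargs):
        super().__init__(max_in, max_out, kernel_size, **kwargs)
        self.active_in, self.active_out = max_in, max_out

    def set_width(self, in_ch: int, out_ch: int) -> None:
        self.active_in, self.active_out = in_ch, out_ch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Smaller widths reuse a prefix of the full weight tensor, so a
        # single stored model serves several capacity levels.
        w = self.weight[: self.active_out, : self.active_in]
        b = self.bias[: self.active_out] if self.bias is not None else None
        return F.conv2d(x, w, b, self.stride, self.padding)

# One layer, three operating points: memory and FLOPs scale with the width.
layer = SlimmableConv2d(max_in=3, max_out=192, kernel_size=5, padding=2)
x = torch.randn(1, 3, 64, 64)
for out_ch in (48, 96, 192):              # low / medium / full capacity
    layer.set_width(in_ch=3, out_ch=out_ch)
    print(out_ch, tuple(layer(x).shape))  # channel dim follows the width
```

In SlimVC, per the abstract, the same mechanism also spans the temporal entropy model, so a single width setting jointly controls rate, memory footprint, computational cost and latency.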
Related papers
- Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis [50.77548592888096]
Demand for 2K video synthesis is rising with increasing consumer expectations for ultra-clear visuals.
Turbo2K is an efficient framework for generating detail-rich 2K videos.
arXiv Detail & Related papers (2025-04-20T03:30:59Z)
- QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation.
DiTs come with significant drawbacks, including increased computational and memory costs.
We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z)
- Token-Efficient Long Video Understanding for Multimodal LLMs [101.70681093383365]
STORM is a novel architecture incorporating a dedicated temporal encoder between the image encoder and the Video-LLMs.
We show that STORM achieves state-of-the-art results across various long video understanding benchmarks.
arXiv Detail & Related papers (2025-03-06T06:17:38Z)
- UAR-NVC: A Unified AutoRegressive Framework for Memory-Efficient Neural Video Compression [29.174318150967405]
Implicit Neural Representations (INRs) have demonstrated significant potential in video compression by representing videos as neural networks.
We present a novel understanding of INR models from an autoregressive (AR) perspective and introduce a Unified AutoRegressive Framework for memory-efficient Neural Video Compression (UAR-NVC).
UAR-NVC integrates timeline-based and INR-based neural video compression under a unified autoregressive paradigm.
arXiv Detail & Related papers (2025-03-04T15:54:57Z)
- Adaptive Caching for Faster Video Generation with Diffusion Transformers [52.73348147077075]
Diffusion Transformers (DiTs) rely on larger models and heavier attention mechanisms, resulting in slower inference speeds.
We introduce a training-free method to accelerate video DiTs, termed Adaptive Caching (AdaCache).
We also introduce a Motion Regularization (MoReg) scheme to utilize video information within AdaCache, controlling the compute allocation based on motion content.
arXiv Detail & Related papers (2024-11-04T18:59:44Z)
- Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design [18.57172631588624]
We propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline to reduce the number of models down to one.
Our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone.
arXiv Detail & Related papers (2024-07-03T05:17:26Z)
- A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames [54.90226700939778]
We build on the common paradigm of transferring large-scale image-text models to video via shallow temporal fusion.
We expose two limitations of this approach: (1) decreased spatial capabilities, likely due to poor video-language alignment in standard video datasets, and (2) higher memory consumption, which bottlenecks the number of frames that can be processed.
arXiv Detail & Related papers (2023-12-12T16:10:19Z)
- A Codec Information Assisted Framework for Efficient Compressed Video Super-Resolution [15.690562510147766]
Video Super-Resolution (VSR) using recurrent neural network architecture is a promising solution due to its efficient modeling of long-range temporal dependencies.
We propose a Codec Information Assisted Framework (CIAF) to boost and accelerate recurrent VSR models for compressed videos.
arXiv Detail & Related papers (2022-10-15T08:48:29Z)
- Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression [25.96187914295921]
This paper proposes a powerful entropy model which efficiently captures both spatial and temporal dependencies.
Our entropy model achieves an 18.2% saving on the UVG dataset compared with H.266 (VTM) at the highest compression ratio.
arXiv Detail & Related papers (2022-07-13T00:03:54Z)
- Self-Conditioned Probabilistic Learning of Video Rescaling [70.10092286301997]
We propose a self-conditioned probabilistic framework for video rescaling to learn the paired downscaling and upscaling procedures simultaneously.
We decrease the entropy of the information lost in downscaling by maximizing its probability conditioned on strong spatial-temporal prior information.
We extend the framework to a lossy video compression system, in which a gradient estimator for non-differentiable industrial lossy codecs is proposed.
arXiv Detail & Related papers (2021-07-24T15:57:15Z)
- Slimmable Compressive Autoencoders for Practical Neural Image Compression [20.715312224456138]
We propose slimmable compressive autoencoders (SlimCAEs) for practical image compression.
SlimCAEs are highly flexible models that provide excellent rate-distortion performance, variable rate, and dynamic adjustment of memory, computational cost and latency.
arXiv Detail & Related papers (2021-03-29T16:12:04Z)
- VA-RED$^2$: Video Adaptive Redundancy Reduction [64.75692128294175]
We present a redundancy reduction framework, VA-RED$2$, which is input-dependent.
We learn the adaptive policy jointly with the network weights in a differentiable way with a shared-weight mechanism.
Our framework achieves a 20%-40% reduction in computation (FLOPs) compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-02-15T22:57:52Z)
- Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that focuses only on modeling the conditional entropy between frames (a minimal sketch of this idea appears after this list).
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
arXiv Detail & Related papers (2020-08-20T20:01:59Z)
- Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model [164.7489982837475]
This paper proposes a Recurrent Learned Video Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent Probability Model (RPM).
The RAE employs recurrent cells in both the encoder and decoder to exploit the temporal correlation among video frames.
Our approach achieves the state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM.
arXiv Detail & Related papers (2020-06-24T08:46:33Z)
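SlimVC's slimmable temporal entropy model and the conditional entropy coding entry above share one idea: predict a distribution over the current frame's latent from the previous one, and estimate the rate as its negative log-likelihood. Below is a hedged sketch of that idea; the channel counts and layer sizes are assumptions, not any paper's actual model.

```python
# Hedged sketch of conditional (temporal) entropy modeling: condition a
# Gaussian over the current latent y_t on the previous latent y_prev and
# estimate the bits an arithmetic coder would need. Sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalEntropyModel(nn.Module):
    def __init__(self, channels: int = 192):
        super().__init__()
        # Tiny conditioning network: previous latent -> Gaussian mean/scale.
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 2 * channels, 3, padding=1),
        )

    def forward(self, y_t: torch.Tensor, y_prev: torch.Tensor) -> torch.Tensor:
        mean, scale = self.net(y_prev).chunk(2, dim=1)
        scale = F.softplus(scale) + 1e-6          # scales must be positive
        # Probability mass of y_t quantized to unit bins under N(mean, scale);
        # -log2 of that mass approximates the coded bit count.
        dist = torch.distributions.Normal(mean, scale)
        p = dist.cdf(y_t + 0.5) - dist.cdf(y_t - 0.5)
        return -torch.log2(p.clamp_min(1e-9)).sum()

model = TemporalEntropyModel()
y_prev = torch.randn(1, 192, 16, 16)
y_t = torch.randn(1, 192, 16, 16)
print(f"estimated rate: {model(y_t, y_prev).item():.0f} bits")
```

In a full codec the latents would come from the (slimmable) autoencoder and this rate estimate would typically serve as the rate term of the rate-distortion training loss.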