EVC: Towards Real-Time Neural Image Compression with Mask Decay
- URL: http://arxiv.org/abs/2302.05071v1
- Date: Fri, 10 Feb 2023 06:02:29 GMT
- Title: EVC: Towards Real-Time Neural Image Compression with Mask Decay
- Authors: Guo-Hua Wang, Jiahao Li, Bin Li, Yan Lu
- Abstract summary: Neural image compression has surpassed state-of-the-art traditional codecs (H.266/VVC) in rate-distortion (RD) performance.
We propose an Efficient single-model Variable-bit-rate Codec (EVC), which is able to run at 30 FPS with 768x512 input images and still outperforms VVC in RD performance.
- Score: 29.76392801329279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural image compression has surpassed state-of-the-art traditional codecs
(H.266/VVC) in rate-distortion (RD) performance, but suffers from high complexity
and requires separate models for different rate-distortion trade-offs. In this
paper, we propose an Efficient single-model Variable-bit-rate Codec (EVC), which
is able to run at 30 FPS with 768x512 input images and still outperforms VVC in
RD performance. By further reducing both encoder and decoder complexities, our
small model even achieves 30 FPS with 1920x1080 input images.
To bridge the performance gap between our models of different capacities, we
meticulously design mask decay, which automatically transforms the large model's
parameters into the small model's. A novel sparsity regularization loss is also
proposed to mitigate the shortcomings of $L_p$ regularization. Our algorithm
significantly narrows the performance gap by 50% and 30% for our medium and small
models, respectively. Finally, we advocate a scalable encoder for neural image
compression, whose encoding complexity can be adjusted dynamically to meet
different latency requirements. We propose decaying the large encoder multiple
times to reduce the residual representation progressively.
Both mask decay and residual representation learning greatly improve the RD
performance of our scalable encoder. Our code is at
https://github.com/microsoft/DCVC.
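
The abstract only names mask decay and the sparsity loss without implementation details. The PyTorch-style fragment below is a minimal sketch of one reading of the idea: a per-channel mask multiplicatively decays the channels that the small model drops, so the surviving channels learn to compensate before pruning. All names here (MaskedConv2d, decay_step, gamma, small_out_ch) are our own placeholders, not identifiers from the DCVC repository.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Module):
    """Conv layer whose output channels are scaled by a per-channel mask.

    Channels that exist only in the large model have their mask decayed
    toward zero during fine-tuning, so the remaining channels gradually
    absorb the large model's behaviour before the extra channels are pruned.
    """

    def __init__(self, in_ch, out_ch, small_out_ch, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2)
        self.register_buffer("mask", torch.ones(1, out_ch, 1, 1))
        # True for channels the small model keeps, False for those to decay.
        self.register_buffer("keep", torch.arange(out_ch) < small_out_ch)

    def forward(self, x):
        return self.conv(x) * self.mask

    @torch.no_grad()
    def decay_step(self, gamma=0.99):
        """Shrink the masks of channels that the small model does not keep."""
        self.mask[:, ~self.keep] *= gamma
```

During fine-tuning one would call decay_step() periodically and, once the masks are effectively zero, strip the masked channels to obtain the small model. The paper's sparsity regularization loss (the proposed replacement for plain $L_p$ regularization) is not reproduced here because the page gives no formula for it.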
Related papers
- ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models [77.59651787115546]
High-resolution Large Multimodal Models (LMMs) encounter the challenges of excessive visual tokens and quadratic visual complexity.
We propose ConvLLaVA, which employs ConvNeXt, a hierarchical backbone, as the visual encoder of LMM.
ConvLLaVA compresses high-resolution images into information-rich visual features, effectively preventing the generation of excessive visual tokens.
arXiv Detail & Related papers (2024-05-24T17:34:15Z)
- Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z)
- C3: High-performance and low-complexity neural compression from a single image or video [16.770509909942312]
We introduce C3, a neural compression method with strong rate-distortion (RD) performance.
The resulting decoding complexity of C3 can be an order of magnitude lower than neural baselines with similar RD performance.
arXiv Detail & Related papers (2023-12-05T13:28:59Z)
- Computationally-Efficient Neural Image Compression with Shallow Decoders [43.115831685920114]
This paper takes a step towards closing the decoding-complexity gap by using a shallow or even linear decoding transform resembling that of JPEG.
We exploit the often asymmetrical budget between encoding and decoding by adopting more powerful encoder networks and iterative encoding.
arXiv Detail & Related papers (2023-04-13T03:38:56Z)
- Video Coding Using Learned Latent GAN Compression [1.6058099298620423]
We leverage the generative capacity of GANs such as StyleGAN to represent and compress a video.
Each frame is inverted in the latent space of StyleGAN, from which the optimal compression is learned.
arXiv Detail & Related papers (2022-07-09T19:07:43Z)
- Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Map, and Post-Quantization Filtering [15.056672221375104]
Deep learning-based image compression has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC.
Many leading learned schemes cannot maintain a good trade-off between performance and complexity.
We propose an efficient and effective image coding framework, which achieves similar R-D performance with lower complexity than the state of the art.
arXiv Detail & Related papers (2022-06-21T09:34:29Z)
- PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework [88.18310777246735]
We propose an end-to-end image compression framework that achieves 200 MB/s for both compression and decompression with a single NVIDIA Tesla V100 GPU.
Experiments show that our framework compresses better than PNG by a margin of 30% on multiple datasets.
arXiv Detail & Related papers (2022-06-10T03:00:10Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs enables us to train large models efficiently and effectively.
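
The summary above states the core mechanism directly (mask random patches, reconstruct the missing pixels). Purely as an illustration of that sentence, here is a small PyTorch sketch of random patch masking; the function name, the 16-pixel patch size, and the 0.75 masking ratio are typical MAE settings rather than anything taken from this page.

```python
import torch

def random_patch_mask(images, patch=16, mask_ratio=0.75):
    """Split images into non-overlapping patches and mark a random subset as masked.

    Returns flattened patches and a boolean mask (True = hidden); the MAE-style
    reconstruction loss would be computed only on the hidden patches.
    """
    b, c, h, w = images.shape
    patches = images.unfold(2, patch, patch).unfold(3, patch, patch)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch * patch)
    num_patches = patches.shape[1]
    num_keep = int(num_patches * (1 - mask_ratio))
    ids = torch.rand(b, num_patches).argsort(dim=1)      # random patch order
    mask = torch.ones(b, num_patches, dtype=torch.bool)
    mask.scatter_(1, ids[:, :num_keep], False)           # first num_keep patches stay visible
    return patches, mask

# e.g. loss = ((pred - patches) ** 2).mean(-1)[mask].mean(), with `pred` from a decoder
```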
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
- Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
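
To make the phrase "modeling the entropy between the image latent codes" concrete, here is a generic PyTorch sketch of a conditional entropy model: a small network predicts a Gaussian over the current frame's quantized latent given the previous frame's latent, and the bit cost is the negative log-likelihood. The class name, layer sizes, and Gaussian choice are our assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class ConditionalEntropyModel(nn.Module):
    """Predicts p(y_t | y_{t-1}) so the current latent can be entropy coded
    with a bit cost close to its conditional entropy given the previous frame."""

    def __init__(self, channels=192):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 2 * channels, 3, padding=1),
        )

    def bits(self, y_cur, y_prev):
        mean, log_scale = self.net(y_prev).chunk(2, dim=1)
        dist = torch.distributions.Normal(mean, log_scale.exp())
        # probability mass of each quantized value (bin of width 1 around y_cur)
        p = dist.cdf(y_cur + 0.5) - dist.cdf(y_cur - 0.5)
        return -torch.log2(p.clamp_min(1e-9)).sum()
```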
arXiv Detail & Related papers (2020-08-20T20:01:59Z)
- Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model [164.7489982837475]
This paper proposes a Recurrent Learned Video Compression (RLVC) approach with a Recurrent Auto-Encoder (RAE) and a Recurrent Probability Model (RPM).
The RAE employs recurrent cells in both the encoder and decoder to exploit the temporal correlation among video frames.
Our approach achieves the state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM.
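
The RAE description above (recurrent cells in both the encoder and decoder to exploit temporal correlation) can be pictured with the toy encoder-side cell below; the layer sizes and names are invented for illustration and do not come from RLVC.

```python
import torch
import torch.nn as nn

class RecurrentEncoderCell(nn.Module):
    """One encoding step: the latent for frame t depends on the frame and on a
    hidden state carried over from earlier frames, capturing temporal correlation."""

    def __init__(self, in_ch=3, hidden_ch=64, latent_ch=96):
        super().__init__()
        self.fuse = nn.Conv2d(in_ch + hidden_ch, hidden_ch, 3, padding=1)
        self.to_latent = nn.Conv2d(hidden_ch, latent_ch, 3, padding=1)

    def forward(self, frame, hidden):
        h = torch.relu(self.fuse(torch.cat([frame, hidden], dim=1)))
        return self.to_latent(h), h  # latent to entropy-code, next hidden state

# the hidden state starts at zero and is threaded through the frames of a sequence
enc = RecurrentEncoderCell()
hidden = torch.zeros(1, 64, 64, 64)
for frame in torch.randn(5, 1, 3, 64, 64):   # 5 frames, batch size 1
    latent, hidden = enc(frame, hidden)
```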
arXiv Detail & Related papers (2020-06-24T08:46:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.