ELF-VC: Efficient Learned Flexible-Rate Video Coding
- URL: http://arxiv.org/abs/2104.14335v1
- Date: Thu, 29 Apr 2021 17:50:35 GMT
- Title: ELF-VC: Efficient Learned Flexible-Rate Video Coding
- Authors: Oren Rippel, Alexander G. Anderson, Kedar Tatwawadi, Sanjay Nair,
Craig Lytle, Lubomir Bourdev
- Abstract summary: We propose several novel ideas for learned video compression which allow for improved performance for the low-latency mode.
We benchmark our method, which we call ELF-VC, on popular video test sets UVG and MCL-JCV.
Our approach runs at least 5x faster and has fewer parameters than all ML codecs which report these figures.
- Score: 61.10102916737163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While learned video codecs have demonstrated great promise, they have yet to
achieve sufficient efficiency for practical deployment. In this work, we
propose several novel ideas for learned video compression which allow for
improved performance for the low-latency mode (I- and P-frames only) along with
a considerable increase in computational efficiency. In this setting, for
natural videos our approach compares favorably across the entire R-D curve
under metrics PSNR, MS-SSIM and VMAF against all mainstream video standards
(H.264, H.265, AV1) and all ML codecs. At the same time, our approach runs at
least 5x faster and has fewer parameters than all ML codecs which report these
figures.
Our contributions include a flexible-rate framework allowing a single model
to cover a large and dense range of bitrates, at a negligible increase in
computation and parameter count; an efficient backbone optimized for ML-based
codecs; and a novel in-loop flow prediction scheme which leverages prior
information towards more efficient compression.
We benchmark our method, which we call ELF-VC (Efficient, Learned and
Flexible Video Coding), on popular video test sets UVG and MCL-JCV under metrics
PSNR, MS-SSIM and VMAF. For example, on UVG under PSNR, it reduces the BD-rate
by 44% against H.264, 26% against H.265, 15% against AV1, and 35% against the
current best ML codec. At the same time, on an NVIDIA Titan V GPU our approach
encodes/decodes VGA at 49/91 FPS, HD 720 at 19/35 FPS, and HD 1080 at 10/18
FPS.
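The BD-rate figures quoted above come from the Bjontegaard delta metric, which averages the bitrate difference between two rate-distortion curves over their shared quality range. The following is a minimal sketch of the common cubic-fit variant, not the authors' evaluation code; the function name `bd_rate` and all sample numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (cubic-fit variant).

    Returns the average bitrate change of the test codec relative to the
    anchor at equal PSNR. Negative values mean the test codec needs fewer bits.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the PSNR interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    int_t = np.polyval(np.polyint(fit_t), hi) - np.polyval(np.polyint(fit_t), lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0


# Hypothetical R-D points (kbps, dB); NOT measurements from the paper.
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
anchor_psnr = [32.0, 35.0, 38.0, 40.0]
test_rate = [r * 0.5 for r in anchor_rate]  # same quality at half the bits
test_psnr = list(anchor_psnr)

result = bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr)
print(f"BD-rate: {result:.2f}%")
```

In this toy case the test codec reaches every quality level at half the anchor's bitrate, so the BD-rate is about -50%. Under the same convention, the 44% reduction against H.264 reported above corresponds to a BD-rate of about -44%.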
Related papers
- Extending Video Masked Autoencoders to 128 frames [75.01251612160829]
Video understanding has witnessed significant progress with recent video foundation models demonstrating strong performance owing to self-supervised pre-training objectives; Masked Autoencoders (MAE) being the design of choice.
However, the majority of prior works that leverage MAE pre-training have focused on relatively short video representations (16 / 32 frames in length) largely due to hardware memory and compute limitations that scale poorly with video length due to the dense memory-intensive self-attention decoding.
We propose an effective strategy for prioritizing tokens which allows training on longer video sequences (128 frames) and gets better performance than, more typical, random
arXiv Detail & Related papers (2024-11-20T20:00:38Z)
- Accelerating Learned Video Compression via Low-Resolution Representation Learning [18.399027308582596]
We introduce an efficiency-optimized framework for learned video compression that focuses on low-resolution representation learning.
Our method achieves performance levels on par with the low-delay P configuration of the H.266 reference software VTM.
arXiv Detail & Related papers (2024-07-23T12:02:57Z)
- Optimal Video Compression using Pixel Shift Tracking [0.0]
This paper introduces the removal of redundancies between subsequent frames of a video as the main approach to video compression.
We call this method Redundancy Removal using Shift (R²S).
In this study, we have utilized a computer vision-based pixel point tracking method to identify redundant pixels to encode video for optimal storage.
arXiv Detail & Related papers (2024-06-28T03:36:38Z)
- An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models [65.37846460916042]
We find that attention computation over visual tokens is extremely inefficient in the deep layers of popular LVLMs.
We introduce FastV, a versatile plug-and-play method designed to optimize computational efficiency.
arXiv Detail & Related papers (2024-03-11T14:35:32Z)
- Rate-Perception Optimized Preprocessing for Video Coding [15.808458228130261]
We propose a rate-perception optimized preprocessing (RPP) method to improve the rate-distortion performance.
Our RPP method is simple and efficient, and it requires no changes to the settings of video encoding, streaming, and decoding.
In our subjective visual quality test, 87% of users judged videos with RPP to be better than or equal to videos compressed using only the codec, while the videos with RPP save about 12% bitrate.
arXiv Detail & Related papers (2023-01-25T08:21:52Z)
- EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens [57.354304637367555]
We present EVEREST, a surprisingly efficient MVA approach for video representation learning.
It finds tokens containing rich motion features and discards uninformative ones during both pre-training and fine-tuning.
Our method significantly reduces the computation and memory requirements of MVA.
arXiv Detail & Related papers (2022-11-19T09:57:01Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also improves encoding quality from 34.07 to 34.57 (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- AlphaVC: High-Performance and Efficient Learned Video Compression [4.807439168741098]
We introduce conditional-I-frame as the first frame in the GoP, which stabilizes the reconstructed quality and saves the bit-rate.
Second, to efficiently improve the accuracy of inter prediction without increasing the complexity of the decoder, we propose a pixel-to-feature motion prediction method at the encoder side.
Third, we propose a probability-based entropy skipping method, which not only brings performance gain, but also greatly reduces the runtime of entropy coding.
arXiv Detail & Related papers (2022-07-29T13:52:44Z)
- Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
arXiv Detail & Related papers (2020-08-20T20:01:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.