Neural Rate Control for Video Encoding using Imitation Learning
- URL: http://arxiv.org/abs/2012.05339v1
- Date: Wed, 9 Dec 2020 21:59:20 GMT
- Title: Neural Rate Control for Video Encoding using Imitation Learning
- Authors: Hongzi Mao, Chenjie Gu, Miaosen Wang, Angie Chen, Nevena Lazic, Nir
Levine, Derek Pang, Rene Claus, Marisabel Hechtman, Ching-Han Chiang, Cheng
Chen, Jingning Han
- Abstract summary: We apply imitation learning to learn a neural rate control policy.
We show that our learned policy achieves an 8.5% median bitrate reduction without sacrificing video quality.
- Score: 15.603639771786927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In modern video encoders, rate control is a critical component and has been
heavily engineered. It decides how many bits to spend to encode each frame, in
order to optimize the rate-distortion trade-off over all video frames. This is
a challenging constrained planning problem because of the complex dependency
among decisions for different video frames and the bitrate constraint defined
at the end of the episode.
We formulate the rate control problem as a Partially Observable Markov
Decision Process (POMDP), and apply imitation learning to learn a neural rate
control policy. We demonstrate that by learning from optimal video encoding
trajectories obtained through evolution strategies, our learned policy achieves
better encoding efficiency and has minimal constraint violation. In addition to
imitating the optimal actions, we find that additional auxiliary losses, data
augmentation/refinement and inference-time policy improvements are critical for
learning a good rate control policy. We evaluate the learned policy against the
rate control policy in libvpx, a widely adopted open source VP9 codec library,
in the two-pass variable bitrate (VBR) mode. We show that over a diverse set of
real-world videos, our learned policy achieves 8.5% median bitrate reduction
without sacrificing video quality.
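The setup described in the abstract, per-frame bit-allocation decisions with a bitrate constraint checked only at the end of the episode and a policy learned by imitating expert decisions, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' method: the rate model `toy_bits`, the made-up expert rule, the QP clipping range, and the linear policy fitted by least squares all stand in for the real encoder, the trajectories found through evolution strategies, and the neural policy. The toy also ignores partial observability by letting the policy see only a scalar frame complexity.

```python
import random

def toy_bits(complexity, qp):
    """Hypothetical rate model: a higher quantization parameter (QP) spends fewer bits."""
    return complexity / qp

def fit_linear_policy(complexities, expert_qps):
    """Behavior cloning in its simplest form: least-squares fit of qp = slope * c + intercept."""
    n = len(complexities)
    mean_c = sum(complexities) / n
    mean_q = sum(expert_qps) / n
    cov = sum((c - mean_c) * (q - mean_q) for c, q in zip(complexities, expert_qps))
    var = sum((c - mean_c) ** 2 for c in complexities)
    slope = cov / var
    intercept = mean_q - slope * mean_c
    return lambda c: slope * c + intercept

def run_episode(complexities, policy, budget):
    """Pick a QP for each frame in order; the budget is checked only after the last frame."""
    used = 0.0
    for c in complexities:
        qp = min(max(policy(c), 1.0), 63.0)  # clip to an assumed toy QP range
        used += toy_bits(c, qp)
    return used, used <= budget

# Made-up expert demonstrations (assumption): the expert picks qp = 2 * complexity + 5.
rng = random.Random(0)
demo_c = [rng.uniform(1.0, 20.0) for _ in range(200)]
demo_qp = [2.0 * c + 5.0 for c in demo_c]

policy = fit_linear_policy(demo_c, demo_qp)
used, within_budget = run_episode([10.0, 10.0], policy, budget=1.0)
```

Because the made-up expert is exactly linear, the fitted policy reproduces it almost perfectly. A real rate control policy has no such luxury, which is consistent with the abstract's remark that auxiliary losses, data augmentation/refinement, and inference-time policy improvements were critical.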
Related papers
- CoPE-VideoLM: Codec Primitives For Efficient Video Language Models [56.76440182038839]
Video Language Models (VideoLMs) empower AI systems to understand temporal dynamics in videos.
Current methods use sampling which can miss both macro-level events and micro-level details due to the sparse temporal coverage.
We propose to leverage video primitives which encode video redundancy and sparsity without requiring expensive full-image encoding for most frames.
arXiv Detail & Related papers (2026-02-13T18:57:31Z) - Learned Rate Control for Frame-Level Adaptive Neural Video Compression via Dynamic Neural Network [8.645355715511702]
We propose a dynamic video compression framework designed for variable scenarios.
The proposed method achieves an average BD-Rate reduction of 14.8% and BD-PSNR gain of 0.47 dB over state-of-the-art methods.
arXiv Detail & Related papers (2025-08-28T12:27:23Z) - RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression [68.31184784672227]
In modern applications such as autonomous driving, an overwhelming majority of videos serve as input for AI systems performing tasks.
It is therefore useful to optimize the encoder for a downstream task instead of for image quality.
Here, we address this challenge by controlling the Quantization Parameters (QPs) at the macro-block level to optimize the downstream task.
arXiv Detail & Related papers (2025-01-21T15:36:08Z) - Adaptive Rate Control for Deep Video Compression with Rate-Distortion Prediction [28.99369130279806]
We propose a neural network-based $\lambda$-domain rate control scheme for deep video compression.
The content-aware scheme is able to mitigate inter-frame quality fluctuations and adapt to abrupt changes in video content.
arXiv Detail & Related papers (2024-12-25T08:42:23Z) - Standard compliant video coding using low complexity, switchable neural wrappers [8.149130379436759]
We propose a new framework featuring standard compatibility, high performance, and low decoding complexity.
We employ a set of jointly optimized neural pre- and post-processors, wrapping a standard video codec, to encode videos at different resolutions.
We design a low complexity neural post-processor architecture that can handle different upsampling ratios.
arXiv Detail & Related papers (2024-07-10T06:36:45Z) - Structured Reinforcement Learning for Media Streaming at the Wireless Edge [15.742424623905825]
Media streaming is the dominant application over wireless edge (access) networks.
We develop and demonstrate learning-based policies for optimal decision making in a video streaming setting.
arXiv Detail & Related papers (2024-04-10T19:25:51Z) - Rate-Perception Optimized Preprocessing for Video Coding [15.808458228130261]
We propose a rate-perception optimized preprocessing (RPP) method to improve the rate-distortion performance.
Our RPP method is simple and efficient, and it requires no changes to the settings of video encoding, streaming, and decoding.
In our subjective visual quality test, 87% of users rated videos with RPP as better than or equal to videos compressed with the codec alone, while the videos with RPP save about 12% in bitrate.
arXiv Detail & Related papers (2023-01-25T08:21:52Z) - Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate video frames from limited adjacent frames.
We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z) - MuZero with Self-competition for Rate Control in VP9 Video Compression [31.57572275235357]
We present an application of the MuZero algorithm to the challenge of video compression.
We show that the MuZero-based rate control achieves an average 6.28% reduction in size of the compressed videos for the same delivered video quality level.
arXiv Detail & Related papers (2022-02-14T11:27:27Z) - Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly instead of a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z) - Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization [96.73647162960842]
Temporal action localization (TAL) is a fundamental yet challenging task in video understanding.
Existing TAL methods rely on pre-training a video encoder through action classification supervision.
We introduce a novel low-fidelity end-to-end (LoFi) video encoder pre-training method.
arXiv Detail & Related papers (2021-03-28T22:18:14Z) - Blind Video Temporal Consistency via Deep Video Prior [61.062900556483164]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly.
We show that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior.
arXiv Detail & Related papers (2020-10-22T16:19:20Z) - Masked Contrastive Representation Learning for Reinforcement Learning [202.8261654227565]
CURL, which uses contrastive learning to extract high-level features from raw pixels of individual video frames, is an efficient algorithm.
We propose a new algorithm, masked contrastive representation learning for RL, that takes the correlation among consecutive inputs into consideration.
Our method achieves consistent improvements over CURL on 14 out of 16 environments from the DMControl suite and 21 out of 26 environments from Atari 2600 Games.
arXiv Detail & Related papers (2020-10-15T02:00:10Z) - Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.