Efficient Per-Shot Convex Hull Prediction By Recurrent Learning
- URL: http://arxiv.org/abs/2206.04877v1
- Date: Fri, 10 Jun 2022 05:11:02 GMT
- Title: Efficient Per-Shot Convex Hull Prediction By Recurrent Learning
- Authors: Somdyuti Paul, Andrey Norkin and Alan C. Bovik
- Abstract summary: We propose a deep learning based method of content aware convex hull prediction.
We employ a recurrent convolutional network (RCN) to implicitly analyze the complexity of video shots in order to predict their convex hulls.
Our experimental results reveal that our proposed model yields better approximations of the optimal convex hulls and offers competitive time savings compared to existing approaches.
- Score: 50.94452824380868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive video streaming relies on the construction of efficient bitrate
ladders to deliver the best possible visual quality to viewers under bandwidth
constraints. The traditional method of content dependent bitrate ladder
selection requires a video shot to be pre-encoded with multiple encoding
parameters to find the optimal operating points given by the convex hull of the
resulting rate-quality curves. However, this pre-encoding step is equivalent to
an exhaustive search process over the space of possible encoding parameters,
which causes significant overhead in terms of both computation and time
expenditure. To reduce this overhead, we propose a deep learning based method
of content aware convex hull prediction. We employ a recurrent convolutional
network (RCN) to implicitly analyze the spatiotemporal complexity of video
shots in order to predict their convex hulls. A two-step transfer learning
scheme is adopted to train our proposed RCN-Hull model, which ensures
sufficient content diversity to analyze scene complexity, while also making it
possible to capture the scene statistics of pristine source videos. Our
experimental results reveal that our proposed model yields better
approximations of the optimal convex hulls, and offers competitive time savings
as compared to existing approaches. On average, our method reduced the
pre-encoding time by 58.0%, while the average Bjontegaard delta bitrate
(BD-rate) of the predicted convex hulls against the ground truth was 0.08%,
and the mean absolute deviation of the BD-rate distribution was 0.44%.
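The optimal operating points described above are the points of a rate-quality curve that survive an upper convex hull scan. A minimal Python sketch of that scan using the standard monotone-chain method (the function name and sample points are illustrative, not from the paper):

```python
def upper_convex_hull(points):
    """Keep the (bitrate, quality) points on the upper convex hull,
    i.e. the optimal operating points of a rate-quality curve."""
    pts = sorted(points)  # sort by bitrate, then by quality
    hull = []
    for p in pts:
        # drop the last hull point while it lies on or below the
        # segment joining its predecessor to the new point p
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross >= 0:  # counter-clockwise turn: hull[-1] is not on the upper hull
                hull.pop()
            else:
                break
        hull.append(p)
    return hull
```

Encodings that are dominated in both rate and quality are pruned automatically, which is exactly what the exhaustive pre-encoding search computes and what RCN-Hull learns to predict without the pre-encodes.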
Related papers
- Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task into a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z)
- Optimal Transcoding Resolution Prediction for Efficient Per-Title Bitrate Ladder Estimation [9.332104035349932]
We demonstrate that content-optimized features and ladders can be efficiently determined without any pre-encoding.
Our method closely approximates the ground-truth resolution pairs with a slight Bjontegaard Delta rate loss of 1.21%.
arXiv Detail & Related papers (2024-01-09T08:01:47Z)
- ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression [18.05997169440533]
We propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework, paired with a compute-efficient channel-wise auto-regressive prior.
We show that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions, averaging 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively.
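The Bjontegaard delta rate (BD-rate) quoted here and throughout this listing averages the bitrate gap between two rate-quality curves at equal quality. A minimal NumPy sketch of the standard calculation (the function name and the test curves are illustrative, not taken from any of these papers):

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bitrate difference (%) of the test curve vs. the
    reference curve at equal quality (Bjontegaard's method)."""
    # fit log10(bitrate) as a cubic polynomial of quality for each curve
    p_ref = np.polyfit(psnr_ref, np.log10(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rates_test), 3)
    # integrate both fits over the overlapping quality interval
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    # average log-rate gap, converted back to a percentage
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0
```

Identical curves give 0%, and a codec that spends twice the bitrate at every quality level comes out at +100%; negative values mean the test curve saves bitrate.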
arXiv Detail & Related papers (2023-07-12T11:45:54Z)
- Differentiable bit-rate estimation for neural-based video codec enhancement [2.592974861902384]
Neural networks (NN) can improve standard video compression by pre- and post-processing the encoded video.
For optimal NN training, the standard proxy needs to be replaced with a proxy that can provide derivatives of estimated bit-rate and distortion.
This paper presents a new approach for bit-rate estimation that is similar to the type employed in training end-to-end neural codecs.
arXiv Detail & Related papers (2023-01-24T01:36:07Z)
- Learning Quantization in LDPC Decoders [14.37550972719183]
We propose a floating-point surrogate model that imitates quantization effects as additions of uniform noise.
A deep learning-based method is then applied to optimize the message bitwidths.
We report an error-rate performance within 0.2 dB of floating-point decoding at an average message quantization bitwidth of 3.1 bits.
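The surrogate trick named in this summary, replacing hard quantization with additive uniform noise so that gradients can flow during training, can be sketched in a few lines (the function names are illustrative and the paper's actual decoder model is far more elaborate):

```python
import random

def quantize(x, step):
    # inference-time quantizer: round to the nearest multiple of step
    # (piecewise constant, so its gradient is zero almost everywhere)
    return step * round(x / step)

def noise_surrogate(x, step):
    # training-time stand-in: the rounding error is imitated by uniform
    # noise on [-step/2, step/2], which is smooth in x and matches the
    # mean (0) and variance (step**2 / 12) of the true quantization error
    return x + random.uniform(-step / 2.0, step / 2.0)
```

Training against the smooth surrogate lets gradient-based methods optimize the message bitwidths, while the hard quantizer is restored at inference time.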
arXiv Detail & Related papers (2022-08-10T07:07:54Z)
- Coarse-to-fine Deep Video Coding with Hyperprior-guided Mode Prediction [50.361427832256524]
We propose a coarse-to-fine (C2F) deep video compression framework for better motion compensation.
Our C2F framework can achieve better motion compensation results without significantly increasing bit costs.
arXiv Detail & Related papers (2022-06-15T11:38:53Z)
- Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation [57.66773945887832]
We propose to optimize Versatile Video Coding (VVC) complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on standard database demonstrate the superiority of proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z)
- AuxAdapt: Stable and Efficient Test-Time Adaptation for Temporally Consistent Video Semantic Segmentation [81.87943324048756]
In video segmentation, generating temporally consistent results across frames is as important as achieving frame-wise accuracy.
Existing methods rely on optical flow regularization or fine-tuning with test data to attain temporal consistency.
This paper presents an efficient, intuitive, and unsupervised online adaptation method, AuxAdapt, for improving the temporal consistency of most neural network models.
arXiv Detail & Related papers (2021-10-24T07:07:41Z)
- Learning from Images: Proactive Caching with Parallel Convolutional Neural Networks [94.85780721466816]
A novel framework for proactive caching is proposed in this paper.
It combines model-based optimization with data-driven techniques by transforming an optimization problem into a grayscale image.
Numerical results show that the proposed scheme can reduce 71.6% computation time with only 0.8% additional performance cost.
arXiv Detail & Related papers (2021-08-15T21:32:47Z)
- End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.